We are looking for an experienced Ruby developer to help us build a suite of web scrapers.
We have a Ruby on Rails application that scrapes news items from various media release index/show pages and turns them into RSS feeds.
We've created and refined a process for adding new scrapers to the app using Mechanize, VCR and RSpec.
Now we're looking for someone to help us work through our backlog of sites to scrape.
We'll use GitHub to manage the work, perform code reviews and deploy new scrapers to production.
For each site, you'll be assigned a PR that includes:
- details of the URL to scrape
- a stubbed scraper class to complete
- related specs
Once you've completed the scraper and got the specs passing, we'll review the code.
Most of the sites to scrape follow a similar pattern and should be quite straightforward.
We're likely to have a steady stream of new sites to scrape, so this could become a longer-term, ongoing project.
For the right candidate there is scope to take on more work, including maintenance of existing scrapers and creating more complex scrapers from scratch.
Required skills:
- Ruby on Rails, including RSpec.
- Web scraping, including CSS-style selectors. Things like `'.pager__items a[rel="next"]'` should make sense!