Distributed Crawling Engineer at Priceonomics
We look through tons of data to build price reports and want you to help us look through even more.
We index 2 million pages per day and use the results to help inform smart purchase decisions. We're looking for someone who wants to own this process, scale it up, and make it harder, better, faster, stronger.
What you'll do:
- Manage our existing distributed web crawling infrastructure
- Monitor performance with statsd and visualize it in Graphite (see the instrumentation sketch after this list)
- Discover and eliminate bottlenecks to make the system go as fast as possible
- Build web crawlers to discover and index websites
- Deploy crawlers across many (20+) servers in an automated fashion
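To give a flavor of the instrumentation above, here's a minimal sketch using the statsd Python client; the host, port, metric names, and fetch logic are illustrative, not our actual setup.

    import requests  # pip install requests
    import statsd    # pip install statsd

    stats = statsd.StatsClient('localhost', 8125, prefix='crawler')

    def fetch(url):
        stats.incr('pages.fetched')            # counter: pages pulled over time
        with stats.timer('pages.fetch_time'):  # timer: fetch latency, graphed in Graphite
            return requests.get(url, timeout=10)

Counters and timers like these feed the Graphite dashboards we watch for throughput and latency regressions.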
Our stack:
- Python: Tornado and Celery for our backend infrastructure
- lxml for DOM parsing. We used BeautifulSoup for a while, but it became a bottleneck :( (see the parsing sketch below)
- Amazon Web Services (EC2, S3)
- Chef or Puppet, plus Fabric for deploys (see the deploy sketch below)
- JS / CoffeeScript
- statsd & Graphite experience is a plus
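For a rough sense of the lxml side, a parsing pass looks something like this; the URL and selectors are made up for illustration:

    import requests
    from lxml import html

    resp = requests.get('http://example.com/product', timeout=10)
    doc = html.fromstring(resp.content)

    # XPath evaluation happens in C, which is where lxml buys its
    # speed over BeautifulSoup for us
    prices = doc.xpath('//span[@class="price"]/text()')

    doc.make_links_absolute(resp.url)
    links = [url for _, _, url, _ in doc.iterlinks()]  # candidates for the crawl queue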
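And a minimal Fabric 1.x-style sketch of the automated, many-server deploys we mean; the hostnames, paths, and supervisor setup are placeholders:

    # fabfile.py
    from fabric.api import cd, env, parallel, run

    env.hosts = ['crawler%02d.example.com' % i for i in range(1, 21)]  # the 20+ boxes

    @parallel
    def deploy():
        with cd('/opt/crawler'):
            run('git pull origin master')         # ship the latest crawler code
            run('supervisorctl restart crawler')  # bounce the workers

Kicked off with a single "fab deploy" from a dev box.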
To apply, email firstname.lastname@example.org. In your message:
- Tell us about the most impressive Python project you've built. Links to code would be great, but we understand if you can't share.
- Include a link to your GitHub profile. If you don't have any public projects listed, include a Python project you can share, along with an explanation of what it does and why it's significant.
San Francisco, CA