Site Reliability Engineer, Yelp
Our Site Reliability Engineers are the primary interface between our developers and our production operations. No matter how many times we get searched, scraped, scanned, spammed, pinged, paged or queried, they gotta keep their cool - and keep the site running smoothly. You'll work in both the dev and systems worlds, instrumenting key parts of core architecture and supporting devs as they try to do the same. We're looking for a true hacker - you'll work as much in bash as Python, and you'll drop into some C now and then. You'll implement monitoring and alerting systems to support site stability and performance. You'll proactively scale our infrastructure to meet ever-increasing demand. You'll make sure that when something goes bump in the night, someone hears it. And you'll play a key role in keeping Yelp fast, available and growing.
Work closely with developers in supporting new features and services
Monitor site stability and performance
Scale infrastructure to meet demand
Troubleshoot site issues
Develop custom tools as necessary
Document system design and procedures
Participate in light on-call rotation
Mastery of Linux or Unix
Command of your favorite modern programming language: Python, Ruby, Java, C++, etc.
Solid understanding of fundamental technologies like TCP/IP and HTTP
Knowledge of best practices related to security, performance, and disaster recovery
Strong scripting skills in the presence of flying darts
Experience with web server configuration, monitoring, trending, network design, and high availability
Excellent communication skills
A sense of humor!
MySQL experience (high availability, scale-out replication)
Advanced knowledge of network design, management of Cisco network equipment, or BGP
Experience at a large-scale consumer internet site
CentOS and Ubuntu distribution familiarity
650 Mission Street
San Francisco, CA 94105