Linux Systems Administrator / DevOps Engineer, West Coast (Xobni)
About the Job:
Xobni runs 100% in the cloud and on managed hosting. Technically, we run both in "the clouds" and on "the ground," using dedicated servers and cloud servers at multiple providers, chosen based on the price, availability, failover, and bandwidth of the different offerings. We already bring machines up and down and auto-configure them by running a few scripts, using a first-generation tool we built around Puppet.
We are looking for someone to take our mission-critical "XobniCloud" infrastructure to the next level. This system will manage our "Implicit Graph" infrastructure cluster: it will orchestrate the provisioning, load balancing, dynamic configuration and reconfiguration, monitoring, and spend optimization of 1000+ servers across providers, data centers, availability zones, and myriad other variables we haven't even thought of yet. We are dreaming of a server management system that can not only provision servers but also configure monitoring and business metrics automatically: the whole shebang! You've been itching for a while for the opportunity to do server management infrastructure right, and you're fired up to absolutely go to town on this, building scaling and healing automation that factors in security, failover, and quality/analytics tools to track stuff like packet loss, performance, latency, and more. You know, the stuff you'd build if world-class infrastructure were the priority and your boss wasn't breathing down your neck about that SOX audit report and the other whizbang SAS 70 things the marketing and sales VPs needed yesterday.
We are also looking for someone who can get their hands dirty and do manual operations when required: write a script on the fly to troubleshoot a problem, restart servers and services, rebalance the load, and move large amounts of data around the world to keep our customers' service alive.
Responsibilities:
-Take personal responsibility for the availability and reliability of our service.
-Save the company a lot of money on infrastructure costs.
-Author tools that reliably manage infrastructure. We're looking for someone to write clean, reusable code (using bash, Puppet, MCollective, Ruby, Perl, or Python).
-Produce elegant, simple code. Write maintainable code with extensive test coverage, working in a professional software engineering environment (with source control, a dev/stage/prod release cycle, and continuous deployment).
-We'll need you to do most of your work in bash, Puppet, MCollective, and Ruby, but you're free to select other platforms, languages (e.g., Lua, Go, Scala), and open source components for different pieces of the project.
-Support our existing production cluster management system while you improve it. Our current system is hacked together in Puppet, bash, and Perl, and it leverages Git, Apache Cassandra, and a bunch of other stuff.
-Own our server image configurations, collaborating with core server engineers to optimize for task performance, reliability, failover and scale.
Requirements:
-You know exactly how awesome it would be to create what's described above. You've been dreaming for years about the opportunity to work on something like this without the distraction of other stuff. This job description made you drool a little. You have a distributed systems foundation and a service-oriented mindset. You're always thinking "What happens if this fails?" when you build things. You've "carried the pager" before (ideally at both a startup and a large infrastructure provider) and have first-hand experience with what happens when infrastructure or tools fail.
-A minimum of 3 years of coding experience (school counts) and a good 5 years of experience with operations and infrastructure in general. Much more experience would be great. What matters is that you've shipped and maintained mission-critical tools and infrastructure that many other people depend on.
-You are a prolific engineer who works well independently.
Bonus Points:
-You've written software tools to manage 1000+ servers.
-You are conversant in the pros and cons of different clouds: EC2, Slicehost, Rackspace, etc.
-You've poked around with other projects trying to do similar things (RightScale, cloudcake, VMware, etc.).
-You read up on and experiment with new technologies because it's in your nature, not because it's a job requirement.
-You don't just learn how things work; you learn why.
-Formal training in computer science (bachelor's, master's, whatever).
Location:
2140 Taylor Street
Apartment 805
San Francisco, CA 94133
United States