Linux Systems Administrator - DevOps Engineer, Xobni
THIS JOB HAS EXPIRED
About the Job:
Xobni runs 100% in the Cloud and Managed Hosting. Technically we run in "the clouds" and on "the ground", using dedicated servers and cloud servers at multiple providers based on the price, availability, failover and bandwidth of different offerings. We already bring machines up and down and auto-configure them with the push of a few scripts using a first-generation tool we built around Puppet.
We are looking for someone to take our mission-critical XobniCloud infrastructure to the next level. This system will manage our Implicit Graph infrastructure cluster: it will orchestrate the provisioning, load balancing, dynamic configuration/re-configuration, monitoring and spend optimization of 1000+ servers across providers, data centers, availability zones and myriad other variables we haven't even thought of yet. We are dreaming of a server management system that can not only provision servers but also configure the monitoring and the business metrics automatically, the whole shebang.
You've been itching for the opportunity to do "server management infrastructure right" for a while and are fired up to absolutely go to town on this - building scaling and healing automation that factors in security, failover, and quality/analytic tools to track stuff like packet loss, performance, latency, and more. You know, the stuff you'd build if world-class infrastructure was the priority - and your boss wasn't breathing down your neck about that SOX audit report and the other whizbang SAS70 things the marketing and sales VPs needed yesterday.
We are also looking for someone who can get their hands dirty and do manual operations when required: write a script on the fly to troubleshoot a problem, restart servers and services, rebalance the load, and move large amounts of data around the world to keep our customers' service alive.
Responsibilities:
-Take personal responsibility for the availability and reliability of our service.
-Save the company a lot of money on infrastructure costs.
-Author tools that reliably manage infrastructure. We're looking for someone to write clean, re-usable code. (Using bash, Puppet, MCollective, Ruby, Perl or Python)
-Produce elegant code that's simple. Write maintainable code with extensive test coverage, working in a professional software engineering environment (with source control, dev/stage/prod release cycle, continuous deployment).
-We'll need you to do most of your work in Bash, Puppet, MCollective and Ruby, but you're free to select other platforms, languages (Lua, Go, Scala) and open source components for different pieces of the project.
-Support our existing production cluster management system while you improve it. Our current system is hacked together in Puppet/bash/Perl and leverages Git, Puppet, Apache Cassandra, Perl and a bunch of other stuff.
-Own our server image configurations, collaborating with core server engineers to optimize for task performance, reliability, failover and scale.
Requirements:
-You know exactly how awesome it would be to create what's described above. You've been dreaming about the opportunity to work on something like this without the distraction of other stuff for years. This job description made you drool a little.
-A distributed systems foundation and a service-oriented mindset. You're always thinking "What happens if this fails?" when you build things. You've "carried the pager" before (ideally at both a startup and a large infrastructure provider) and have first-hand experience with what happens when infrastructure/tools fail.
-A minimum of 3 years of coding experience (school counts) and a good 5 years of experience with Operations and Infrastructure in general. Much more experience would be great. What matters is that you've shipped and maintained mission-critical tools and infrastructure that many other people depend on.
-You are a prolific engineer who works well independently.
Bonus Points:
-You've written software tools to manage 1000+ servers.
-You are conversant in the pros and cons of different clouds: EC2, Slicehost, Rackspace, etc.
-You've poked around with other projects trying to do similar things (RightScale, cloudcake, VMware, etc.).
-You read up on and experiment with new technologies because it's in your nature, not because it's a job requirement.
-You don't just learn how things work, you learn why.
-Formal training in computer science (bachelor's, master's, whatever).
Location:
2140 Taylor Street
Apartment 805
San Francisco, CA 94133
United States