As part of the reliability engineering team at HubSpot, NOC Engineers take responsibility for operating the service outside of business hours (the standard shift is 11PM-7AM, Sunday through Thursday). This role requires exceptional judgment, flexibility, and creativity, and a willingness to learn new skills across a broad range of technologies and architectures. The right candidate will gain the skills and experience needed to move on to HubSpot's Software Reliability Engineering or Technical Operations teams, roles that are fundamental to the growth of the company and the stability of our products.
- NOC Engineers will triage any alerts generated by our monitoring systems or issues raised by our Support team
- NOC Engineers will resolve issues when they're able; when unable to do so, they'll work with senior product or infrastructure engineers to evaluate the problem and assist with resolution
- NOC Engineers will take ownership of response documentation, driving the creation of runbooks and checklists that help to make problem response simple and repeatable
- NOC Engineers will take ownership of the cleanliness of our monitoring systems, ensuring that the alerts generated are both real and actionable, and taking action to remove or improve those that are not
- NOC Engineers will work with product teams to ensure that all metrics and functionality necessary for the operation of the service are being collected and monitored, so that when things go wrong we're aware of it
This position is appropriate for someone who is new to software or to operations and who wants a quick and comprehensive education in the world of operations, reliability, and software architecture. Someone entering the field should be able to demonstrate familiarity with Linux (e.g., knowledge of the command line), as well as a history of problem solving across technical or non-technical fields. An ideal candidate will also have some knowledge of code (ideally Java, Python, or both), though formal knowledge is less important than applicable knowledge. We hope and expect that someone starting in this position will gain the skills needed to join the SRE or TechOps teams within 6 to 12 months.
Location: Cambridge, MA