Site Reliability Engineer
1 week ago
Job type: Full-time
Hiring from: North America
Category: Software Dev
What You Will Do
You will be a member of a six-person Site Reliability Engineering (SRE) team responsible for maintaining and evolving the operational infrastructure for the Let’s Encrypt certificate authority. You will work closely with our application software developers and management to plan and implement the future of the certificate authority, its software applications, and its policies and procedures.
We provide secure and reliable service to more than 150 million websites around the world, and we expect this number to grow rapidly. As such, this role is a unique opportunity to have an enormous impact on creating a more secure and privacy-respecting Web.
In some organizations, the people responsible for deploying applications are left out of the full application development lifecycle. They are simply handed something at the end and told “make this run reliably, securely, and efficiently” while the infrastructure management role is devalued or taken for granted. That is not how we do things at Let’s Encrypt. SRE is part of the application development lifecycle from start to finish, and we invest heavily in enabling and building infrastructure that is reliable, secure, and efficient. SRE is given latitude, time, and resources to do things The Right Way.
Automation is central to everything you and your team will build and maintain. You will automate operations extensively for the sake of security, scalability, correctness, compliance, and financial efficiency. You will make sure that when something does need to be done manually, it can be done in a safe and efficient manner. Our focus on automation means we are particularly interested in candidates with software engineering skills.
Our physical infrastructure includes servers, storage, switches, firewalls, and HSMs deployed across two highly secure data centers. While the majority of our infrastructure runs on our own hardware, we do use external cloud and CDN providers for some peripheral systems.
We use open source software (e.g. Linux, Prometheus, Grafana, SaltStack) extensively and prefer it when it can get the job done. The core CA application software that your team will be responsible for deploying is open source and written by our software development team.
Effective engineers know how to properly prioritize and communicate well. We will be looking for those skills in candidates.
Qualifications
- Two years of professional experience as a software developer
- An understanding of why writing tests for software is critical
- A willingness to travel approximately three times per year
- A willingness to be on-call (time split between six people)
- Strong personal organization skills so that people can depend on you (e.g. task lists, calendar management)
Skills You Will Need to Develop
We write most of our code in Go and Python. You don’t need to know these languages coming in, but you will need to learn them.
You will need to develop systems and network administration skills if you haven’t already. This means, for example, learning to manage firewalls and routers, work with automation tools like SaltStack, and manage virtual machines on both physical and cloud infrastructure.
You will need to gain domain-specific knowledge (e.g. PKI) but you don’t need to know it coming in.
Location and Benefits
This is a remote position available anywhere in the United States or Canada.
Benefits include excellent health insurance, a 100% match for 401(k) contributions, and flexible time off and parental leave policies.