Site Reliability Engineer

ISRG


2 months ago

04/11/2019 14:25:18

Job type: Full-time

Hiring from: North America

Category: Software Dev

SRE

What You Will Do

You will be a member of a six-person Site Reliability Engineering (SRE) team responsible for maintaining and evolving the operational infrastructure for the Let’s Encrypt certificate authority. You will work closely with our application software developers and management to plan and implement the future of the certificate authority, its software applications, and its policies and procedures.

We provide secure and reliable service to more than 150 million websites around the world, and we expect this number to grow rapidly. This makes the role a unique opportunity to have an enormous impact on creating a more secure and privacy-respecting Web.

In some organizations, the people responsible for deploying applications are left out of the full application development lifecycle. They are simply handed something at the end and told “make this run reliably, securely, and efficiently” while the infrastructure management role is devalued or taken for granted. That is not how we do things at Let’s Encrypt. SRE is part of the application development lifecycle from start to finish and we heavily invest in enabling and building infrastructure that is reliable, secure, and efficient. SRE is given latitude, time, and resources to do things The Right Way.

Automation is central to everything you and your team will build and maintain. You will automate operations extensively for the sake of security, scalability, correctness, compliance, and financial efficiency. You will make sure that when something does need to be done manually, it can be done in a safe and efficient manner. Our focus on automation means we are particularly interested in candidates with software engineering skills.

Our physical infrastructure includes servers, storage, switches, firewalls, and HSMs deployed across two highly secure data centers. While the majority of our infrastructure runs on our own hardware, we do use external cloud and CDN providers for some peripheral systems.

We use open source software (e.g. Linux, Prometheus, Grafana, SaltStack) extensively and prefer it when it can get the job done. The core CA application software that your team will be responsible for deploying is open source and written by our software development team.

Effective engineers know how to properly prioritize and communicate well. We will be looking for those skills in candidates.

Requirements

  • Two years of professional experience as a software developer
  • An understanding of why writing tests for software is critical
  • A willingness to travel approximately three times per year
  • A willingness to be on-call (time split between six people)
  • Personal organization ability so that people can depend on you (e.g. task lists, calendar management)

Skills You Will Need to Develop

We write most of our code in Go and Python. You don’t need to know these languages coming in but you will need to learn them.

You will need to develop systems and network administration skills if you haven’t already. This means, for example, learning to manage firewalls and routers, work with automation tools like SaltStack, and manage virtual machines on both physical and cloud infrastructure.

You will need to gain domain-specific knowledge (e.g. PKI) but you don’t need to know it coming in.

Location and Benefits

This is a remote position available anywhere in the United States or Canada.

Benefits include excellent health insurance, a 100% match for 401k contributions, and flexible time off and parental leave policies.

Please mention that you come from Remotive when applying for this job.

similar jobs

  • About Down:

    The Down app is the #1 hookup / casual dating app. Honest dating: choose Date or Hookup. You can find Down, which has 6m+ users, in the Android or Apple store.

    Our mission is to enable more honest, sex-positive, and fun relationships and conversations around the world.


    About the gig:

     Down is a 100% remotely-distributed team! 


    We are looking for a Senior Backend Engineer who is excited to share their experience building products and scaling systems. We want you to help us design and build the next generation of high performance APIs and backend services.


    This is a great opportunity to join a small and growing engineering team, where you will make a big impact daily on a product that is used by hundreds of thousands of people each month.


    Here are some projects our team is currently working on: 


    Redesigning our matching algorithm

    Exploring new integrations of payment systems, including cryptocurrency

    Analyzing and improving user lifecycle and funnels

    Building a community marketplace for dating discussions, advice, and personal connections


    What would qualify you as a good fit for us?


    You have 3+ years of work experience on backend tech (APIs, web services, and distributed systems) 

    You have experience coding professional projects in Ruby on Rails 

    You show considerable care for code quality, documentation, testing and accuracy of implementation. 

    You are comfortable being the lead or solo developer on a project 

    You design your code for scalability and performance. 

    You can reason and debate about tradeoffs and database choice for a particular storage problem. 

    You absolutely love to work with other engineers and jump at the chance to help answer questions or solve a problem for someone else.

    Clear communication and ability to own complex projects end-to-end, coordinating with other teams as necessary 


    Location of work: anywhere in the world


  • 1 month ago

    Auth0, a global leader in Identity-as-a-Service (IDaaS), provides thousands of enterprise customers with a Universal Identity Platform for their web, mobile, IoT, and internal applications. Its extensible platform seamlessly authenticates and secures more than 2.5B logins per month, making it loved by developers and trusted by global enterprises. Auth0 has raised more than $110 million to date and continues its global growth at a rapid pace. We are consistently recognized as a great place to work based on our outstanding leadership and dedication to company culture, and are looking for the best people to join our incredible team spread across more than 35 countries!


    Auth0 gives companies simple, powerful, and developer-friendly building blocks so they can free up resources to focus on innovation. We strive to be the identity platform of choice for developers and enterprises. We take our culture very seriously and are looking for people who are drawn to both our mission and our culture.


    The Auth0 platform processes thousands of requests per second (2.5 billion logins per month) for customers all around the world - and we're growing very fast! The Site Reliability team aims to improve reliability and uptime in a data-driven way to support our customers' needs.


    We are looking for senior software engineers with a good understanding of how systems fail, solid background in software engineering, and a desire to learn about reliability and large-scale systems.

    You are a good fit if you...

    Have initiative and can "unblock" yourself to get things done.

    Tend to deliver work incrementally to get feedback and iterate over solutions.

    Can mentor junior people and pair with other teams: education is a very important part of this role.

    Like to get your hands dirty by debugging and fixing issues in production.

    Understand the real problems by reading between the lines and asking good questions.

    Are easy to work with: you communicate well, take feedback in a positive way and are OK not always doing the most glamorous tasks.

    Responsibilities:

    Analyze and optimize our core product by developing and implementing reliability and performance practices.

    Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.

    Perform Root Cause Analysis of production issues to identify reliability improvements of our services.

    Evangelize and advocate for reliability practices across our organization

    Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

    Be on-call for services that the SRE team owns.

    Practice sustainable incident response and blameless postmortems.

    Requirements:

    You have contributed to the design of applications and systems that scale, are resilient to failure, and are observable.

    You are interested in designing, analyzing and troubleshooting large-scale distributed systems.

    You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

    You have a great ability to debug and optimize code and automate routine tasks.

    You have a solid background in software development and architecting resilient and reliable applications.

    Timezone: we are giving preference to candidates located in GMT-8 to GMT+2.

    Extra Points:

    Experience with Amazon Web Services.

    Experience with Node.js or any other application development language.

    Experience with MongoDB.

    Experience working in a remote-friendly, async environment.

    Preferred Locations:

    (GMT-8); (GMT-7); (GMT-6); (GMT-5); (GMT-4); (GMT-3); (GMT-2); (GMT-1); (GMT); (GMT+1); (GMT+2)

    Auth0 is an Equal Employment Opportunity employer. Auth0 conducts all employment-related activities without regard to race, religion, color, national origin, age, sex, marital status, sexual orientation, disability, citizenship status, genetics, or status as a Vietnam-era special disabled and other covered veteran status, or any other characteristic protected by law. Auth0 participates in E-Verify and will confirm work authorization for candidates residing in the United States.

  • Patreon (Selected US states)
    SRE
    2 months ago

    What you will do:

    You'll help Patreon scale the foundation of a platform that helps creators pay rent and enables higher levels of creativity.

    You'll establish a standard of high availability and reliability for Patreon's production systems.

    You'll influence the direction of our technical roadmap.

    Create and administer infrastructure -- cloud services, hosts, monitoring tools -- for highly reliable and scalable web applications and data stores.

    Build automated tooling to configure and maintain our systems and services.

    Identify and solve issues in our stack.

    Work closely with your peers in security and engineering.

    Participate in an on-call rotation ~1 week per month.

     


    Projects you might work on:


    Leveling up how we approach and handle logging.

    Improving our deploy pipeline.

    Revamping our approach to alerting.

    Working with our security team to improve the security of our infrastructure.

     


    Skills and experience you possess:


    You have experience in DevOps or Site Reliability for a company experiencing fast-paced growth.

    You are knowledgeable in configuration management with a framework such as Ansible, Chef, or Puppet.

    You're comfortable with AWS, Linux, and MySQL, and can operate all of them from the CLI.

    You are proficient with a programming language like Python or Ruby, and with shell scripting.

    Your documentation, collaboration, and verbal communication skills are excellent.

    You are inclined to automate, but can discern when automation isn't the best solution and present alternatives.

    You've worked with continuous integration and deployment systems, and have ideas about how to build and improve those systems.

    You strongly believe in the importance of security, and enjoy the idea of partnering with the security team to ensure the integrity of our customers' data.

    You have productive habits, healthy process awareness, and good teamwork skills and instincts.
