Site Reliability Engineer

ISRG


1 week ago

04/11/2019 14:25:18

Job type: Full-time

Hiring from: North America

Category: Software Dev

SRE

What You Will Do

You will be a member of a six-person Site Reliability Engineering (SRE) team responsible for maintaining and evolving the operational infrastructure for the Let’s Encrypt certificate authority. You will work closely with our application software developers and management to plan and implement the future of the certificate authority, its software applications, and its policies and procedures.

We provide secure and reliable service to more than 150 million websites around the world. We expect this number to grow rapidly. As such, it’s a unique opportunity to have an enormous impact on creating a more secure and privacy-respecting Web.

In some organizations, the people responsible for deploying applications are left out of the full application development lifecycle. They are simply handed something at the end and told “make this run reliably, securely, and efficiently” while the infrastructure management role is devalued or taken for granted. That is not how we do things at Let’s Encrypt. SRE is part of the application development lifecycle from start to finish and we heavily invest in enabling and building infrastructure that is reliable, secure, and efficient. SRE is given latitude, time, and resources to do things The Right Way.

Automation is central to everything you and your team will build and maintain. You will automate operations extensively for the sake of security, scalability, correctness, compliance, and financial efficiency. You will make sure that when something does need to be done manually, it can be done in a safe and efficient manner. Our focus on automation means we are particularly interested in candidates with software engineering skills.

Our physical infrastructure includes servers, storage, switches, firewalls, and HSMs deployed across two highly secure data centers. While the majority of our infrastructure runs on our own hardware, we do use external cloud and CDN providers for some peripheral systems.

We use open source software (e.g. Linux, Prometheus, Grafana, SaltStack) extensively and prefer it when it can get the job done. The core CA application software that your team will be responsible for deploying is open source and written by our software development team.

Effective engineers know how to properly prioritize and communicate well. We will be looking for those skills in candidates.

Requirements

  • Two years of professional experience as a software developer
  • An understanding of why writing tests for software is critical
  • A willingness to travel approximately three times per year
  • A willingness to be on-call (time split between six people)
  • Personal organization ability so that people can depend on you (e.g. task lists, calendar management)

Skills You Will Need to Develop

We write most of our code in Go and Python. You don’t need to know these languages coming in but you will need to learn them.

You will need to develop systems and network administration skills if you haven’t already. This means, for example, learning to manage firewalls and routers, work with automation tools like SaltStack, and manage virtual machines on both physical and cloud infrastructure.

You will need to gain domain-specific knowledge (e.g. PKI) but you don’t need to know it coming in.

Location and Benefits

This is a remote position available anywhere in the United States or Canada.

Benefits include excellent health insurance, a 100% match for 401k contributions, and flexible time off and parental leave policies.

similar jobs

  • Kyokan | Application Engineers | REMOTE | Full-time | kyokan.io


    Over the past 18 months, Kyokan has been helping lay the groundwork for global transformations in the ways people transact, bank, raise, share, transfer, invest, coordinate and distribute wealth and value. We have done so through our collaborations with leading blockchain projects, including MetaMask, EthereumJs, Geth, Ethereum 2.0, MolochDAO and Filecoin/Protocol Labs.


    We are seeking engineers who will bring ambition, tenacity and initiative to some of the most important FOSS projects in the blockchain space while helping build a world-changing company.


    By joining Kyokan, you commit to:

    • hitting the ground running from day 0 as a full-time contributor to one of our open-source partners
    • bringing vision, creativity, hustle and technical expertise to mission-critical technical challenges
    • providing ongoing informal leadership, and daily initiative, to help your colleagues and community (blockchain developers, contributors and users) change the world via the development of the decentralized web
    • pursuing constant enhancement of your skills, workflows and impact while supporting your teammates and teams to do the same
    • showing respect, patience and empathy in every interaction and relationship with coworkers and the community

    We are hiring full-stack engineers who specialize in JavaScript. We will likely hire more than one, with at least one focused on the backend and at least one focused on the front end.


    We are a fully remote company.


    To apply:

    1. email [email protected], with "Application Engineer" in the subject line
    2. include a resume, links to GitHub, Twitter, LinkedIn, your blog, etc.
    3. tell us what you are looking for from this role and what you intend to bring to it

    For a more in-depth description of this opportunity: https://gist.github.com/danjm/b1e5ee2b0de997ab5e9f8d5b7a757334

  • Heetch is a mobility app with a simple mission: we want people to enjoy going out.
    Every night and every day, our drivers are doing their best to make their rides unforgettable and friendly!
    We are focused on young people's expectations and are competing within a fast-paced market.
     
    The service was launched in Paris in September 2013 and has been growing since then, with thousands of rides every night in France, Belgium and Morocco.
    With more than 1 million users in Europe, we are proud to be one of the fastest-growing French startups!
     
    Driver Growth @Heetch
     
    We're a thoughtful, talented, full stack and distributed product team of backend, mobile, frontend and QA engineers, as well as product managers and product designers. We're responsible for the acquisition, engagement, and retention of all our drivers.
    Our multi-disciplined team allows us to work autonomously across the realms of our scope. This means we own our roadmap entirely, and we empower each team member to contribute and influence what we work on and how.
     
    Our mission is quite simple: deliver driver happiness and ensure drivers get the optimum experience that they deserve. Drivers use and rely on the products we build every single day to earn a living. This is a responsibility that we hold dear and do not take for granted.
     
    SRE within Driver Growth
     
    Our infrastructure receives 2.5 million events per day and processes 100M API requests. We also serve over a dozen thousand rides, have a driver signup funnel with 50 separate data fields, and process hundreds of gigabytes of log and interaction data daily. Our team owns upwards of 20 microservices built on Elixir, Kafka and Docker, and we are focusing our efforts on adding to this number as we extract from our legacy codebase.
     
    To put it simply: the services we support and the code we produce are critical to the business. Be it a potential driver going through our acquisition funnel, an active driver entering our marketplace or a driver viewing their earnings and account details, to name but a few, the impact our backend engineers have on the business as a whole is enormous.
     
    Team Values
    • Transparency: We discuss everything openly within the team. Our speak-up culture is strong.
    • Remote first: Our team is fully distributed, and we work hard at that, but feel free to work from any of our offices in Paris, London, Brussels or Casablanca.
    • The courage to fail: We celebrate the wins, but more importantly we're not afraid to fail; we always learn and go again.
    • Team unity: No one is left behind.
    • Code quality: It's not software without tests.
     
    Your role
    In this role, you'll be in charge of building the tools and systems that every backend engineer in the Driver Growth team uses to develop, scale, understand, and monitor their operations.
    You will dive deep into gnarly operational issues from the software, systems, automation, and process perspectives, and you will work with our production services throughout their entire life cycle, from design and architecture through implementation, deployment, and sustaining operations.
     
    What will you do?
    • Build tools and infrastructure that help the team iterate faster without overthinking the core infrastructure.
    • Partner with fellow backend engineers to architect and build mission-critical systems that can stand the test of scale and availability, while limiting operational overhead.
    • Perform deep dives into both systemic and latent reliability issues; partner with software and SRE engineers across the organization to produce and roll out fixes.
    • Design, build & support systems to detect, alert and remediate or escalate on the team's platform.
    • Contribute to standardization efforts across multiple disciplines and services in conjunction with the Core SRE team.
    • Handle efficiencies in systems and processes: design, configuration management, performance tuning, monitoring, and root cause analysis.
    • Participate in an on-call rotation and contribute to needed escalation missions.
     
    What do you need?
    • Software Engineer background (5+ years)
    • Practical knowledge of various aspects of service design like messaging protocols & behavior, caching strategies and software design practices
    • Solid understanding of systems and application design, including the operational trade-offs of various designs
    • Excellent programming skills in Go, and an ability to pick up new programming languages
    • Excellent written and social communication, and documentation skills in English
    • Be adaptable and able to focus on the most straightforward, most efficient & reliable solutions
    • Experience in the Linux environment and a deep understanding of its fundamentals and internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, networking
    • Exposure to the AWS ecosystem
    • Real world experience with Packer/Terraform
    • Customer service skills and empathy to develop solutions that span multiple teams
    • Work well with and be able to influence a myriad of personalities at all levels
    Bonus
    • Experience building highly-available fault-tolerant distributed systems with microservices, including containerized architectures, application security, monitoring, and storage systems
    • Experience with message brokers (such as RabbitMQ or Kafka)
     
    Perks
    • Stocks
    • Paid conference attendance/travel
    • Heetch credits
    • A Spotify subscription
    • Code retreats and company retreats
    • Travel budget (visit your remote co-workers and our offices)
    Hiring process:
    • Non-technical interview with the Engineering Manager of your potential team (1h30)
    • Take-home assignment (~5 days deadline)
    • Interview with your future teammates (1h)
    • Day on site (Paris) to meet your future stakeholders
     
     
    Check out our Engineering Blog and follow us on Twitter :)
    You can also have a look at our open-source projects and contributions here
  • Patreon (Selected US states)
    SRE
    1 week ago

    What you will do:

    You'll help Patreon scale the foundation of a platform that helps creators pay rent and enables higher levels of creativity.

    You'll establish a standard of high availability and reliability for Patreon's production systems.

    You'll influence the direction of our technical roadmap.

    Create and administer infrastructure -- cloud services, hosts, monitoring tools -- for highly reliable and scalable web applications and data stores.

    Build automated tooling to configure and maintain our systems and services.

    Identify and solve issues in our stack.

    Work closely with your peers in security and engineering.

    Participate in an on-call rotation ~1 week per month.

     


    Projects you might work on:


    Leveling up how we approach and handle logging.

    Improving our deploy pipeline.

    Revamping our approach to alerting.

    Working with our security team to improve the security of our infrastructure.

     


    Skills and experience you possess:


    You have experience in DevOps or Site Reliability for a company experiencing fast-paced growth.

    You are knowledgeable in configuration management with a framework such as Ansible, Chef, or Puppet.

    You're comfortable with AWS, Linux, and MySQL, and can operate all of them from the CLI.

    You are proficient with a programming language like Python or Ruby, and with shell scripting.

    Your documentation, collaboration, and verbal communication skills are excellent.

    You are inclined to automate, but can discern when automation isn't the best solution and present alternatives.

    You've worked with continuous integration and deployment systems, and have ideas about how to build and improve those systems.

    You strongly believe in the importance of security, and enjoy the idea of partnering with the security team to ensure the integrity of our customers' data.

    You have productive habits, healthy process awareness, and good teamwork skills and instincts.
