Site Reliability Engineer

Patreon


1 month ago

04/06/2019 08:45:26

Job type: Full-time

Hiring from: Selected US states

Category: Software Dev

SRE

What you will do:

You'll help Patreon scale the foundation of a platform that helps creators pay rent and enables higher levels of creativity.

You'll establish a standard of high availability and reliability for Patreon's production systems.

You'll influence the direction of our technical roadmap.

Create and administer infrastructure -- cloud services, hosts, monitoring tools -- for highly reliable and scalable web applications and data stores.

Build automated tooling to configure and maintain our systems and services.

Identify and solve issues in our stack.

Work closely with your peers in security and engineering.

Participate in an on-call rotation ~1 week per month.

 


Projects you might work on:


Leveling up how we approach and handle logging.

Improving our deploy pipeline.

Revamping our approach to alerting.

Working with our security team to improve the security of our infrastructure.

 


Skills and experience you possess:


You have experience in DevOps or Site Reliability for a company experiencing fast-paced growth.

You are knowledgeable in configuration management with a framework such as Ansible, Chef, or Puppet.

You're comfortable with AWS, Linux, and MySQL, and can operate all of them from the CLI.

You are proficient with a programming language like Python or Ruby, and with shell scripting.

Your documentation, collaboration, and verbal communication skills are excellent.

You are inclined to automate, but can discern when automation isn't the best solution and present alternatives.

You've worked with continuous integration and deployment systems, and have ideas about how to build and improve those systems.

You strongly believe in the importance of security, and enjoy the idea of partnering with the security team to ensure the integrity of our customers' data.

You have productive habits, healthy process awareness, and good teamwork skills and instincts.

Please mention that you come from Remotive when applying for this job.


similar jobs

  • 3 weeks ago
    Heetch is a mobility app with a simple mission: we want people to enjoy going out.
    Every night and every day, our drivers are doing their best to make their rides unforgettable and friendly!
    We are focused on young people's expectations and are competing within a fast-paced market.
     
    The service was launched in Paris in September 2013 and has been growing since then, with thousands of rides every night in France, Belgium and Morocco.
    With more than 1 million users in Europe, we are proud to be one of the fastest-growing French startups!
     
    Driver Growth @Heetch
     
    We're a thoughtful, talented, full stack and distributed product team of backend, mobile, frontend and QA engineers, as well as product managers and product designers. We're responsible for the acquisition, engagement, and retention of all our drivers.
    Our multi-disciplined team allows us to work autonomously across the realms of our scope. This means we own our roadmap entirely, and we empower each team member to contribute and influence what we work on and how.
     
    Our mission is quite simple: deliver driver happiness and ensure drivers get the optimum experience that they deserve. Drivers use and rely on the products we build every single day to earn a living. This is a responsibility that we hold dear and do not take for granted.
     
    SRE within Driver Growth
     
    Our infrastructure receives 2.5 million events per day and processes 100M API requests. We also serve over a dozen thousand rides, have a driver signup funnel with 50 separate data fields, and process hundreds of gigabytes of log and interaction data daily. Our team owns upwards of 20 microservices built on Elixir, Kafka and Docker, and we are focusing our efforts on adding to this number as we extract services from our legacy codebase.
     
    To put it simply: the services we support and the code we produce are critical to the business. Be it a potential driver going through our acquisition funnel, an active driver entering our marketplace, or a driver viewing their earnings and account details, to name but a few, the impact our backend engineers have on the business as a whole is enormous.
     
    Team Values
    • Transparency: We discuss everything openly within the team. Our speak-up culture is strong.
    • Remote first: Our team is fully distributed, and we work hard at that, but feel free to work from any of our offices in Paris, London, Brussels or Casablanca.
    • The courage to fail: We celebrate the wins, but more importantly we're not afraid to fail; we always learn and go again.
    • Team unity: No one is left behind.
    • Code quality: It's not software without tests.
     
    Your role
    In this role, you'll be in charge of building the tools and systems that every backend engineer in the Driver Growth team uses to develop, scale, understand, and monitor their operations.
    You will dive deep into gnarly operational issues from the software, systems, automation, and process perspectives, and you will work with our production services throughout their entire life cycle, from design and architecture, through implementation and deployment, to sustaining operations.
     
    What will you do?
    • Build tools and infrastructure that let the team iterate faster without overthinking the core infrastructure.
    • Partner with fellow backend engineers to architect and build mission-critical systems that can stand the test of scale and availability, while limiting operational overhead.
    • Perform deep dives into both systemic and latent reliability issues; partner with software and SRE engineers across the organization to produce and roll out fixes.
    • Design, build & support systems to detect, alert and remediate or escalate on the team's platform.
    • Contribute to standardization efforts across multiple disciplines and services in conjunction with the Core SRE team.
    • Handle efficiencies in systems and processes: design, configuration management, performance tuning, monitoring, and root cause analysis.
    • Participate in an on-call rotation and contribute to needed escalation missions.
     
    What do you need?
    • Software Engineer background (5+ years)
    • Practical knowledge of various aspects of service design like messaging protocols & behavior, caching strategies and software design practices
    • Solid understanding of systems and application design, including the operational trade-offs of various designs
    • Excellent programming skills in Go, and an ability to pick up new programming languages
    • Excellent written and social communication, and documentation skills in English
    • Be adaptable and able to focus on the most straightforward, most efficient & reliable solutions
    • Experience in the Linux environment and a deep understanding of its fundamentals and internals: filesystems and modern memory management, threads and processes, the user/kernel-space divide, networking
    • Exposure to the AWS ecosystem
    • Real world experience with Packer/Terraform
    • Customer service skills and empathy to develop solutions that span multiple teams
    • Work well with and be able to influence a myriad of personalities at all levels
    Bonus
    • Experience building highly-available fault-tolerant distributed systems with microservices, including containerized architectures, application security, monitoring, and storage systems
    • Experience with message brokers (such as RabbitMQ or Kafka)
     
    Perks
    • Stocks
    • Paid conference attendance/travel
    • Heetch credits
    • A Spotify subscription
    • Code retreats and company retreats
    • Travel budget (visit your remote co-workers and our offices)
    Hiring process:
    • Non technical interview with the Engineering Manager of your potential team (1h30)
    • Take home assignment (~5 days deadline)
    • Interview with your future teammates (1h)
    • Day on site (Paris) to meet your future stakeholders
     
     
    Check out our Engineering Blog and follow us on Twitter :)
    You can also have a look at our open-source projects and contributions here
  • NoRedInk (PST to CET)
    1 week ago

    NoRedInk is using technology to help millions of students become better writers. We’re seeking mission-driven engineers who like to ship code, tackle hard engineering problems, and fundamentally impact how kids learn.


    We’re hiring a site reliability engineer to handle availability and scalability, as well as product development. When students hit our site, you will help make sure there's a site to hit.


    About You

    You have at least 4 years of professional experience as a software developer or equivalent knowledge

    You have professional experience administering Linux servers with configuration management tools

    You have experience scaling with large deployments on AWS or bare metal

    You have experience supporting the production stack for a web application. We use Rails, Redis and MySQL.

    You can be your own DBA including setup, optimization and troubleshooting

    You are comfortable either working remotely, or commuting to our office in San Francisco

    Experience with Docker, microservices and/or security is a plus

    What are we up to?

    To see what our engineering team has been doing lately, check out our blog!

    NoRedInk helps millions of students in grades 5-12 become better writers. Our adaptive curriculum guides learners through a continuous process of skill-building, feedback, and revision and delivers actionable performance data to teachers and administrators. Used in over 50% of school districts, we're on a mission to unlock every writer's potential. Here’s a 2-minute pitch we gave on NBC and articles about us in The Washington Post, Wall Street Journal, and Forbes.

  • 2 weeks ago

    Auth0, a global leader in Identity-as-a-Service (IDaaS), provides thousands of enterprise customers with a Universal Identity Platform for their web, mobile, IoT, and internal applications. Its extensible platform seamlessly authenticates and secures more than 2.5B logins per month, making it loved by developers and trusted by global enterprises. Auth0 has raised more than $110 million to date and continues its global growth at a rapid pace. We are consistently recognized as a great place to work based on our outstanding leadership and dedication to company culture, and are looking for the best people to join our incredible team spread across more than 35 countries!


    Auth0 gives companies simple, powerful and developer-friendly building blocks so they can free up resources to focus on innovation. We strive to be the identity platform of choice for developers and enterprises. We take our culture very seriously and are looking for people who are drawn to both our mission and our culture.


    The Auth0 platform processes thousands of requests per second (2.5 billion logins per month) for customers all around the world - and we're growing very fast! The Site Reliability team aims to improve reliability and uptime in a data-driven way to support our customers' needs.


    We are looking for senior software engineers with a good understanding of how systems fail, solid background in software engineering, and a desire to learn about reliability and large-scale systems.

    You are a good fit if you...

    Have initiative and can "unblock" yourself to get things done.

    Tend to deliver work incrementally to get feedback and iterate over solutions.

    Can mentor junior people and pair with other teams: education is a very important part of this role.

    Like to get your hands dirty by debugging and fixing issues in production.

    Understand the real problems by reading between the lines and asking good questions.

    Are easy to work with: you communicate well, take feedback in a positive way and are OK not always doing the most glamorous tasks.

    Responsibilities:

    Analyze and optimize our core product by developing and implementing reliability and performance practices.

    Scale systems sustainably through automation, and evolve systems by pushing for changes that improve reliability and velocity.

    Perform Root Cause Analysis of production issues to identify reliability improvements of our services.

    Evangelize and advocate for reliability practices across our organization.

    Collaborate with other Engineering teams to support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews.

    Be on-call for services that the SRE team owns.

    Practice sustainable incident response and blameless postmortems.

    Requirements:

    You have contributed to designing applications and systems that scale, are resilient to failure, and are observable.

    You are interested in designing, analyzing and troubleshooting large-scale distributed systems.

    You have a systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.

    You have a great ability to debug and optimize code and automate routine tasks.

    You have a solid background in software development and architecting resilient and reliable applications.

    Timezone: we are giving preference to candidates located in GMT-8 to GMT+2.

    Extra Points:

    Experience with Amazon Web Services.

    Experience with Node.js or any other application development language.

    Experience with MongoDB.

    Experience working in a remote-friendly, async environment.

    Preferred Locations:

    (GMT-8); (GMT-7); (GMT-6); (GMT-5); (GMT-4); (GMT-3); (GMT-2); (GMT-1); (GMT); (GMT+1); (GMT+2)

    Auth0 is an Equal Employment Opportunity employer. Auth0 conducts all employment-related activities without regard to race, religion, color, national origin, age, sex, marital status, sexual orientation, disability, citizenship status, genetics, or status as a Vietnam-era special disabled and other covered veteran status, or any other characteristic protected by law. Auth0 participates in E-Verify and will confirm work authorization for candidates residing in the United States.
