Site Reliability Engineer

Sketch


6 days ago

02/11/2020 10:22:57

Job type: Full-time

Category: Software Development


Do you want to be part of a team that helps over one million designers create amazing products every day? We're looking for a full-time Site Reliability Engineer to join us at Sketch.

We are building a cloud platform that helps teams to collaborate on Sketch designs in every possible, efficient, and beautiful way.
Your mission will be to shape this cloud infrastructure by defining and building every piece, from development environments to metrics processing and observability, including security policies, network design, deployment strategies, and high availability.

Our stack is currently based on a mix of serverless and traditional server applications. You will propose new projects to make sure this platform has the best technology for our product goals and our team. You are proactive and have a "get the job done" attitude. You are also not afraid of digging deeper and deeper to debug a problem, especially in production.

There are always many things to do at Sketch. You need to be an organized and communicative person. You are used to prioritizing infrastructure tasks and projects, and you like to back your decisions and proposals with arguments. As part of a team of very skilled people, being an excellent team player is essential.

As a remote organization
Three things are key for us. Working remotely requires excellent communication skills as well as good written and spoken English. You need to be self-motivated and comfortable working in a remote position. It also requires high-quality documentation, so you need to have an eye for detail in general, and especially for documentation.

We believe in
Automated, simple, and quality-tested infrastructure. It's essential that you have experience developing infrastructure as code and that you enjoy coding. You are very critical of your own work and always try to find the cleanest way to do it. You understand the right balance between adopting new technology, current stability, maintainability, and simplicity. Like us, you also believe that speed and reliability are two of the most important features of a web platform. You like to design and build processes and platforms that run flawlessly and fast.

The ideal candidate
  • Has experience with different stacks (mainly Linux-based), technologies, and production models, and has actively participated in building important pieces of a cloud platform.
  • We would like to know as much about you as possible. Contact us, tell us about your experience and your motivations for this job, and send us a link to anything that represents you or your experience.
Even if you feel you are not exactly the person described, or you're not able to tick all of these boxes, we would still love to hear from you. We value anything that makes you different from the description.

Please mention that you come from Remotive when applying for this job.


similar jobs

  • 3 days ago

    Job Opening: Backend Platform Engineer

    We’re looking for someone to join our Platform Engineering team at Ferrum. Are you interested in building services to help hospitals detect and eliminate the millions of medical errors that occur every year? If so, read on.


    About You

    You have experience solving challenges with microservices and scaling systems securely. You are comfortable building robust applications with Go or similar languages, Docker, PostgreSQL, Linux, and more in complex systems. Your significant experience interacting with and developing APIs provides a framework for creating both internal and external facing endpoints. 

    You love efficiency and automation. Your work at Ferrum will have a huge impact on the business. You understand that automation and infrastructure pay dividends. You take pride in creating tools to help the team perform at a high level and scale to new challenges.

    You communicate and document everything. At Ferrum, we are a distributed remote team. A culture of sharing and documentation allows everyone to seamlessly work together. Further, for FDA and other healthcare regulatory approvals, our documentation of development, testing, and validation must be airtight.

    You take ownership of a project from ideation to delivery and maintenance. Your experience provides a framework for you to work independently on multiple initiatives as both the end-to-end owner and as a contributor to features. You take pride in creating services that are easy to debug and maintain.

    You want to make a positive and lasting impact on the world. You understand that technology has the power to improve people’s lives and enrich our society. You see the inequality in your own community and around the world and you look to change it. 


    What You Will Do

    Ferrum is a distributed, fast-growing company, so you will be wearing many hats and pitching in on different components and projects across the organization. That said, here are some examples of what you’ll do:

    • Scale up the platform and integrations between machine learning algorithms and the pipeline

    • Design and build the APIs interfacing the data pipeline and machine learning services

    • Build security tools to protect sensitive patient data throughout the data pipeline

    • Optimize bare-metal appliances to meet the high performance needs of the application

    • Provision infrastructure for the secure services coordinating on-premise and cloud-hosted services


    About Ferrum

    Medical errors kill 6 million patients every year and are the third leading cause of death worldwide. Ferrum provides doctors with an automated quality management system and machine learning marketplace that ensures they catch and fix medical errors before they affect patient care. The service does not affect physician workflow, takes less than a day to install, and is delivered via a secure appliance so patient data never leaves the hospital. Ferrum has been deployed at hospitals in multiple countries around the world. We are a highly technical team led by experienced founders who have built, funded, and scaled successful healthcare technology companies previously. 

    Salary: $95,000


    To apply to this position, please send an email to [email protected]

  • Help Scout (Remote)
    3 weeks ago
    As a member of our Ops team, you will be at the heart of nearly every application, tool, and service at Help Scout. The work you do every day will reflect the team mission: Ensure uptime and security across all of our applications while developing and supporting tools to enable customer bliss.

    To help us with our mission, we are seeking an experienced Ops Engineer to join our team. You will have a direct impact on Help Scout’s success, while helping more than 10,000 businesses around the world. While customers love our product, it means nothing if they can't access our services with great performance.


    Technologies we work with
    • AWS, Linux (Ubuntu/CentOS), Chef, Git/Github, RabbitMQ, AWS Aurora MySQL & PostgreSQL, MongoDB, Redis, Jenkins, Docker/Compose, New Relic, Sensu, PagerDuty, Ruby, Go, Python, Java, and PHP.


    About the role
    • You’ll be working on a small team of six (that includes one of our co-founders) and in collaboration with our software developers to build, deploy, secure, manage, and optimize highly-available, fault-tolerant, and horizontally scalable systems in AWS.
    • Ideally, we are looking to add a team member in the North or South American timezones.
    • Our engineering teams communicate mostly via Slack and are committed to remote, agile development. When your code is ready, you’ll create and send a pull request with test cases and tag your team for review. 
    • We are investing heavily in continuous integration and delivery and strive to uphold immutable infrastructure standards. 
    • You’ll work autonomously for the most part and we trust you to get work done when/where you can be productive.
    • In order to ensure excellent service to our customers, you will be part of our rotating on-call team.


    A note about on-call
    • The 6-week rotation follows this format: 1 week on backup on-call (which rarely sees much action), 1 week of being on-call, followed by a 4-week hiatus from on-call.
    • Our on-call shift is not particularly wearisome, but as a thank you for carrying the weight for the week, the day following your shift is a free day off if you want to take it. We want you happy, healthy and well-rested!
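
    The rotation described above can be sketched as a simple modular schedule. This is only an illustration, not Help Scout's actual tooling: it assumes all six team members share the rotation equally, and the names `role_for`, `schedule`, and the sample team are hypothetical.

```python
from typing import Dict, List

# 1 week backup + 1 week primary + 4 weeks off, per the rotation described above.
CYCLE_WEEKS = 6


def role_for(engineer_index: int, week: int) -> str:
    """Return the on-call role of one engineer for a given week (both 0-based).

    Assumption: engineer i starts the cycle (as backup) in week i, so every
    week has exactly one backup and one primary when the team has six people.
    """
    phase = (week - engineer_index) % CYCLE_WEEKS
    if phase == 0:
        return "backup"
    if phase == 1:
        return "primary"
    return "off"


def schedule(engineers: List[str], week: int) -> Dict[str, str]:
    """Map each engineer name to their role for the given week."""
    return {name: role_for(i, week) for i, name in enumerate(engineers)}


if __name__ == "__main__":
    team = ["A", "B", "C", "D", "E", "F"]  # hypothetical six-person team
    for week in range(CYCLE_WEEKS):
        print(week, schedule(team, week))
```

    Staggering each engineer by one week means the person on backup one week becomes primary the next, then gets four weeks off before the cycle repeats.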


    About you
    • You have a growth mindset, a passion for learning, and are willing to lean into discomfort for the good of our customers and product. 
    • You became an engineer because you like building systems, tools or products that help people.
    • You write code and scripts that other engineers can easily read and understand and you welcome reviews and feedback from your peers. You are comfortable writing tests and you thoroughly verify your work before you deploy. 
    • You’re a great communicator and have an excellent command of written and spoken English. As a remote company, we rely on clear communication for collaboration and execution.
    • You believe remote teams are the future of work, or are at least excited about the idea. You have experience working with remote teams or can adjust your work and time-management style to be remote-friendly.
    • You are helpful and empathetic and care about building on our company culture that embraces these qualities.
    • You have a deep understanding of what it takes to run SaaS at scale and have a solid understanding of Linux systems and networking; from kernel to shell, system libraries, file systems and client-server protocols.
    • You are proficient and comfortable in the AWS ecosystem.
    • You are adept at automating service and infrastructure configuration via industry-standard tools (e.g. Chef, Terraform).
    • You have experience building continuous deployment and testing tools (Docker, ECS, EKS, Kubernetes).
    • You design and build systems that work well and fail gracefully.
    • Security engineering is near and dear to your heart; you build with and advocate for a security mindset when implementing new features and infrastructure.
    • You have experience working with MTAs (e.g. exim, postfix) and SPAM filtering (e.g. rspamd, SpamAssassin).


    Benefits
    Competitive salary - Our salary formula is public to all employees (but doesn't divulge your specific salary) and we update it at least once per year. Your salary is the same no matter where you live. Our goal is to pay at or above the market rate of a US-based tech hub like Boston or Seattle.

    Health and dental insurance - We cover you and your family's health/dental insurance 100%. If you are based in the US, we'll cover you on our Aetna policy. If you're based outside the US, we'll reimburse your out-of-pocket health and dental insurance costs.

    Long-term/short-term disability insurance & life insurance - We cover 100% of the premiums for LT/ST disability insurance and base life insurance. You also have the option to purchase supplementary life insurance through our provider (currently US only).

    Flexible vacation - Take time off when you need it! We recommend 3-4 weeks in addition to public holidays, but there are no firm rules. We trust you.

    Sabbatical - After you've been at Help Scout for 4 years, you get a month of paid vacation (in addition to regular vacation) and $2,500 to spend towards travel, learning, projects or anything else during your time off. Read about what our CEO did.

    Paid parental leave, including adoption - 12 weeks of paid leave for all new parents.

    401k with 1% match - via Betterment for Business (currently US only)

    Personal Development stipend - Up to $1,800 per year to improve your craft

    Great tools - Each employee will be provided with a Mac laptop and display (or equivalent equipment of choice). We’ll also purchase any additional software or hardware you need.

    Home office stipend - Every new hire gets $1,500 USD to furnish their home office, and up to $350 USD per month if you'd like to rent a co-working desk somewhere.

    Complete transparency - Everyone has full access to business metrics and financial information about the company.

    About Us
    Help Scout is made by roughly 110 people in 80+ cities around the world, all with a passion for helping others. We come from diverse backgrounds and are united by an enthusiasm for great products and delightful customer experiences. Help Scout launched in 2011 and today we have more than 10,000 paying customers in 140+ countries.

    Why Help Scout?
    We're remote. It doesn’t matter if you’ve worked remotely before — we’ve been doing it for nearly a decade and are helping to write the playbook — we’re happy to show you the ropes. Most folks that get a taste of working in a "remote first" company have a hard time going back to the old way of doing things.

    We’re passionate about diversity and inclusion. The data is abundantly clear about diverse teams being more successful, and we're dedicated to setting the team up for success. Today our leadership team is 62% women, and that's just the start. Here’s our 2019 report. 

    We're committed to SMBs for the long term. Help Scout is focused entirely on serving small and midsize businesses, typically up to 500 employees, because those companies view customer service differently. It's not a cost to be optimized, it's their most effective marketing tool and a key differentiator from the competition. We built Help Scout for companies that truly value being customer-centric (like us) and want a product that shares their values.

    We're leaving the world better than we found it. Did you know Help Scout is a certified B Corporation, with a mission to give away at least 1% of our product through Help Scout for Good? Our company exists not just to help ourselves, but to invest in our team, our customers, our community, and our environment.

    Our commitment to you
    We are an equal opportunity employer and are committed to building a company that embraces and celebrates diversity and inclusion. We do not discriminate on the basis of race, religion, color, national origin, gender, gender identity or expression, sexual orientation, age, marital status, veteran status, or disability status. We have read the studies and understand that diverse teams build better products, bring more perspective to the table, contribute to a company’s financial success and help foster a more inclusive environment for all employees, but the bottom line is that it's the right thing to do.
  • Description
    As a well-rounded systems reliability engineer with a diverse set of skills, you are one of the very best people to troubleshoot, monitor the platform, and stay on top of releases. You should definitely be the type that appreciates diversity in your day, and challenges outside of your comfort level! A typical day might include these types of activities:

    - Taking charge of the build process and pipelines across the platform.
    - Being keenly aware of systems architecture and automatically adding in redundancy and backup for new systems and software.
    - Assisting in troubleshooting complex customer issues across network devices, server hardware, virtual machines, in-house software and open source software. Not only can you run tcpdump with filters on the command line, but you can read it there also.
    - Adding additional monitoring and alerting on all systems across the platform that will help you identify one of those annoying intermittent issues you have seen in the logs.


    Skills & Requirements
    The right candidates will probably have a CS degree, solid scripting and automation skills, great troubleshooting skills across the OS and network, a good grasp of security concepts, and experience with routing platforms and protocols, and will enjoy working collaboratively.

    Specific requirements include:

    - Experience in automating tasks through scripting. You should be very well versed with Python, and probably a few other languages. We will ask for script samples.
    - High degree of drive to improve and automate your environment with minimal guidance
    - Ability to solve immediate problems and plan to accommodate future ones
    - Experience with Ansible, Salt, Chef, Puppet, Terraform, or CFEngine. Experience with Ansible and Terraform preferred.
    - Experience with build pipelines, integration testing and Jenkins.
    - Experience administering a wide variety of *nix platforms, including multiple Linux variants.
    - Solid understanding of Layer 2 and Layer 3 protocols including IPv4/6, 802.1Q, BGP, MPLS, etc., and understanding a multitude of different network architectures.
    - Experience with Google Compute, AWS, or other cloud based compute and database services.
    - Understand the importance and implementation of backup and redundancy across many layers of databases, systems, and network configurations.

    Some knowledge that would be a huge plus:

    - Familiarity administering/troubleshooting Juniper/Cisco/Arista platforms.
    - Experience with extremely large scale network management and monitoring.
    - Experience with PostgreSQL, TimescaleDB, Elasticsearch
