Lead Software Engineer - SRE


1 month ago

02/19/2020 10:22:58

Job type: Full-time

Category: Software Development

InVision is the digital product design platform used to make the world’s best customer experiences. We provide design tools and educational resources for teams to navigate every stage of the product design process, from ideation to development. Today, more than 5 million people use InVision to create a repeatable and streamlined design workflow, rapidly design and prototype products before writing code, and collaborate across their entire organization. That includes 100% of the Fortune 100, and organizations like Airbnb, Amazon, HBO, Netflix, Slack, Starbucks and Uber, who are now able to design better products, faster.

Our team is in search of a Lead Software Engineer - SRE to help us change the way digital products are designed.

This role will help ensure uninterrupted service for InVision customers and act as a force multiplier for product teams to deliver better software faster. This role will have ownership of foundational reliability services and a big impact on our product.

About the team:

The reliability team is dedicated to ensuring resiliency at scale. You will lead the design, development and delivery of solutions that enhance the scalability, availability, and efficiency of microservices. This role will have a direct impact on platform and product teams by identifying problems, anti-patterns, and opportunities to add resilience to applications. Our tech stack includes but is not limited to Kubernetes, AWS, Kafka, Kinesis, and Go- and Java-based microservices.

What you’ll do:

  • Provide leadership and guidance in addition to participating in hiring efforts
  • Uncover and advocate for reliability, performance and upstream solutions with internal stakeholders
  • Create tools for monitoring and self-healing infrastructure
  • Code in Golang!
  • Develop solutions for circuit breaking, chaos testing, load shedding, rate limiting, server side and event bus resiliency
  • Identify performance bottlenecks and troubleshoot performance issues
  • Collaborate on problem solving and design
  • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
  • Mentor other developers and site reliability engineers in new technologies being implemented

What you’ll bring (we encourage you to apply even if you don’t meet every single one):

  • Demonstrated leadership experience
  • Experience finding anti-patterns and engineering reliability at scale
  • 1+ years of experience with Golang
  • Good communication skills and experience leading projects
  • A degree in computer science, software engineering, or a related field, or equivalent experience
  • Systematic problem solving approach, coupled with a strong sense of ownership and drive
  • A passion for creating performant and reliable applications

About InVision:

InVision offers an incredibly unique work environment. The company employs a diverse team all over the world. Each InVision team member is given the freedom and tools to do their best work from wherever they choose.

The benefits we offer in the United States and Canada include competitive health plans and retirement plans. Some InVision-wide benefits offered to all employees across the globe include a flexible vacation policy, monthly coffee shop stipends, annual allowances for books related to your profession, and home office setup & wellness reimbursements. InVision is an international employer so some benefit offerings will vary from country to country.

InVision is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please let us know.

Please mention that you come from Remotive when applying for this job.

similar jobs

  • Chainlink (Some overlap with EST)
    2 weeks ago

    Smart contracts are on track to revolutionize how all agreements work, through an entirely new system of technologically enforced contract guarantees. Chainlink enables next-generation smart contracts that can be written about any and all events in the real world. The details of our approach can be found in our whitepaper. We are well recognized for providing highly secure and reliable blockchain connectivity to the world's largest enterprises such as Google, Oracle, SWIFT, and many more. This is a unique opportunity to join one of the top companies developing cutting-edge blockchain technology while working closely together with a team of experienced senior developers.


    About this Role

    As a site reliability engineer, you’ll work directly with the company’s CTO, CEO and a technical team of other senior engineers. You’ll develop and build highly scalable, secure, and reliable software that will change the way smart contracts function at a fundamental level. You’ll have the opportunity to learn and master the latest research concerning cryptography, blockchains, game theory, consensus algorithms, and decentralized applications. We live by an open-source ethos and believe in giving back to the community. You'll join us in enabling the future architecture of Chainlink, including the following:

    • Work directly with AWS in an expert capacity using Terraform
    • Maintain reliable application and network infrastructure focusing on time to recovery, monitoring, reduced downtime during upgrades, and disaster recovery
    • Apply the 12 factor app methodology appropriately to blockchain infrastructure
    • Use data to understand the availability, reliability, and sustainability of our service
    • Build tools and systems for a great developer user experience

    • 5+ years of professional software development
    • B.S. or higher in computer science or a similarly technical field
    • Experience with test driven development and the use of testing frameworks
    • Knowledge of system design concepts
    • Experience with distributed systems and/or container orchestration
    • Strong communication skills, specifically giving/receiving constructive feedback in a collaborative setting
    • Excitement about building, operating and maintaining resilient, scalable services

    Preferred Qualifications

    • Demonstrated understanding of container networking and security 
    • Comfort working with network protocols, proxies and load balancers
    • Experience building highly available services at scale
    • Professional experience with Golang, TypeScript, Solidity, Rust
    • Experience with distributed systems
    • Ability to optimize and refactor for scaling and/or testability
    • Experience defining security strategies and securing high value systems
    • Excitement for blockchain, Web 3.0, and similar decentralized technologies
    • Comfort with pair programming
    • Comfort working remotely in a distributed team
    • Experience with Continuous Integration and Continuous Delivery
    • Passion for open source

    This role is location agnostic, though we ask that you overlap some working hours with Eastern Standard Time (EST).


    *Chainlink is an Equal Opportunity Employer.*

  • 5 days ago
    Company Overview

    At Netlify, we're building a platform to empower web developers to build better, more elaborate web projects than ever before. We're aiming to change the landscape of modern web development. Netlify currently serves more than 700,000 developers worldwide.

    We’re a venture-backed company, and so far we've raised about $45 million from Andreessen Horowitz, Kleiner Perkins, Bloomberg, and prominent founders and professionals in our space.

    Netlify is a diverse group of incredible talent from all over the world. We’re ~40% women or non-binary, and are composed of about half as many nationalities as we are team members.

    About the role:

    The role breaks down into three big parts:

    • Expansion: We are rapidly hiring and we need to expand the team. You will help us continue to build a diverse and inclusive team. This involves identifying skills the team needs, shepherding candidates through the hiring process, and building a more reliable, unbiased, and fair hiring process.
    • Delivery: Balancing technical debt and new features is always nuanced. You will partner with the product management team to manage this balance in accordance with the needs of the business. You will help define a project management process to help delivery and predictability with the goal of continuously shipping code to production.
    • Cultivation: You'll be instrumental in growing the careers of the individual engineers on your team. This means being their advocate and helping guide them in the direction they want. It also means being a culture driver on the team, fostering a positive, trusting, and supportive team.

    We have a small headquarters in San Francisco but we are a largely distributed engineering team. You will need to enable the team to work productively across different timezones. Fostering good habits of documentation, empathy, and integrity in delivering on committed work are some of the key elements for success on our team. Experience working with and managing remote teams is a big plus.

    Ideal Candidate:
    • Experienced manager of a technical backend team, especially around infrastructure development
    • Understanding of how engineering teams collaborate and track project delivery
    • Familiarity with, or willingness to learn about, the underlying system architecture of a PaaS (e.g. APIs, databases, distributed systems)
    • Commitment to designing a hiring process that is fair, efficient, and repeatable
    • Ability to communicate priorities and expectations to both team and leadership
    • Passion for mentorship and advocacy for your team members in their career development
    • Ability to work across multiple timezones with remote colleagues
    • A desire to succeed personally through the health and success of the team

    About the team:

    The SRE team works as a support for the whole organization, from the different engineering groups to the sales and stability of the product. While we do oversee the incident management framework, we work to minimize the number of incidents via automation and tooling for the larger engineering team. The SRE team will focus on two big areas: stability and enablement.

    Stability because the SRE team is the frontline defense against issues and outages (e.g. cloud crashes, DDoS). The team will build the tooling needed to respond to these incidents and be prepared for the unexpected. This includes a 24/7 on-call rotation and working with other teams to prepare for incidents (e.g. more observability, alerting framework).

    Enablement because the team provides some of the core services that the rest of engineering relies on (e.g. Kafka, Kubernetes). They also empower developers to ship their code to production in a safe and repeatable way. This means working on different developer tools to provide more observability, better testing capabilities, and a smoother deployment process. Much of this is done by writing and using tools to further automate rote or dangerous tasks.

    Right now the team is split between US and EU timezones to provide follow-the-sun coverage. Often coordination will have to take into account lag, timezones, and autonomy. The goal is to have SRE resources available whenever the company needs them.

    About Netlify

    Of everything we've ever built at Netlify, we are most proud of our team.

    We believe that empowered, engaged colleagues do their best work. We’ll be giving you the tools you need to succeed and looking to you for suggestions to improve not just in your daily job, but every aspect of building a company. Whether you work from our main office in San Francisco or you are a remote employee, we’ll be working together a lot: pairing, collaborating, debating, and learning. We want you to succeed! About 60% of the company is remote across the globe; the rest are in our HQ in San Francisco.

    To learn a bit more about our team and who we are, make sure to visit our about page.


    Not sure you meet 100% of our qualifications? Please apply anyway!

    With your application, please include: A thoughtful cover letter explaining why you would enjoy working in this role and why you’d like to work at Netlify. A resume or short listing of job history. (A link to a LinkedIn profile would be fine.)

    When we receive your complete application with the items above, we’ll get back to you about the next steps.

  • Aptible (North America)
    4 weeks ago
    About Aptible

    Our Vision

    We see a future where it’s easy to bring a great idea into the world using the internet, while respecting data security and privacy. The next generation of businesses will design security and privacy into their operating processes. If every business is going to be a software business, every business will need to be a security business.

    We’re working to make information security a core competency of every startup. We envision a world in which startups have access to great information security, are empowered to focus on their businesses instead of on compliance, can scale faster and more efficiently, and are confident that they're creating quality products.

    Our Team
    We wrote the Aptible Owner's Manual to help members of the company get a clear sense of what this team is — what we mean by “us.” We've now made this open to the world and invite you to read it, as a prospective member of the Aptible Team.

    Our Commitment to Diversity and Inclusion
    We prioritize diversity within our team and value different perspectives, educational backgrounds, and life experiences. We encourage people from underrepresented backgrounds to apply.

    About this Role

    We're looking for a Site Reliability Engineer to improve the infrastructure, reliability and security of our PaaS product, Aptible Deploy.

    Our next SRE will be an early member of the Aptible team. Reporting to our Customer Reliability Engineering Manager, you will be responsible for reducing the overall amount of Site Reliability work and determining an SRE roadmap.

    Your Impact
    • You will own and manage both internal and external tooling like PagerDuty
    • You will develop tools and processes to make monitoring, detection and issue resolution easier
    • You will prioritize and perform proactive maintenance and improvements of the entire system
    • You will help assess and remediate vulnerabilities and risks as a member of the security team
    • You will be a key member of our 24/7 on-call rotation

    Your Competencies
    • You have some familiarity with one or more of the technologies that we use including: Ruby, Docker, Postgres, MySQL or Redis
    • You have experience running production environments on AWS
    • You have 3-5 years of experience as a software engineer or SRE, or equivalent experience

    Our Interview Process

    We seek to make the experience of interviewing with us as delightful, efficient, fair, respectful, and transparent as possible.

    A typical process at Aptible might include the following steps and takes approximately 3 weeks to complete. We try to move as quickly as possible, but if you have any time constraints, please let us know and we'll do our best to accommodate.
    1) An Introduction to Aptible with the Hiring Manager (30 Minutes via Zoom)
    2) A Discussion-Based Interview with an Aptible Team Member (45-60 Minutes via Zoom)
    3) A Take-Home Work Sample Exercise (NB: You will be compensated for completing this.)
    4) A Discussion-Based Interview with an Aptible Team Member (45-60 Minutes via Zoom)

    We believe that the Work Sample Exercise is an important part of the process, in that it gives you the opportunity to demonstrate your skills in a concrete way. We take the time to design these exercises such that they: a) give you a view into the actual work you'd do at Aptible, and b) are standardized, so every candidate is evaluated using the same criteria.

    Lastly, Aptible conducts calls with 3-4 references, ideally managers who have directly supervised you in the past and/or colleagues who can speak to your work.

    If you have a disability or special need that requires accommodation, please let us know by completing this form, and we will reach out soon to see how we may be able to assist.
