Remote site reliability Jobs in February 2020

9 Remote site reliability Jobs in February 2020

  • Software Development (9)

    • Bold Penguin (Eastern Time +/- 2 hours)
      Today

      We didn’t create Bold Penguin because commercial insurance is broken. It isn’t. But as the world has gotten more connected and digitized, commercial insurance lags behind—creating a fragmented landscape where businesses, agents, and insurance companies struggle to interact in a smooth and easy way. That’s why we’ve built a highly efficient exchange that cuts the friction out of commercial insurance by connecting everyone to the right quote in record time.

      Powering the world of insurance is no small feat, so we’ve brought on a team that's not only incredibly talented but also passionate about our potential to upgrade the entire industry. As more and more companies big and small depend on our technology to operate in the commercial insurance space, we’ll need the best talent all around to support our growth. That’s why we’re looking at you (yes, you!) to make a bold move and join our adventure.

      Your Role

      As a Cloud & Site Reliability Engineer, you will be a subject matter expert in building highly reliable, highly scalable features and infrastructure. You’ll use DevOps principles to ensure that Bold Penguin’s software systems are always available and ready to scale to meet growing demands. 

      What You’ll Do

      • Ensure the reliability, performance, and availability of our platform by working as part of a cross-functional product team
      • Participate in agile ceremonies such as iteration planning, retrospective, and daily standups
      • Be part of the shared on-call rotation and proactively research possible issues affecting the availability of our platform
      • Understand and clearly articulate tradeoffs in architecture decisions with regard to cost, security, operational efficiency, performance, and availability
      • Build and maintain infrastructure with executable code (IaC) and automated delivery pipelines
      • Be passionate about Cloud/DevOps/SRE concepts such as Immutable Infrastructure, Cattle vs Pets, Infrastructure as Code, Delivery Pipelines

      Skills & Qualifications

      • Deep, hands-on expertise with AWS CloudFormation and other Infrastructure as Code tools
      • Experience with Amazon Web Services; specifically EC2, ECS, ELB, CodePipeline, RDS, Redshift, S3, IAM, and Lambda
      • Ability to articulate Cloud & DevOps concepts to a variety of technical & non-technical team members
      • Bonus points for expertise in implementing security & compliance frameworks such as SOC 2, NIST 800-53, and NIST 800-171, especially in Amazon Web Services
      • Bonus points for AWS Certifications 
      • Bonus points for familiarity with microservices architectures, Ruby on Rails and/or ETL tools such as Fivetran.
      • Experience working at technology companies and startups desirable
      • 2-4+ years of working remotely, full time, and/or with full-time co-located teams across different time zones.

      BONUS POINTS

      • Full-stack expertise in multiple tiers of modern web applications (e.g. front end, back end, infrastructure, etc.)
      • Open-source contributions and/or speaking experience.
      • Previous work experience in insurance and/or experience with policy rating very desirable.
      • You love Penguins! ;P

      TRAVEL TO THE "GLACIER" (please read)

      • We are firm proponents of "seeing eye to eye by meeting face to face". As such, our remote team travels in once a quarter for a full day of collaboration, goal setting, team building, etc.  Are you able to make this work?  In addition to this we also ask that, if hired, you are able to make the first week onsite for onboarding/training. 

      PENGUIN PERKS

      • For a healthy colony.
        • Our plan covers 50% of your Medical Premiums – Health - HRA, Dental, Vision, and Life Insurance, as well as Short & Long Term Disability (Trust us, the benefits are great!)
      • Penguins plan for the future.
        • 401k Match program, up to 4%! 
      • Parental Leave
        • 16 weeks of parental leave (your kids need you there!)
      • Need a vacation?
        • Unlimited PTO. Please take a vacation: you need it, we applaud it, and in fact we require you to take 10 days off!
      • Hungry? Thirsty?
        • We offer free snacks and drinks, as well as catered lunch every Monday (even to our remote employees...nomb nomb nomb)
      • Penguins need to learn!
        • We support your professional growth. Certifications, training, memberships, and conferences are actively encouraged—and often covered.
      • Penguins are social creatures and love to play!
        • We have frequent happy hours, company events, and outings. What kind of company would we be if we didn't have some fun!?!? 
      • Penguins give back.
        • We offer volunteer opportunities every month!  There is no better feeling than giving back =)
      • Don’t want to move to Columbus?
        • Engineers can work up to 100% remotely!
        • You must be OK visiting the office for a day or two every quarter - we are all about that camaraderie! 

      Penguins believe in inclusion. That’s why we’re proud to be an equal opportunity employer that considers all qualified applicants regardless of race, color, religion, gender identity or expression, sexual orientation, national origin, genetics, disability, age, veteran status, beak size, or inability to fly.

    • InVision is the digital product design platform used to make the world’s best customer experiences. We provide design tools and educational resources for teams to navigate every stage of the product design process, from ideation to development. Today, more than 5 million people use InVision to create a repeatable and streamlined design workflow, rapidly design and prototype products before writing code, and collaborate across their entire organization. That includes 100% of the Fortune 100, and organizations like Airbnb, Amazon, HBO, Netflix, Slack, Starbucks and Uber, who are now able to design better products, faster.

      Our team is in search of a Lead Software Engineer - SRE to help us change the way digital products are designed.

      This role will help ensure uninterrupted service for InVision customers and act as a force multiplier for product teams to deliver better software faster. This role will have ownership of foundational reliability services and a big impact on our product.

      About the team:

      The reliability team is dedicated to ensuring resiliency at scale. You will lead the design, development and delivery of solutions that enhance the scalability, availability, and efficiency of microservices. This role will have a direct impact on platform and product teams by identifying problems, anti-patterns, and opportunities to add resilience to applications. Our tech stack includes but is not limited to Kubernetes, AWS, Kafka, Kinesis, and Go- and Java-based microservices.

      What you’ll do:

      • Provide leadership and guidance in addition to participating in hiring efforts
      • Uncover and advocate reliability, performance and upstream solutions with internal stakeholders
      • Create tools for monitoring and self-healing infrastructure
      • Code in Golang!
      • Develop solutions for circuit breaking, chaos testing, load shedding, rate limiting, server side and event bus resiliency
      • Identify performance bottlenecks and troubleshoot performance issues
      • Collaborate on problem solving and design
      • Engage in service capacity planning and demand forecasting, software performance analysis and system tuning
      • Mentor other developers and site reliability engineers in new technologies being implemented

      What you’ll bring (we encourage you to apply even if you don’t meet every single one):

      • Demonstrated Leadership experience
      • Experience finding anti-patterns and engineering reliability at scale
      • 1+ years of experience with Golang
      • Good communication skills and experience leading projects
      • A degree in computer science, software engineering, or a related field, or equivalent experience
      • Systematic problem solving approach, coupled with a strong sense of ownership and drive
      • A passion for creating performant and reliable applications

      About InVision:

      InVision offers an incredibly unique work environment. The company employs a diverse team all over the world. Each InVision team member is given the freedom and tools to do their best work from wherever they choose.

      The benefits we offer in the United States and Canada include competitive health plans and retirement plans. Some InVision-wide benefits offered to all employees across the globe include a flexible vacation policy, monthly coffee shop stipends, annual allowances for books related to your profession, and home office setup & wellness reimbursements. InVision is an international employer so some benefit offerings will vary from country to country.

      InVision is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, ancestry, religion, sex, national origin, sexual orientation, age, citizenship, marital status, disability, gender identity or Veteran status. If you have a disability or special need that requires accommodation, please let us know.

    • Do you want to be part of a team that helps over one million designers create amazing products every day? We're looking for a full-time Site Reliability Engineer to join us at Sketch.

      We are building a cloud platform that helps teams to collaborate on Sketch designs in every possible, efficient, and beautiful way.
      Your mission will be to shape this cloud infrastructure defining and building every piece, from development environments to metrics processing and observability, including security policies, network design, deployment strategies, high availability, etc...

      Our stack is currently based on a mix of serverless and traditional server applications. You will propose new projects to make sure this platform has the best technology for our product goals and our team. You are proactive and have a "get the job done" attitude. You are also not afraid of getting deeper and deeper in order to debug a problem, especially in production.

      There are always many things to do at Sketch. You need to be an organized and communicative person. You are used to prioritizing infrastructure tasks and projects, and you like to back your decisions and proposals with arguments. As part of a team of very skilled people, being an excellent team player is essential.

      As a remote organization
      Three things are key for us. Working here requires excellent communication skills as well as good written and spoken English. You need to be self-motivated and comfortable working in a remote position. It also requires high-quality documentation: you need to have an eye for detail in general, and especially for documentation.

      We believe in
      Automated, simple, and quality-tested infrastructure. It's essential that you have experience developing infrastructure as code and that you enjoy coding. You are very critical of your own work and always try to find the cleanest way to do it. You understand the right balance between adopting new technology, current stability, maintainability, and simplicity. Like us, you believe that speed and reliability are two of the most important features of a web platform. You like to design and build processes and platforms that run flawlessly and fast.

      The ideal candidate
      • Has experience with different stacks (mainly Linux based), technologies and production models, and has participated actively in building important pieces of a cloud platform.
      • We would like to know as much about you as possible. Contact us, tell us about your experience and your motivations for this job, and send us a link to anything that represents you or your experience.
      Even if you feel you are not 100% exactly the person described, we would still love to hear from you. We value anything that makes you different from the description.
      Even if you're not able to tick all of these boxes, we would still love to hear from you.
    • 1 week ago
      To join our growing team, SugarCRM is currently seeking an experienced Site Reliability Engineer.  This role can be based in one of our U.S.-based offices or remote.

      Impact you will make in the role:
      • Manage applications in a CentOS Linux-based environment
      • Build repeatable infrastructures with Ansible
      • Develop and execute plans for rolling out new technologies rapidly
      • Improve monitoring infrastructure, build out data aggregation and alerting rules
      • Work closely with engineering to build scalable solutions
      • Triage tickets raised by our support organization and implement fixes
      • Support our private and public cloud environments and customers
      • Mentor other members of the Operations team
      • Participate in an on-call rotation

      Expertise you will bring in:
      • BA/BS in Computer Science with Network Engineering or Information Systems emphasis, or equivalent work experience
      • Extensive knowledge of container orchestration technologies including Docker and Kubernetes
      • 6+ years experience in an Operations or Systems Administration role
      • Superior Unix administration skills
      • Extensive knowledge of common Internet Protocols
      • Extensive knowledge of TCP/IP
      • Experience with virtualization and cloud technologies
      • Hardware management, network switch and router administration experience
      • Experience with Apache, MySQL, and PHP in a production environment at scale
      • Strong knowledge of version control systems and hands-on experience with Git
      • Experience with writing code around infrastructure automation
      • Understanding of how to architect and implement highly available, scalable, and secure networks in multiple cloud environments
      • Strong affinity for and experience in working with continuous deployment and continuous integration environments
      • An understanding of microservice architectures and the complexities of deploying them
      • Extensive programming experience in PHP, Ruby, Python, and Shell
      • Full stack troubleshooting and instrumentation experience
      • Extensive experience with IT automation technologies like Puppet, Salt, Chef, or Ansible
      • Experience with data aggregation, alerting, and reporting and supporting technologies such as Sensu and Graphite

      Nice to haves:
      • Experience in an on-call rotation
      • Experience with Elasticsearch or Apache Solr
      • Experience with Spinnaker and/or other CI/CD tools
      • Previous experience as a mentor or advisor
      • Current contributor to open source projects (a GitHub account you can link us to would be ideal)
      We are an Equal Opportunity, Affirmative Action employer. Minorities, women, veterans and individuals with disabilities are encouraged to apply.

      Benefits and Perks:

      Beyond a stellar work environment, friendly people, and inspiring, innovative work, we have some great benefits and perks:
      • Competitive salaries
      • Excellent medical, dental and vision coverage for you and your family, along with other benefit plans like 401(k) match
      • Unlimited Paid Time Off
      • Wellness Reimbursement Program
      • Onsite Programs, depending on location, such as Dry Cleaning, Car Washes, Massage, Yoga, and more
      • Career & Personal Development Program – multi-platform
      • Regular social events
      • Ownership is the greatest self-identity at SugarCRM - you are making an impact now
      • We are a merit-based company - many opportunities to learn, excel and grow your career
    • Mattermost, one of Y Combinator's top 100 companies, provides an open source enterprise-grade messaging platform to the world’s leading organizations that allows teams to collaborate securely and privately anywhere. With over 10,000 server downloads / month, our customers include Uber, Samsung, Affirm, The US Department of Defense and more. Our private cloud solutions offer secure, configurable, highly-scalable messaging across web, phone and PC with archiving, search, and deep integrations with hundreds of SaaS and on-premises technologies. Headquartered in Palo Alto, California, our company serves customers around the world with a distributed organization spanning the globe.

      We value high impact work, ownership, self-awareness and being focused on customer success. If these values match who you are, we hope you'll learn more about working at Mattermost and come talk to us!

      About the Role

      Working in open source means your work is publicly visible. Your code will receive both credit and constructive critique from the community. With the right mindset and support, these can lead you to a highly positive working environment and to making the best engineering decisions of your career. Core committers include highly skilled volunteer developers from the community, staff employed by enterprises deploying and investing in Mattermost, as well as staff employed by Mattermost, Inc.

      Read about our end-to-end recruiting process for core committers at: https://docs.mattermost.com/process/developer.html

      We are looking for an engineer with demonstrated experience in software development and infrastructure with a focus on ensuring high reliability and scaling of Mattermost’s SaaS offering through building tools, deploying infrastructure and automation.

      Responsibilities
      • Develop tooling and infrastructure to support thousands of customers on Mattermost’s SaaS offering
      • Write thoughtful and high-quality code in Go
      • Define infrastructure in code with Terraform
      • Implement, maintain and tune monitoring and alerting systems
      • Build custom tools and services to automatically recover from incidents
      • Respond on-call to incidents with quick and effective resolutions
      • Deploy applications to and manage Kubernetes clusters
      • Write clear and concise plans for incident response playbooks


      Requirements:
      • Bachelor's degree in Computer Science or related fields, or significant professional software development or DevOps experience
      • Strong demonstrable experience in building and maintaining highly reliable services
      • Strong experience with SRE and DevOps methodologies
      • Experience with or an ability to quickly become proficient in Go
      • Familiarity with containers and orchestration systems, like Docker and Kubernetes
      • Comfortable working with infrastructure as code tools, such as Terraform
      • Ability to be on-call


      Pluses
      • Experience with distributed application systems using HTTP, WebSockets, RPC, pub/sub at scale
      • Practical AWS experience
      • Knowledge of Grafana and Prometheus
      • Comfortable with GitHub, Jira, Jenkins, CircleCI
      • Experience working in open source communities
      We're looking for someone who wants to help us build the future of Mattermost and improve the way the world communicates. The right person in this role has the opportunity to have a huge impact on Mattermost the product, and its many users worldwide, but also on our open source community that has been key to Mattermost's success. If this sounds like you - please apply!
    • ROLE DESCRIPTION

      We are looking to expand our team of Kubernetes Operations engineers. The focus of this role is threefold:

      Work directly with clients and customers to assist in consulting, setting up and operating Kubernetes clusters. Generally, the Kubernetes clusters will be based on our own Lokomotive full-stack Kubernetes distribution.

      Work as a part of the engineering team working to improve Lokomotive, Flatcar Container Linux and our other Kubernetes-related projects, applying your operations experience and direct feedback from customers to help guide the projects.

      Be part of the oncall schedule to support our subscription customers.

      We are working to build a follow-the-sun team so that support times during the week are during work hours.

      In essence this role has elements of both a Site Reliability Engineer (SRE) and a Software Engineer.

      The ideal candidate has operations experience with Kubernetes but also beyond. It is a person who has the experience of being oncall, and resolving and helping to mitigate issues in production environments. It’s also a person who can clearly communicate to customers about these issues and communicate with the engineering team about the experiences that matter to customers. This role is the interface between the customer and the product engineering teams.  It is this role’s positioning as the feedback loop between customers and the product that makes it so crucial.

      To support you, you’ll have at your disposal the renowned Kinvolk engineering team that has completed dozens of challenging projects at every layer of the system. You’ll find that your supporting team can get you the answers you need, help you find short- and long-term solutions to issues, and help you grow as an engineer.

      Responsibilities

      • Work directly with customers to assist in consulting, setting up and operating Kubernetes clusters

      • Interface with clients and advise on best practices for managing Kubernetes clusters

      • Be on call during reasonable hours on a rotating basis (follow-the-sun rotation)

      • Provide first-line support to customers

      • Work to improve our open-source cloud products

      • Be a liaison between customers and product engineering team

      • Participate in product engineering

      • Review and document changes

      • Stay current on the cloud infrastructure technology landscape

      • Work closely with the rest of the Kinvolk team; communicating across projects.

      • Represent Kinvolk at community events

      Multiple openings are available.

      REQUIREMENTS

      This role requires experience in setting up and operating Kubernetes clusters at a senior level. One is expected to be able to interface directly with customers to provide authoritative responses and advice.

      To get the job done, you’re going to need these.

      Required

      • Experience operating Kubernetes in production

      • Deep understanding of how Kubernetes works

      • Ability to listen to customers and distill that input into actionable tasks and recommendations

      • Good knowledge of distributed systems

      • Good knowledge of Linux systems

      • Good networking know-how

      • Experience in scripting languages

      • Ability to interface directly with clients and customers

      • Ability to work independently

      • Good at communicating technical issues and requirements

      • Good written and spoken English

      Desired, not-required

      If these items apply to you, awesome! If not, expect to add these while at Kinvolk.

      • Passed the Certified Kubernetes Administrator exam. If not, we will support you in attaining this within 6 months of joining

      • Experience with the Go programming language

      • Low-level knowledge of container and process isolation technologies

      • Comfortable giving talks at conferences

      • If in Berlin, good written and spoken German is a plus

      WHY KINVOLK?

      • We’re always looking for ways to make Kinvolk a friendly and motivating work environment. Here are some of the things we already offer.

      • You would be working on the cutting edge of technology, with a world-class team from whom you will be able to learn - just as we hope to learn from you!

      • We offer a competitive salary (reviewed annually), with equity participation (virtual share options) for all employees

      • Flexible working hours policy, and generous holiday allowance

      • An open, non-hierarchical, multi-cultural environment, with nearly as many nationalities represented as we have people

      • And many others like:

      • Work exclusively on Linux technologies

      • Work closely with open-source communities

      • Lunch paid once/twice weekly (Berlin)

      • Assistance with public transport ticket and home Internet bill (Berlin)

      • Company mobile phone plan (Germany)

      • German language classes 2 times weekly, if needed (Berlin)

      • Generous hardware allowance for laptop, monitor, phone and/or tablet of your choice

      • Represent Kinvolk at conferences

      • Free drinks and snacks if you're working out of the office

      • Need a book? We’ll order it for you and add it to our tech bookshelf

      HOW TO APPLY

      Apply using the button below. If you have other questions, please send those to [email protected]

      ABOUT US

      Kinvolk is a rapidly growing tech company building Linux & Kubernetes-based open-source software products, and offering related engineering and technical support services. Our customers are amongst the largest and most influential in the space: Microsoft, SAP, CoreOS, and many more.

      While founded in Berlin, Kinvolk is quickly expanding and has recently opened an India subsidiary.

    • 1 month ago
      About HashiCorp

      At HashiCorp, we operate according to a strong set of company principles, many of which are described in The Tao of HashiCorp. We value top-notch collaboration and communication skills, both among internal teams and in how we interact with our users. We take care to balance and be responsive to the needs of our open source community as well as our enterprise level customers.

      Engineering at HashiCorp is largely a remote team, and this role is no exception. We are looking for a Full-time Remote Employee within the US or Canada. While prior experience working remotely isn't required, we are looking for team members who perform well given a high level of independence and autonomy.

      Our Products

      We build Consul, Nomad, Vault, Terraform, Packer, and Vagrant. Alongside that, we deploy enterprise products for each in a variety of different ways: licensed and unlicensed binaries, appliances to public cloud platforms, and hosted SaaS platforms. Our products help organizations of all sizes run any infrastructure for any application.

      The Cloud Services team is an organization focused on delivering HashiCorp’s software as a Cloud service. This effort will enable a distribution model wherein customers can use a fully managed service with an API contract.

      In your cover letter, please describe why you're interested in working at HashiCorp, and what draws you to this role in particular!  Specifics of your past experiences that are relevant to this role are great to include, too.

      In this role, you can expect to:
      • Design, implement, and maintain a secure and scalable infrastructure platform for delivering Cloud Services’ applications
      • Own and ensure that internal and external SLAs meet and exceed expectations, and that system-centric KPIs are continuously monitored and improved
      • Create tools for automating deployment, monitoring and operations of the overall platform
      • Participate in on-call rotation to provide application support, incident management, and troubleshooting
      • Provide ongoing maintenance and support of internal tools, improve system health and reliability
      • Program mostly in Golang, learning from and contributing to a team committed to continually improving their skills
      You may be a good fit if:
      • You are familiar with infrastructure management and operations lifecycle concepts and ecosystem
      • You have experience operating and maintaining production systems in a Linux and public cloud environment
      • You have prior experience working in high performance or distributed systems; while we strive to hire at a variety of experience levels, this particular opening is not well-suited for recent graduates
      • You have a working knowledge of industry best practices with regard to information security
      • You have built or operated a large-scale Cloud service
      • You are comfortable with Go or another low-level programming language

      HashiCorp embraces diversity and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. We believe the more inclusive we are, the better our company will be.

    • POSITION SUMMARY:

      The Site Reliability Engineer is responsible for the health and well-being of the production environment, implementation of new and existing components, and maintaining and modernizing the processes and methods used within our platform. They will be expected to interface with the rest of the operations, development and business teams, lead assigned projects, participate in peer mentoring and operate an always-on production environment.

      ESSENTIAL DUTIES AND RESPONSIBILITIES:

      • Onboard and optimize microservices using Docker

      • Streamline the CI/CD process and blue/green deployments

      • Optimize resource usage to meet KPI targets

      • Maintain and evolve monitoring and notification systems

      • Create and maintain documentation on new services, procedures, and requirements

      • Participate in an on-call schedule established by your manager, and be ready and available while on-call to immediately diagnose and resolve incidents.

      • Participate in the diagnosis and resolution of escalated critical emergency incidents.

      QUALIFICATIONS:

      • Bachelor’s degree or equivalent work experience

      • Linux / Unix system administration skills, 5-10 years operations experience

      • Strong time and project management skills and attention to detail

      • Solid experience in the administration and performance tuning of application stacks

      • Experience with multiple cloud hosting providers, and extensive experience with AWS

      • Experience with virtualization and containerization (e.g. Docker)

      • Experience with RabbitMQ, Elasticsearch and Redis

      • Experience with monitoring and metrics systems (e.g. Nagios, Grafana)

      • Experience with configuration management systems (e.g. Ansible, Chef)

      • Solid scripting skills (e.g. shell scripts, Ruby, Python, Go)

      • Authorized to work in the United States and able to pass standard background checks required for compliance

    • Summary


      Wikimedia’s Site Reliability Engineering team is principally responsible for ensuring our global top-10 web site, our public facing services and underlying infrastructure are healthy and developing further in support of Wikimedia’s mission. The SRE team comprises over 30 creative and talented staff members that are globally distributed and organized into 6 teams each with their own scope and focus area. We are strengthening the team and looking for several Engineering Managers to help our staff and teams achieve our goals.


      As an Engineering Manager, you will support engineers developing services and infrastructure, deploying and building new features, products, and services used by hundreds of millions of people around the world. This is an opportunity to do good while improving one of the best known sites in the world. 


      Your Responsibilities:


      • Manage one to two globally distributed teams within Site Reliability Engineering

      • Recruit, hire, and help onboard new team members

      • Work with team members to set individual performance goals, and support them in meeting and evolving their goals and career path

      • Triage incoming workload, maintain focus on priorities, and set realistic expectations for both peers and team members

      • Coordinate and communicate with other members of the Wikimedia engineering teams on relevant projects, and contribute to the organizational strategy

      • Continuously develop the roadmap of the team in alignment with other SRE and Technology teams, and help draft and execute the team’s annual and quarterly plans

      • Provide project management for new and existing initiatives

      • Lead the definition, refinement, and execution of the processes through which the team manages and performs work

      • Lead incident response, diagnosis, and follow-up on system alerts and outages across Wikimedia’s production infrastructure

      • Facilitate the definition and establishment of Service Level Indicators and Objectives with service owners and stakeholders

      • Share our values and work in accordance with them

      Skills & Experience:


      • Prior experience managing teams

      • Strong technical background, including 5+ years of experience as part of an SRE, TechOps, or software engineering team

      • Experience working with or applying one or more project management methodologies to site reliability engineering work

      • Aptitude for automation and streamlining of tasks

      • Ability to communicate effectively in both spoken and written English

      • Ability to work independently and as an effective part of a globally distributed team

      • Willing and able to travel several times a year for occasional in-person meetings

      • B.S. or M.S. in Computer Science or the equivalent in related work experience

      Additionally, we would love it if you have:


      • Experience working in a distributed, largely remote environment

      • Experience contributing to open source projects

      Teams


      • Service Operations: Build and improve our new Kubernetes-based deployment pipeline and help our teams, service owners, and developers across the organization test and deploy our existing application platform as well as new applications and features.

      • Data Persistence: Store, query and protect the sum of all human knowledge! Work together with our engineers to ensure existing and new data needs are met in an efficient and reliable manner, using the most appropriate boring and exciting open source technologies: MySQL, Cassandra, OpenStack Swift, Ceph.

      • Observability: Work across SRE and Technology to provide teams with tools, platforms, and insights into how systems and services are performing. Leverage exciting technologies such as Prometheus, AlertManager, Grafana, Logstash, Kibana, Kafka and more. Research emerging tools, trends and methodologies and work with the open source community to contribute back that knowledge to the commons.

      The Wikimedia Foundation is... 


      ...the nonprofit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.


      The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.


      U.S. Benefits & Perks*


      • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)

      • The Wellness Program provides reimbursement for mind, body, and soul activities such as fitness memberships, babysitting, continuing education, and much more

      • The 401(k) retirement plan offers matched contributions at 4% of annual salary

      • Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.

      • Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.

      • For those emergency moments - long and short term disability, life insurance (2x salary) and an employee assistance program

      • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses

      • Telecommuting and flexible work schedules available

      • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax

      • Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

      *Eligible international workers' benefits are specific to their location and dependent on their employer of record