Principal Site Reliability Engineer
1 month ago
Job type: Full-time
Hiring from: USA Only
Category: Software Development
What if you could use your technology skills to develop a product that impacts the way communities’ hospitals, homes, sports stadiums, and schools across the world are built? Construction impacts the lives of nearly everyone in the world, and yet it’s also one of the world’s least digitized industries, not to mention one of the most dangerous. That’s why we’re looking for a talented Principal Site Reliability Engineer to join Procore on our journey to revolutionize a historically underserved industry.
As a Principal Site Reliability Engineer, you’re given the unique opportunity to drive the next generation of application platform initiatives in a global SaaS infrastructure. You’ll work side-by-side with product development teams and SREs to automate and roll out new standardized service platforms for product code. Backed by the might of our teams, we’ll provide you with the tools and resources needed to achieve extraordinary results that render a significant impact extending beyond the boundaries of traditional engineering roles.
This position will report to our Director, Site Reliability Engineering, and can be based at our Carpinteria, CA, headquarters or our Austin, TX, office. Remote candidates will be considered with the expectation of occasional travel to these offices. We’re looking for someone to join our team immediately.
What you’ll do:
- Drive deployment excellence and product quality through a software-defined approach to operations and infrastructure
- Identify opportunities for differentiating open-source initiatives, and lead the development of new open-source SRE/Infrastructure platform tools (e.g. Envoy, Helm)
- Serve as a champion for idempotent infrastructure-as-code by taking ownership in the end-to-end configuration, technical dependencies, and overall success of the SaaS environment
- Ensure services are designed and delivered to be mission critical with focus on security, resiliency, scale, and performance
- Educate and drive global adoption of automation and orchestration principles, and create an eagerness to automate, wherever and whenever the possibility arises
- Lead reviews of site reliability processes such as testing, CI/CD, and release management. Provide unwavering support and collaboration for the software/QA engineers on projects
- Ensure new and existing products support automated deployment and remote execution-based remediation scripts
- Lead the improvement of testing functionality, operability, deployment, and performance for application or infrastructure changes
- Mentor and coach junior site reliability engineers, and be a driver for change and DevOps adoption across the broader organization
What we're looking for:
- BS or MS degree in Management Information Systems or a related discipline; Technical Certifications are a plus
- 10+ years of combined experience as a Software Engineer and DevOps Engineer
- 5+ years supporting production in a SaaS multi-tenant environment
- Demonstrated experience leading automated infrastructure/application systems deployment and configuration
- Expert with AWS services (certified SysOps Administrator or Solutions Architect preferred)
- Experience leading large initiatives with the ability to course-correct as needed
- Experience working with teams, providing mentorship and guidance to improve the overall reliability of the ecosystem
- Ability to continually evaluate current technical approaches so that they remain best-in-class for the industry
- Substantial experience in many of the following technologies, with demonstrated expertise and the capacity to lead complex technical initiatives in at least one of the related domains:
  - Infrastructure/cloud automation tooling (e.g. CloudFormation, Terraform, Packer)
  - Service Mesh/Discovery Tooling (e.g. Consul, Envoy, Istio, etcd)
  - Continuous Integration (e.g. Spinnaker)
  - Containers and Container Management (Docker, Kubernetes)
  - Configuration and Security Management (e.g. Puppet, Chef, Ansible, Salt, Vault, KMS)
  - Networking protocol knowledge (e.g. TCP/IP, UDP, IPSEC, HTTP, HTTPS, routing protocols)
- Demonstrated experience leading or contributing significantly to an open-source infrastructure/application platform initiative is a big plus (e.g. Kubernetes/Istio/etc. upstream commits)
Procore Technologies is building the software that builds the world. We provide cloud-based construction management software that helps clients more efficiently build skyscrapers, hospitals, retail centers, airports, housing complexes and more. At Procore, we have worked hard to create and maintain a culture where you can own your work and are encouraged and given resources to try new ideas. Check us out on Glassdoor to see what others are saying about working at Procore.
We are an equal opportunity employer and welcome builders of all backgrounds. We thrive in a diverse, dynamic and inclusive environment. We do not tolerate discrimination against employees on the basis of age, color, disability, gender, gender identity or expression, marital status, national origin, political affiliation, race, religion, sexual orientation, veteran status, or any other classification protected by law.
Perks & Benefits
You are a person with dreams, goals, and ambitions—both personally and professionally. That's why we believe in providing benefits that not only match our Procore values (Openness, Optimism, and Ownership) but enhance the lives of our team members. Here are just a few of our benefit offerings: competitive health care plans, unlimited paid vacation, stock options, employee enrichment and development programs, and friends & family events.