Senior/Lead Site Reliability Engineer
1 week ago
Hiring from: USA Only
Category: DevOps / Sysadmin
ScienceLogic is going through a product transformation, and the Cloud Operations team is at the forefront of it. We are responsible for the design, deployment, and maintenance of the cloud infrastructure used to run the company’s revenue-generating, go-forward SaaS product line.
ScienceLogic’s current SaaS product is a single-tenant, highly available, and secure platform used by many customers to achieve their AIOps objectives. Cloud Operations leads the SaaS portfolio from the front by onboarding new customers onto their own dedicated instance of the product and by handling capacity planning, platform maintenance, upgrades, security, and incident response triage for the SaaS platform.
Overall, we’re passionate about automation and solving complex business and technology challenges. Our team combines SRE, DevOps, Software Development, and Information Security knowledge to keep Cloud Operations agile and elastic within the boundaries of our security and governance framework. If you are well versed in cloud technologies, have an automation mindset, and are an ardent follower of the SRE discipline, then our team will benefit from your skill set!
What you’ll be doing…
· Be a key contributor on an Agile development team, collaboratively realizing business value through an iterative software development lifecycle
· Design, automate, test, and monitor the use of cloud native technologies as a foundation for a service platform
· Investigate and resolve customer and operational issues with a mindset of fixing issues, not just mitigating them
· Automate the practice of keeping third-party and open-source cloud native technologies up to date, secure, and performant
· Employ advanced monitoring practices and technologies to detect and automatically resolve platform issues before they impact the customer’s experience
· Participate in architecture and operations reviews
· Identify and automate measurement of operations SLAs and SLOs
· Triage incidents, document SOPs and runbooks, and train NOC team members
· Write automation that can be easily supported and extended by others
· Take full responsibility for the availability and performance of the platform
Qualities you possess…
Here at Cloud Operations, we believe that if you are hungry to learn, passionate about technology, and like building tools, then you are a good fit. Experience with the skills below is a plus:
· 5+ years of software development, site reliability engineering, or equivalent experience
· Skilled at problem solving, algorithms, and data structures
· Building tools and scripting frameworks from scratch
· Working with Cloud Automation tools like CloudFormation, Terraform, CDK, aws-cli
· Scripting languages like Python, Groovy, PowerShell, Bash, Perl, etc.
· Configuration automation using Ansible or equivalent tools
· Exposure to Windows and Linux administration
· Project management tools like Jira, Trello
· Prior experience with datastore technologies like Postgres, MySQL, SQL, and DynamoDB is desirable
· Familiarity with basic networking, security and cloud engineering concepts
· Team player who is eager to help others succeed through mentoring and leading by example
· Highly collaborative with effective written and verbal communication skills
ScienceLogic is a leader in IT Operations Management, providing modern IT operations with actionable insights to resolve and predict problems faster in a digital, ephemeral world. Its solution sees everything across cloud and distributed architectures, contextualizes data through relationship mapping, and acts on this insight through integration and automation. www.sciencelogic.com
Before you apply, please check if any restrictions apply in terms of time zone or country.
This job has a geo-restriction in place: USA Only.
Please mention that you come from Remotive when applying for this job.