SRE Hadoop Infrastructure Specialist
Pythian
1 month ago
Job type: Full-time
Hiring from: Americas Only
Category: DevOps / Sysadmin
- Deploy, operate, maintain, secure and administer solutions that contribute to the operational efficiency, availability, performance and visibility of our customers’ infrastructure and Hadoop platform services, across multiple vendors (e.g. Cloudera, Hortonworks, MapR).
- Gather information and provide performance and root cause analytics and remediation planning for faults, errors, configuration warnings and bottlenecks within our customers’ infrastructure, applications and Hadoop ecosystems.
- Deliver well-constructed, explanatory technical documentation for architectures that we develop, and plan service integration, deployment automation and configuration management to business requirements within the infrastructure and Hadoop ecosystem.
- Understand distributed Java container applications and their tuning, monitoring and management, such as logging configuration, garbage collection and heap size tuning, JMX metric collection and general parameter-based Java tuning.
- Observe and provide feedback on the current state of the client’s infrastructure, and identify opportunities to improve resiliency, reduce the occurrence of incidents and automate repetitive administrative and operational tasks.
- Contribute heavily to the development of deployment automation artifacts, such as images, recipes, playbooks, templates, configuration scripts and other open source tooling.
- Be conversant with cloud architecture, service integrations, and operational visibility on common cloud platforms (AWS, Azure, Google). An understanding of ecosystem deployment options and how to automate them via API calls is a huge asset.
- Understand the end-to-end operations of complex Hadoop-based ecosystems and handle / configure core technologies such as HDFS, MapReduce, YARN, HBase, ZooKeeper and Kafka.
- Understand the dependencies and interactions between these core components, alternative configurations (e.g. MRv2 vs Spark, scheduling in YARN), availability characteristics and service recovery scenarios.
- Identify workflow and job pipeline characteristics and tune the ecosystem to support high performance and scalability, from the infrastructure platform through to the application layers in the ecosystem.
- Understand and enable metric collection at all layers of a complex infrastructure, ensuring good visibility for engineering and troubleshooting tasks, and ensure end-to-end monitoring of critical ecosystem components and workflows.
- Understand the Hadoop toolset, how to manage and copy data between and within a Hadoop cluster, integrate with other ecosystems (for instance, cloud storage), configure replication and plan backups and resiliency strategies for data on the cluster.
- Comprehensive systems hardware and network troubleshooting experience in physical, virtual and cloud platform environments, including the operation and administration of virtual and cloud infrastructure provider frameworks. Experience with at least one virtualization and one cloud provider (for instance, VMware, AWS) is required.
- Experience with the design, development and deployment of at least one major configuration management framework (e.g. Puppet, Ansible, Chef) and one major infrastructure automation framework (e.g. Terraform, Spinnaker, CloudFormation). Knowledge of DevOps tools, processes, and culture (e.g. Git, continuous integration, test-driven development, Scrum).
- Ability to pick up new technologies and ecosystem components quickly, and establish their relevance, architecture and integration with existing systems.
- Competitive total rewards package
- Flexible work environment: Why commute? Work remotely from your home; there’s no daily travel requirement to the office!
- Outstanding people: Collaborate with the industry’s top minds.
- Substantial training allowance: Hone your skills or learn new ones; participate in professional development days, attend conferences, become certified, whatever you like!
- Amazing time off: Start with a minimum 3 weeks paid time off, 7 sick days, and 2 professional development days!
- Office Allowance: Purchase a device of your choosing and personalise your work environment!
- Fun, fun, fun: Blog during work hours; take a day off and volunteer for your favorite charity.