Senior Systems Reliability Engineer

PacketFabric



02/13/2020 10:22:26

Job type: Full-time

First appeared on Stack Overflow

Category: Software Development


Description
As a well-rounded systems reliability engineer with a diverse set of skills, you are one of the best people to troubleshoot issues, monitor the platform, and stay on top of releases. You should be the type who appreciates variety in your day and challenges outside your comfort zone! A typical day might include these activities:

- Taking charge of the build process and pipelines across the platform.
- Being keenly aware of systems architecture and routinely building redundancy and backups into new systems and software.
- Assisting in troubleshooting complex customer issues across network devices, server hardware, virtual machines, in-house software, and open source software. Not only can you run tcpdump with filters on the command line, but you can also read its output there.
- Adding monitoring and alerting across all systems on the platform to help track down those annoying intermittent issues you have seen in the logs.


Skills & Requirements
The right candidates will probably have a CS degree, solid scripting and automation skills, great troubleshooting skills across the OS and network, a good grasp of security concepts, and experience with routing platforms and protocols, and will enjoy working collaboratively.

Specific requirements include:

- Experience in automating tasks through scripting. You should be very well versed in Python, and probably a few other languages. We will ask for script samples.
- A high degree of drive to improve and automate your environment with minimal guidance.
- Ability to solve immediate problems and to plan for future ones.
- Experience with Ansible, Salt, Chef, Puppet, Terraform, or CFEngine. Experience with Ansible and Terraform preferred.
- Experience with build pipelines, integration testing and Jenkins.
- Experience administering a wide variety of *nix platforms, including multiple Linux variants.
- Solid understanding of Layer 2 and Layer 3 protocols, including IPv4/6, 802.1Q, BGP, MPLS, etc., and an understanding of a wide range of network architectures.
- Experience with Google Compute, AWS, or other cloud based compute and database services.
- Understand the importance and implementation of backup and redundancy across many layers of databases, systems, and network configurations.

Some knowledge that would be a huge plus:

- Familiarity administering/troubleshooting Juniper/Cisco/Arista platforms.
- Experience with extremely large scale network management and monitoring.
- Experience with PostgreSQL, TimescaleDB, Elasticsearch



similar jobs

    • Help build a world-class machine learning deployment solution that solves real industry problems at a massive scale

    • Join a remote-friendly company - work anywhere in the US or Canada including your sofa, the beach, or our Seattle waterfront office

    • Experience rapid growth in an AI startup, backed by industry leaders including Google’s AI fund

    Algorithmia automates, optimizes, and accelerates every step of the journey to deploying AI and ML at scale. We allow anyone to run models on a massively parallel infrastructure in minutes instead of months. Whether in our cloud or your datacenter, everything is completely managed for maximum performance at minimum cost. Already trusted by over 90k developers and major enterprise customers, Algorithmia makes scalable Machine Learning fast, simple, and cost-effective for everyone.

    Due to ongoing growth, we’re hiring a Machine Learning - Infrastructure Engineer to join the Machine Learning team. You’ll join a team of highly focused engineers developing for a platform that supports over 90k engineers and processes millions of AI and ML workloads. Our team has worked on building billion-dollar products at Amazon, Danger, Microsoft, Socrata, and PayPal. We offer our engineers an unparalleled opportunity to learn, grow, and impact an enormous user community.

    What does the Machine Learning team do?

    The Machine Learning team is empathetic to our users. We build and deploy models and experience the whole Machine Learning lifecycle. We turn that experience into stories, content, demos, and, perhaps most importantly, feedback into the product. Our responsibilities fall into three broad categories that contribute to the company's mission:

    Content

    The ML team thinks about the best practices for machine learning systems and tries to be at the forefront of thought leadership in our space. We produce blogs and technical demos. This isn’t limited to the Algorithmia platform; it includes full pipelines demonstrating the end-to-end machine learning lifecycle. We keep an eye on newly released open-source models and add them to the platform. However, this responsibility is secondary to the focus on how to do machine learning in production.

    Product

    The ML team drives value for the company through what it brings to the product. Many features developed by the ML team are used in customer demos. The team adds and maintains new programming languages and runtime environments for algorithms. We keep up with developments in all major ML frameworks. We try out common ML workflows and work with product to ensure the platform can support those integrations.

    Support

    The ML team provides deep technical support at the algorithm/model level.

    As a Machine Learning Engineer at Algorithmia, you will:

    • Write production-quality code that solves real-world problems, in any of our supported algorithm development languages

    • Create blog posts, integrations & demos for end-to-end machine learning systems

    • Build & maintain build/runtime environments for all major machine learning frameworks: TensorFlow, PyTorch, MXNet, Caffe, AllenNLP, spaCy, etc.

    • Develop tools for data scientists at Fortune 100 companies around the world

    • Work with a passionate, distributed team on the cutting edge of AI/ML infrastructure

    • Have a real career plan, with mentorship and fast-track opportunities to promotion, technical leadership, people management, or wherever your interests may be

    • Work anywhere in the US or Canada

    And we might make the perfect match if you:

    • Are a skilled software engineer with experience in more than one programming language (such as Python, Java, Scala, etc.) and a deep understanding of at least one (we do a lot of Python, and will be happy to teach you the other languages)

    • Have deep empathy for users, and understand that Algorithmia would not exist without them

    • Have experience working on distributed systems, industry data science, any kind of public AI/ML projects, distributed or parallel computing, or the implementation of something cool on our AI marketplace (hint: free trial!)

    • Are current on the state-of-the-art in machine learning algorithms in the industry

    • Have practical experience or a degree (MS/PhD is a plus) in Computer Science, including practical areas of Machine Intelligence (or Deep Learning), and excellent fundamentals in computer science, algorithms, and software design

    As a Machine Learning - Infrastructure Engineer at Algorithmia you’ll join a passionate team that’s changing the way everyone uses AI and ML. You’ll solve real problems, make an impact, and work in a flexible environment that encourages you to follow your own interests as well. You’ll be welcomed into an intelligent, quirky, and diverse group and gain access to fantastic perks beyond just salary, equity, and insurance benefits, all from the comfort of your own sofa (or our dog-friendly office).

    Algorithmia is an equal opportunity employer and we value diversity at our core. We will never discriminate on the basis of race, religion, color, national origin, gender, sexual orientation, age, marital status, veteran status, or disability status and encourage everyone to apply.

  • Quickly maturing startup seeking a like-minded Sr. Backend-leaning Fullstack Engineer! PacketFabric redefines how companies procure, consume, and manage their network connectivity. The technical team is a small, talented, and close-knit group, and we need a team member to help us accomplish our goal of making the best darn carrier network on the planet.

    Description

    • As a well-rounded software engineer, you should be the type who appreciates variety in your day and challenges outside your comfort zone! A typical day in the life of a PacketFabric senior software engineer might include these activities:
    • Designing a deterministic lifecycle workflow for our next product offering.
    • Writing core platform code for a new feature, and unit tests for functionality.
    • Refactoring and improving existing code for performance and simplicity. For example, breaking a large method into smaller, more maintainable and easily tested methods.
    • Building command line tools to help network engineers better manage network state.
    • Researching additional ideas you may have for improving the product/platform overall, and sharing them with the team.
    • Interacting with customers and/or sales on a bug in the software, quickly resolving it, and coordinating across the team to push a fix.
    • Working with backend engineers and discussing quirks in network protocols and network interconnection, which translate into rapid API and UI changes.

    Requirements

    • The right candidate will have an abundance of hardcore programming skills and solid instincts for API usability and design patterns. You are probably a full stack developer who naturally gravitates toward work on a product core. You know when to sacrifice algorithmic elegance to get things done on deadline. More specifics include:
    • Extensive experience with Python and PHP in large applications developed in a team environment.
    • Expert unit tester.
    • Experience in large scale distributed systems.
    • Extensive experience with the HTTP protocol and developing and using RESTful APIs.
    • A solid understanding of OO programming paradigms.
    • Experience with a message queue system like RabbitMQ or Kafka.
    • Experience using NoSQL data stores like Redis.
    • Be completely at home on any *nix command line and building your own tools.
    • Very comfortable using Git in a team environment (i.e. pull-requests, branch management, rebasing).
    • Experience working in an environment that relies on remote communication and collaboration tools like Slack, Zoom, etc.
    • Never being afraid to venture boldly where none have gone before and develop code where there are no previous libraries to draw from.

    Preferred Experience

    • Hands-on router/switch configuration or infrastructure automation experience is a huge plus.
    • Previous exposure to layer 2/3 networking protocols and concepts such as IPv4/6, VLANs, VPNs, BGP, etc.
    • Experience with Python-based web application frameworks like Flask, Django, or Sanic.
    • Any experience interacting with physical-world equipment (industrial, medical, etc.).
    • Experience creating highly maintainable JavaScript.
    • SQL experience.
    • Experience with Vue.js, Angular, and AngularJS.
    • Experience with GraphQL.
    • Experience creating large scale data visualizations of any type.
  • Bevy Labs (North or South America)

    At Bevy Labs we have deep experience building community from the ground up. We are building the best products to help companies manage and scale their user groups and event communities all over the world. We are a distributed company and strive to be as diverse as the people using our products.

    Bevy Labs Engineering

    On the engineering team you are at the heart of the action, contributing to products that are actively used by world-class communities to create connections and experiences for their people.

    We care about solving challenging problems to build products that make a real difference in the lives of our customers and their users. We also care about the craft of software engineering and how we can always become better at what we do, individually as well as collectively.

    This means continuous integration, lots of automated test coverage, thorough reviews, good thinking and lots of experiments to discover new ways of improvement.

    We are still small and nimble, but we are excited to grow.

    This position

    Over time an area of specialization may emerge, but for the foreseeable future this position will likely touch many different areas of the product.

    You

    You will fit in well with us, if you:

    • Reside in North or South America. Yes, we are a distributed company, but since we are still small, we like to minimize the time zone spread within the team.

    • Are an excellent communicator. In our small team, English is the official language. You need to be able to articulate complex ideas efficiently and effectively. When people do not share an office, it is essential to pay extra attention to communication.

    • Have a solid technical background. You should have at least 5 years of professional software development experience and be able to point to a track record of caring about software engineering practices.

    • Feel at home with Python/Django, JavaScript/React.js, and the shell command line. You have been working in current cloud-based environments (such as AWS or GCP), but you don’t feel tied to one platform and generally appreciate picking the “right tool for the job.”

    • Like to learn and strive to do so often. As a company we improve to the extent that our team does. It starts with each individual. Humility and an open mind help a lot.

    • Ideally know what it is like to work in distributed development teams, or better yet, thrive in them. It probably means you already know you don’t need a structured office environment with a manager who checks in on you once a day. Likewise, you know that you will do best from your home office.

    We are proud to foster a workplace free from discrimination. We strongly believe that diversity of experience, perspectives, and background will lead to a better environment for our employees and a better product for our users and the communities we serve.

    Principals only, please.
