Python Data Engineer

Parse.ly


2 months ago

11/02/2019 10:20:44

Job type: Full-time

Category: All others


Parse.ly is a real-time content measurement layer for the entire web.

Our analytics platform helps digital storytellers at some of the web's best sites, such as Ars Technica, The New Yorker, The Wall Street Journal, TechCrunch, The Intercept, Mashable, and many more. In total, our analytics system handles over 65 billion monthly events from over 1 billion monthly unique visitors.

Our entire stack is in Python and JavaScript, and our team has innovated in areas related to real-time analytics, building some of the best open source tools for working with modern stream processing technologies.

On the open source front, we maintain streamparse, the most widely used Python binding for the Apache Storm streaming data system. We also maintain pykafka, the most performant and Pythonic binding for Apache Kafka.

Our colleagues are talented: our UX/design team has also built one of the best-looking dashboards on the planet, using AngularJS and D3.js, and our infrastructure engineers have built a scalable, devops-friendly cloud environment.

As a Python Data Engineer, you will help us expand our reach into the area of petabyte-scale data analysis -- while ensuring consistent uptime, provable reliability, and top-rated performance of our backend streaming data systems.

We’re the kind of team that does “whatever it takes” to get a project done.

Parse.ly’s data engineering team already makes use of modern technologies like Python, Storm, Spark, Kafka, and Elasticsearch to analyze large datasets. As a Python Data Engineer at Parse.ly, you will be expected to master these technologies, while also being able to write code against them in Python, and debug issues down to the native C code and native JVM code layers, as necessary.

This team owns a real-time analytics infrastructure that processes over 2 million pageviews per minute from over 2,000 high-traffic sites. It operates a fleet of cloud servers that include thousands of cores of live data processing. We have written publicly about mage, our time series analytics engine. This will give you an idea about the kinds of systems we work on.

What you'll do

For this role, you should already be a proficient Python programmer who wants to work with data at scale.

In the role, you’ll...

  • Write Python code using best practices. See The Elements of Python Style, written by our CTO, for an example of our approach to code readability and design.

  • Analyze data at massive scale. You need to be comfortable with the idea of your code running across 3,000 Python cores, thanks to process-level parallelization.

  • Brainstorm new product ideas and directions with team and customers. You need to be a good communicator, especially in written form.

  • Master cloud technologies and systems. You should love UNIX and be able to reason about distributed systems.

Benefits

  • Our distributed team is best-in-class and we happily skip commutes by working out of our ergonomic home offices.

  • Work from home or anywhere else in our industry-leading distributed team.

  • Earn a competitive salary and benefits (health/dental/401k).

  • Splurge with a generous equipment budget.

  • Work with one of the brightest teams in tech.

  • Speak at and attend conferences like PyData on Parse.ly's dime.

Parse.ly Tech

  • Python for both backend and frontend -- 2.7, some systems in 3.x, and we're going full-on 3.x soon.

  • Amazon Web Services used for most systems.

  • Modern databases like Cassandra, Elasticsearch, Redis, and Postgres.

  • Frameworks like Django, Tornado and the PyData stack (e.g. Pandas).

  • Running Kafka, Storm, Spark in production atop massive data sets.

  • Easy system management with Fabric and Chef.

Fully distributed team

  • Parse.ly is a fully distributed team, with engineers working from across the world. Candidates with past experience working remotely will be prioritized, as will candidates in US/Eastern time zones.

Apply

  • Send a cover letter, CV/resume, and optionally links to projects or code, to [email protected]. Make sure to indicate that you are applying for the "Python Data Engineer" role.

Similar jobs

  • Students enroll in Thinkful courses to gain the valuable technical and professional skills needed to take them from curious learners to employed software developers. As an Immersive Course Instructor, you will deliver high quality live workshop content based on existing curriculum, preparing students to successfully transition careers. 

    In addition to working directly with students, Instructors are expected to maintain an environment of regular, candid feedback with the Educator Experience team, and to stay on top of important updates via meetings, email, and Slack. Ideal candidates for this team are highly coachable, display genuine student advocacy, and are comfortable working in a complex, rapidly changing environment. 

    Responsibilities:

    -Delivers high quality workshops based on the curriculum materials, and provides live coding demos when appropriate, to supplement written materials and content to provide students with the skills and knowledge to get their first developer job 

    -Maintains and updates the daily and weekly student syllabus which outlines the required homework and assignments, and deadlines for assessments and projects 

    -Provides up to 2 hours daily of on-demand video and chat support for students as they move through the program assignments 

    -Spends up to 4 hours a day prepping for workshops and updating course materials

    -Works with the other Format Leads for engagement formats (Mentor Sessions, Group Sessions, Grading, Technical Coaching, Mock Interviews/Assessments) to ensure a consistent experience for students in immersive courses 

    -Provides constructive feedback to the Instructional Design team on improvements to the course materials and curriculum based on student experience with the materials 

    Requirements:

    -Strong expertise with Excel, Tableau, PowerPoint, SQL, and Python, and comfort explaining these topics.

    -Ability to explain complicated topics clearly and without jargon

    -Strong written and verbal communication skills

    -High level of detail orientation and an exceptional work ethic

    -Enjoy working with people, not just putting your head down and working

    -Must have a reliable, high-speed Internet connection

    -Minimum 3-4 years of professional data analytics experience

    -Teaching Experience, especially in a remote or online class, is a plus

    -Must be eligible to work in the United States 

    Compensation and Benefits

    -Ability to work remotely with partially flexible hours 

    -Access to all available course curriculum for personal use

    -Membership to a global community of over 500 Software Engineers, Developers, and Data Scientists who, like you, want to keep their skills sharp and help learners break into the industry

    At this time, we are unable to consider applicants from the following states: Alaska, Delaware, Idaho, New Mexico, North Dakota, South Carolina, South Dakota, West Virginia, and Wyoming

  • Auth0 (US or Argentina)
    1 month ago
    Auth0 is a pre-IPO unicorn. We are growing rapidly and looking for exceptional new team members who will help take us to the next level. One team, one score. 

    We never compromise on identity. You should never compromise yours either. We want you to bring your whole self to Auth0. If you’re passionate, practice radical transparency to build trust and respect, and thrive when you’re collaborating, experimenting and learning – this may be your ideal work environment.  We are looking for team members that want to help us build upon what we have accomplished so far and make it better every day.  N+1 > N.

    The Data Engineer will help build, scale and maintain the enterprise data warehouse. The ideal candidate will have a deep understanding of technical and functional designs for databases, data warehousing and reporting. The candidate should thrive on challenges and love to be hands-on with recent technologies.

    This job plays a key role in data infrastructure, analytics projects, and systems design and development. You should be passionate about continuous learning, experimenting, and applying and contributing to cutting-edge open source data technologies and software paradigms.

    Responsibilities:

    • Contributing at a senior level to the data warehouse design and data preparation by implementing a solid, robust, extensible design that supports key business flows.
    • Performing all of the necessary data transformations to populate data into a warehouse table structure that is optimized for reporting.
    • Establishing efficient design and programming patterns for engineers as well as for non-technical people.
    • Designing, integrating and documenting technical components for seamless data extraction and analysis.
    • Establishing best practices that can be adopted in our data systems and shared across teams.
    • Contributing to innovations and data insights that fuel Auth0’s mission.
    • Working in a team environment and interacting with multiple groups on a daily basis (very strong communication skills required).

    Skills and Abilities:

    • BA/BS in Computer Science, related technical field or equivalent practical experience.
    • At least 4 years of relevant work experience
    • Ability to write, analyze, and debug SQL queries.
    • Exceptional problem-solving and analytical skills.
    • Experience with Data Warehouse design, ETL (Extraction, Transformation & Load), and architecting efficient software designs for the DW platform.
    • Knowledge of database modeling and design in a Data Warehousing context
    • Strong familiarity with data warehouse best practices.
    • Proficiency in Python and/or R.


    Preferred Locations:

    • Argentina; US
    Auth0’s mission is to help developers innovate faster. Every company is becoming a software company and developers are at the center of this shift. They need better tools and building blocks so they can stay focused on innovating. One of these building blocks is identity: authentication and authorization. That’s what we do. Our platform handles 2.5B logins per month for thousands of customers around the world. From indie makers to Fortune 500 companies, we can handle any use case.

    We like to think that we are helping make the internet safer.  We have raised $210M to date and are growing quickly. Our team is spread across more than 35 countries and we are proud to continually be recognized as a great place to work. Culture is critical to us, and we are transparent about our vision and principles. 

    Join us on this journey to make developers more productive while making the internet safer!
  • 1 month ago

    Summary 

    Wikipedia is where the world turns to understand almost any topic — The Wikimedia Foundation is the nonprofit that operates Wikipedia with a small staff.  We are looking for a great data architect who wants to modernize the infrastructure underlying Wikipedia with distributed storage, services and REST interfaces.  If this excites you, we welcome you to join us.

    Description

    • Collaborate with Product Owners, Engineers and stakeholders on product discovery and improvements of our existing systems
    • Design and implement effective data storage solutions and models
    • Articulate the flow of data across our diverse range of systems
    • Ensure reusable, clear service design and documentation
    • Define and align the forms and sources of data to facilitate WMF initiatives
    • Monitor system performance and identify, define and implement internal process improvements and SLOs
    • Work with Site Reliability and Operations Engineers to analyse and determine service discoverability, capacity plans and high availability
    • Recommend solutions to improve new and existing data storage and delivery systems
    • Change the world for more than half a billion people every month ;) 

    Skills and Experience

    • 3+ years experience in a Data Architect role as part of a team
    • You have a track record of leading data architecture initiatives to completion
    • You have experience analysing, reasoning about, optimising and implementing complex data systems
    • You have expertise in data handling approaches and technologies, with a good understanding of system development lifecycles and modern data architectures (Data Lakes, Data Warehouses)
    • You are comfortable modeling complex systems using approaches such as Domain Driven Design, eventual consistency, stream processing
    • You have experience with a diverse set of data storage and persistence frameworks and have a strong understanding of core data modelling concepts:
      • Relational & distributed databases (e.g. MySQL, Cassandra, Neo4j, Riak, HBase, DynamoDB, Elasticsearch)
      • Consistency trade-offs and transactional algorithms in distributed systems
      • Principles of fault tolerance and robustness
    • Use the best available tools & languages for each task. Currently we work a lot with Node.js but also use other tools and languages like Go, Python, Java, C, C++ and PHP where it makes sense. 
    • You have experience working with data streaming and pipelining systems (Hadoop, Kafka, Druid)
    • You have experience working with an engineering team, and communicate effectively with other stakeholders.
    • You have a track record of combining a solid long-term architectural strategy with short-term progress.
    • With freedom comes responsibility. You direct your own work and are pro-active in asking for input.
    • You have a scientific mindset and empirically test your hypotheses.
    • BS, MS, or PhD in Computer Science or equivalent work experience

    Pluses

    • Experience working with microservice architectures
    • Experience with open source technology and free culture, including contributions to open source projects
    • Experience working remotely
    • You know what it means to be a volunteer or to coordinate the work of volunteers
    • Big ups if you are a contributor to Wikipedia
    • Please provide us with information you feel would be useful to us in gaining a better understanding of your technical background and accomplishments

    Show us your stuff! If you have any existing open source software that you've developed (these could be your own software or patches to other packages), please share the URLs for the source. Links to GitHub, etc. are exceptionally useful. 

    The Wikimedia Foundation is... 

    ...the nonprofit organization that hosts and operates Wikipedia and the other Wikimedia free knowledge projects. Our vision is a world in which every single human can freely share in the sum of all knowledge. We believe that everyone has the potential to contribute something to our shared knowledge, and that everyone should be able to access that knowledge, free of interference. We host the Wikimedia projects, build software experiences for reading, contributing, and sharing Wikimedia content, support the volunteer communities and partners who make Wikimedia possible, and advocate for policies that enable Wikimedia and free knowledge to thrive. The Wikimedia Foundation is a charitable, not-for-profit organization that relies on donations. We receive financial support from millions of individuals around the world, with an average donation of about $15. We also receive donations through institutional grants and gifts. The Wikimedia Foundation is a United States 501(c)(3) tax-exempt organization with offices in San Francisco, California, USA.

    The Wikimedia Foundation is an equal opportunity employer, and we encourage people with a diverse range of backgrounds to apply.

    U.S. Benefits & Perks*

    • Fully paid medical, dental and vision coverage for employees and their eligible families (yes, fully paid premiums!)
    • The Wellness Program provides reimbursement for mind, body and soul activities such as fitness memberships, baby sitting, continuing education and much more
    • The 401(k) retirement plan offers matched contributions at 4% of annual salary
    • Flexible and generous time off - vacation, sick and volunteer days, plus 19 paid holidays - including the last week of the year.
    • Family friendly! 100% paid new parent leave for seven weeks plus an additional five weeks for pregnancy, flexible options to phase back in after leave, fully equipped lactation room.
    • For those emergency moments - long and short term disability, life insurance (2x salary) and an employee assistance program
    • Pre-tax savings plans for health care, child care, elder care, public transportation and parking expenses
    • Telecommuting and flexible work schedules available
    • Appropriate fuel for thinking and coding (aka, a pantry full of treats) and monthly massages to help staff relax
    • Great colleagues - diverse staff and contractors speaking dozens of languages from around the world, fantastic intellectual discourse, mission-driven and intensely passionate people

    *Eligible international workers' benefits are specific to their location and dependent on their employer of record
