Python Data Engineer

Parse.ly


Posted 3 months ago (11/02/2019)

Job type: Full-time

Category: All others


Parse.ly is a real-time content measurement layer for the entire web.

Our analytics platform helps digital storytellers at some of the web's best sites, such as Ars Technica, The New Yorker, The Wall Street Journal, TechCrunch, The Intercept, Mashable, and many more. In total, our analytics system handles over 65 billion monthly events from over 1 billion monthly unique visitors.

Our entire stack is in Python and JavaScript, and our team has innovated in areas related to real-time analytics, building some of the best open source tools for working with modern stream processing technologies.

On the open source front, we maintain streamparse, the most widely used Python binding for the Apache Storm streaming data system. We also maintain pykafka, the most performant and Pythonic binding for Apache Kafka.
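
As a rough illustration of what code against these bindings looks like, here is a minimal pykafka producer/consumer sketch (the broker address and the "events" topic are hypothetical, and Parse.ly's actual usage may differ):

    from pykafka import KafkaClient

    # Connect to a (hypothetical) local broker.
    client = KafkaClient(hosts="127.0.0.1:9092")
    topic = client.topics[b"events"]

    # Produce one message synchronously.
    with topic.get_sync_producer() as producer:
        producer.produce(b"pageview:example.com/article-1")

    # Read messages back; give up after one second of inactivity.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=1000)
    for message in consumer:
        if message is not None:
            print(message.offset, message.value)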

Our colleagues are talented: our UX/design team has also built one of the best-looking dashboards on the planet, using AngularJS and D3.js, and our infrastructure engineers have built a scalable, devops-friendly cloud environment.

As a Python Data Engineer, you will help us expand our reach into the area of petabyte-scale data analysis -- while ensuring consistent uptime, provable reliability, and top-rated performance of our backend streaming data systems.

We’re the kind of team that does “whatever it takes” to get a project done.

Parse.ly’s data engineering team already makes use of modern technologies like Python, Storm, Spark, Kafka, and Elasticsearch to analyze large datasets. As a Python Data Engineer at Parse.ly, you will be expected to master these technologies, while also being able to write code against them in Python, and debug issues down to the native C code and native JVM code layers, as necessary.

This team owns a real-time analytics infrastructure that processes over 2 million pageviews per minute from over 2,000 high-traffic sites, and it operates a fleet of cloud servers that provides thousands of cores of live data processing. We have written publicly about mage, our time series analytics engine, which will give you an idea of the kinds of systems we work on.

What you'll do

For this role, you should already be a proficient Python programmer who wants to work with data at scale.

In the role, you’ll...

  • Write Python code using best practices. See The Elements of Python Style, written by our CTO, for an example of our approach to code readability and design.

  • Analyze data at massive scale. You need to be comfortable with the idea of your code running across 3,000 Python cores, thanks to process-level parallelization (see the sketch after this list).

  • Brainstorm new product ideas and directions with the team and customers. You need to be a good communicator, especially in written form.

  • Master cloud technologies and systems. You should love UNIX and be able to reason about distributed systems.
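
As a rough illustration of the process-level parallelization mentioned above, here is a minimal sketch using Python's standard multiprocessing module (the pageview-counting workload and all values are hypothetical stand-ins, not Parse.ly's pipeline):

    from collections import Counter
    from multiprocessing import Pool

    def count_pageviews(chunk):
        """Count pageviews per URL in one chunk of (url, event) records."""
        counts = Counter()
        for url, event in chunk:
            if event == "pageview":
                counts[url] += 1
        return counts

    if __name__ == "__main__":
        # In production this fan-out would span thousands of cores;
        # here we use a small pool over toy data.
        chunks = [
            [("example.com/a", "pageview"), ("example.com/b", "pageview")],
            [("example.com/a", "pageview"), ("example.com/c", "click")],
        ]
        with Pool(processes=4) as pool:
            partials = pool.map(count_pageviews, chunks)
        print(sum(partials, Counter()).most_common())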

Benefits

  • Our distributed team is best-in-class, and we happily skip commutes by working out of our ergonomic home offices.

  • Work from home or anywhere else in our industry-leading distributed team.

  • Earn a competitive salary and benefits (health/dental/401k).

  • Splurge with a generous equipment budget.

  • Work with one of the brightest teams in tech.

  • Speak at and attend conferences like PyData on Parse.ly's dime.

Parse.ly Tech

  • Python for both backend and frontend -- 2.7, some systems in 3.x, and we're going full-on 3.x soon.

  • Amazon Web Services used for most systems.

  • Modern databases like Cassandra, Elasticsearch, Redis, and Postgres.

  • Frameworks like Django, Tornado, and the PyData stack (e.g., Pandas).

  • Running Kafka, Storm, and Spark in production atop massive datasets.

  • Easy system management with Fabric and Chef (a small example follows).
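
As a rough illustration (not necessarily Parse.ly's actual task layout), a Fabric 1.x fabfile might look like this; the host and service names are hypothetical, and tasks run with, for example, "fab -H web1,web2 uptime":

    from fabric.api import run, sudo, task

    @task
    def uptime():
        """Print load averages on each remote host."""
        run("uptime")

    @task
    def restart_worker():
        """Restart a hypothetical analytics worker service."""
        sudo("service analytics-worker restart")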

Fully distributed team

  • Parse.ly is a fully distributed team, with engineers working from across the world. Candidates with prior remote-work experience will be prioritized, as will those in US/Eastern time zones.

Apply

  • Send a cover letter, CV/resume, and optionally links to projects or code to [email protected]. Make sure to indicate that you are applying for the "Python Data Engineer" role.


similar jobs

  • Songspace (CT +/- 2 hours)
    1 week ago

    Songspace is seeking a Data Developer to join our team. This role is responsible for developing and managing the data tools, applications, models, processes, and workflows that support the efficient, automated management and provision of the company's data across internal systems and customer products; the ingestion of multimedia files and metadata provided by our clients and third parties; and the exchange of music data with our sister companies and business partners. The role will also analyze and validate data, cleanse and transform data, and create and run SQL queries to deliver information to teams across the company. This position offers an opportunity to design, build, and manage mission-critical data applications and products with an experienced and supportive team in an interesting and fun industry.

    Job Responsibilities

    • Work with data, technology, and operations teams to design, build, test, and implement an automated data ingestion, integration, and exchange platform, and supporting enterprise data model

    • Design, create, and improve software to monitor, audit, and test data processes, systems, and applications

    • Process the ingestion of music metadata and multimedia assets provided by music creator, publisher, and record label customers and partners within the timelines and specifications defined by the Client Services Team

    • Validate, transform, and cleanse data as required

    • Monitor content delivery processes and queues and troubleshoot/resolve issues and errors

    • Serve as the point of contact for technical data issues for the internal team

    • Proactively investigate, troubleshoot, and solve technical data issues, escalating unsolved issues to the appropriate team

    • Create and sponsor concepts and work with internal teams to develop new software and tools that will assist clients and/or internal stakeholders to automate and optimize data workflow and data accessibility

    • Work with the data and technology teams to plan, prioritize, and manage technical backlog of data deliverables

    • Write and run custom SQL queries, and run existing scripts and queries upon request

    • Create and maintain scripts to automate daily tasks

    • Create and maintain test data, and perform and test upgrades, code changes, and bug fixes

    • Create and maintain documentation of code, scripts, tools, applications, and processes

    • Train team members in the use of internal data tools

    • Perform other job-related tasks as directed

    Requirements

    Required

    • Must possess strong written and verbal communication skills

    • Must possess strong analytical skills

    • Must be committed to executing high-quality deliverables within defined scope and schedule

    • Must be self-driven, proactive, and open to new ideas in problem solving

    • Must be comfortable working independently and collaboratively with small, distributed teams

    • Must be curious and invest time to develop and expand knowledge and proficiencies for self and others

    Preferred

    • Experience with MySQL is preferred

    • Experience with *nix CLI and shell scripting is preferred

    • Experience with git is preferred

    • Experience with designing, developing, and using custom ETL code and processes is preferred (see the sketch after this list)

    • Experience creating, implementing, and using REST APIs is preferred

    • Experience with music data and music data standards (DDEX, CWR, ID3) is a plus

    • Experience with multimedia management software/tools and multimedia pipelines is a plus

    • Experience with music production, audio processing, and audio codecs (.mp3, FLAC, WAV) is a plus

    • Experience with provisioning and supporting AWS data implementations (S3, EC2, RDS, Lambda, Redshift, etc.) is a plus

    • Experience as a tech support/account manager or client-facing project manager is a plus
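
    As a rough illustration of the custom ETL work mentioned above, here is a minimal extract-transform-load sketch in Python (the file name, column names, and schema are hypothetical, and sqlite3 stands in for a production MySQL database):

        import csv
        import sqlite3

        def extract(path):
            """Read raw track-metadata rows from a CSV file."""
            with open(path, newline="") as f:
                return list(csv.DictReader(f))

        def transform(rows):
            """Cleanse rows: trim whitespace and drop rows missing a title."""
            cleaned = []
            for row in rows:
                title = (row.get("title") or "").strip()
                isrc = (row.get("isrc") or "").strip().upper()
                if title:
                    cleaned.append((title, isrc))
            return cleaned

        def load(records, conn):
            """Load cleansed records into a tracks table."""
            conn.execute("CREATE TABLE IF NOT EXISTS tracks (title TEXT, isrc TEXT)")
            conn.executemany("INSERT INTO tracks VALUES (?, ?)", records)
            conn.commit()

        if __name__ == "__main__":
            with sqlite3.connect("catalog.db") as conn:
                load(transform(extract("tracks.csv")), conn)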

  • Auth0 (US or Argentina)

    Auth0 is a pre-IPO unicorn. We are growing rapidly and looking for exceptional new team members who will help take us to the next level. One team, one score.

    We never compromise on identity. You should never compromise yours either. We want you to bring your whole self to Auth0. If you’re passionate, practice radical transparency to build trust and respect, and thrive when you’re collaborating, experimenting and learning – this may be your ideal work environment.  We are looking for team members that want to help us build upon what we have accomplished so far and make it better every day.  N+1 > N.

    The Data Scientist will help build, scale, and maintain the entire data science platform. The ideal candidate will have deep technical understanding, hands-on experience building machine learning models and surfacing valuable insights, and a record of promoting a data-driven culture across the organization. They should not hesitate to wrangle data when necessary, should understand the business objectives, and should have a good understanding of the entire data stack.

    This position plays a key role in data initiatives, analytics projects, and influencing key stakeholders with critical business insights. You should be passionate about continuous learning, experimenting, and applying and contributing to cutting-edge open source data science technologies.

    Responsibilities

      • Use Python and the vast array of AI/ML libraries to analyze data and build statistical models to solve specific business problems.

      • Improve upon existing methodologies by developing new data sources, testing model enhancements, and fine-tuning model parameters.

      • Collaborate with researchers, software developers, and business leaders to define product requirements and provide analytical support.

      • Directly contribute to the design and development of automated selection systems.

      • Build customer-facing reporting tools to provide insights and metrics which track system performance.

      • Communicate verbally and in writing to business customers and leadership teams with various levels of technical knowledge, educating them about our systems and sharing insights and recommendations.

    Basic Qualifications

      • Bachelor's degree in Statistics, Applied Math, Operations Research, Engineering, Computer Science, or a related quantitative field.

      • 2 years of working experience as a Data Scientist.

      • Proficient with data analysis and modeling software such as Spark, R, and Python.

      • Proficient with scripting languages such as Python and with data manipulation/analysis libraries such as scikit-learn and Pandas for analyzing and modeling data.

      • Experienced in using multiple data science methodologies to solve complex business problems.

      • Experienced in handling large data sets using SQL and databases in a business environment.

      • Excellent verbal and written communication.

      • Strong troubleshooting and problem solving skills.

      • Thrive in a fast-paced, innovative environment.

    Preferred Qualifications

      • Master's degree in Statistics, Applied Math, Operations Research, Engineering, Computer Science, or a related quantitative field.

      • 2+ years’ experience as a Data Scientist.

      • Fluency in a scripting or computing language (e.g., Python, Scala, C++, Java).

      • Superior verbal and written communication skills with the ability to effectively advocate technical solutions to scientists, engineering teams and business audiences.

      • Experienced in writing academic-styled papers for presenting both the methodologies used and results for data science projects.

      • Demonstrable track record of dealing well with ambiguity, ability to self-motivate, prioritizing needs, and delivering results in a dynamic environment.

      • Combination of deep technical skills and business savvy to interface with all levels and disciplines within our organization and our customers' organizations.

    Skills and Abilities

      • BA/BS in Computer Science, a related technical field, or equivalent practical experience.

      • At least 3 years of relevant work experience.

      • Ability to write, analyze, and debug SQL queries.

      • Exceptional problem-solving and analytical skills.

      • Fluent in implementing logistic regression, random forest, XGBoost, Bayesian, and ARIMA models in Python/R (see the sketch after this list).

      • Experience in user path navigation with Markov chains and Stan Bayesian analysis for A/B testing.

      • Familiarity with sentiment analysis (NLP) and LSTM models.

      • Experience with the full AI/ML lifecycle: model development, training, deployment, testing, refining, and iterating.

      • Experience in Tableau, Apache SuperSet, Looker or similar BI tools.

      • Knowledge of AWS Redshift, Snowflake, or similar databases.
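
      As a rough illustration of one technique from the list above, here is a minimal logistic regression sketch with scikit-learn on synthetic data (all features, labels, and numbers are illustrative):

        import numpy as np
        from sklearn.linear_model import LogisticRegression
        from sklearn.model_selection import train_test_split

        # Synthetic data: four features, label driven by the first two.
        rng = np.random.default_rng(0)
        X = rng.normal(size=(1000, 4))
        y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

        X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
        model = LogisticRegression().fit(X_train, y_train)
        print("held-out accuracy:", model.score(X_test, y_test))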

    Preferred Locations:


      • #US; #AR;

    Auth0’s mission is to help developers innovate faster. Every company is becoming a software company and developers are at the center of this shift. They need better tools and building blocks so they can stay focused on innovating. One of these building blocks is identity: authentication and authorization. That’s what we do. Our platform handles 2.5B logins per month for thousands of customers around the world. From indie makers to Fortune 500 companies, we can handle any use case.

    We like to think that we are helping make the internet safer. We have raised $210M to date and are growing quickly. Our team is spread across more than 35 countries and we are proud to continually be recognized as a great place to work. Culture is critical to us, and we are transparent about our vision and principles.

    Join us on this journey to make developers more productive while making the internet safer!
  • Auth0 (US or Argentina)
    2 months ago

    The Data Engineer will help build, scale, and maintain the enterprise data warehouse. The ideal candidate will have a deep understanding of technical and functional design for databases, data warehousing, and reporting, should thrive on challenges, and should love being hands-on with recent technologies.

    This job plays a key role in data infrastructure, analytics projects, and systems design and development. You should be passionate about continuous learning, experimenting, and applying and contributing to cutting-edge open source data technologies and software paradigms.

    Responsibilities:
    • Contributing at a senior-level to the data warehouse design and data preparation by implementing a solid, robust, extensible design that supports key business flows.
    • Performing all of the necessary data transformations to populate data into a warehouse table structure that is optimized for reporting.
    • Establishing efficient design and programming patterns for engineers as well as for non-technical people.
    • Designing, integrating and documenting technical components for seamless data extraction and analysis.
    • Ensuring best practices are adopted in our data systems and shared across teams.
    • Contributing to innovations and data insights that fuel Auth0’s mission.
    • Working in a team environment and interacting with multiple groups on a daily basis (very strong communication skills required).

    Skills and Abilities:
    • BA/BS in Computer Science, a related technical field, or equivalent practical experience.
    • At least 4 years of relevant work experience.
    • Ability to write, analyze, and debug SQL queries.
    • Exceptional problem-solving and analytical skills.
    • Experience with Data Warehouse design, ETL (Extraction, Transformation & Load), architecting efficient software designs for DW platform.
    • Knowledge of database modeling and design in a data warehousing context.
    • Strong familiarity with data warehouse best practices.
    • Proficiency in Python and/or R.


    Preferred Locations:
    • #AR; #US;
