Python Data Engineer

Parse.ly


2 weeks ago


Job type: Full-time

Category: All others


Parse.ly is a real-time content measurement layer for the entire web.

Our analytics platform helps digital storytellers at some of the web's best sites, such as Ars Technica, The New Yorker, The Wall Street Journal, TechCrunch, The Intercept, Mashable, and many more. In total, our analytics system handles over 65 billion monthly events from over 1 billion monthly unique visitors.

Our entire stack is in Python and JavaScript, and our team has innovated in areas related to real-time analytics, building some of the best open source tools for working with modern stream processing technologies.

On the open source front, we maintain streamparse, the most widely used Python binding for the Apache Storm streaming data system. We also maintain pykafka, the most performant and Pythonic binding for Apache Kafka.
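
As a small illustration of what working with these libraries feels like, here is a minimal pykafka sketch (not Parse.ly production code); the broker address and the "pageviews" topic are placeholder values:

    from pykafka import KafkaClient

    # Connect to a Kafka broker (placeholder address).
    client = KafkaClient(hosts="127.0.0.1:9092")

    # Topics are keyed by byte strings; "pageviews" is a hypothetical topic.
    topic = client.topics[b"pageviews"]

    # Publish a few messages synchronously.
    with topic.get_sync_producer() as producer:
        for i in range(3):
            producer.produce(("event %d" % i).encode("utf-8"))

    # Read them back with a simple consumer that times out once the topic is drained.
    consumer = topic.get_simple_consumer(consumer_timeout_ms=1000)
    for message in consumer:
        if message is not None:
            print(message.offset, message.value)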

Our colleagues are talented: our UX/design team has also built one of the best-looking dashboards on the planet, using AngularJS and D3.js, and our infrastructure engineers have built a scalable, devops-friendly cloud environment.

As a Python Data Engineer, you will help us expand our reach into the area of petabyte-scale data analysis -- while ensuring consistent uptime, provable reliability, and top-rated performance of our backend streaming data systems.

We’re the kind of team that does “whatever it takes” to get a project done.

Parse.ly’s data engineering team already makes use of modern technologies like Python, Storm, Spark, Kafka, and Elasticsearch to analyze large datasets. As a Python Data Engineer at Parse.ly, you will be expected to master these technologies, while also being able to write code against them in Python, and debug issues down to the native C code and native JVM code layers, as necessary.

This team owns a real-time analytics infrastructure that processes over 2 million pageviews per minute from over 2,000 high-traffic sites. It operates a fleet of cloud servers that together provide thousands of cores of live data processing. We have written publicly about mage, our time series analytics engine, which will give you an idea of the kinds of systems we work on.

What you'll do

For this role, you should already be a proficient Python programmer who wants to work with data at scale.

In the role, you’ll...

  • Write Python code using best practices. See The Elements of Python Style, written by our CTO, for an example of our approach to code readability and design.

  • Analyze data at massive scale. You need to be comfortable with the idea of your code running across 3,000 Python cores, thanks to process-level parallelization (a small sketch of this style of parallelism follows this list).

  • Brainstorm new product ideas and directions with the team and customers. You need to be a good communicator, especially in written form.

  • Master cloud technologies and systems. You should love UNIX and be able to reason about distributed systems.
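
To make the parallelization point above concrete, here is a minimal, self-contained sketch of fanning a pure-Python aggregation out across worker processes with the standard library's multiprocessing module; the sample events and the count_referrers helper are invented for illustration and are not Parse.ly code:

    from collections import Counter
    from multiprocessing import Pool

    def count_referrers(chunk):
        # Count referrer domains in one chunk of (hypothetical) pageview events.
        return Counter(event["referrer"] for event in chunk)

    if __name__ == "__main__":
        # Invented sample data standing in for a much larger event stream.
        events = [{"referrer": d} for d in ("google.com", "twitter.com", "google.com")] * 1000
        chunks = [events[i::4] for i in range(4)]  # one slice of the work per process

        with Pool(processes=4) as pool:
            partial_counts = pool.map(count_referrers, chunks)

        totals = sum(partial_counts, Counter())
        print(totals.most_common(3))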

Benefits

  • Our distributed team is best-in-class and we happily skip commutes by working out of our ergonomic home offices. Here's a photograph of our CTO's setup running two full-screen Parse.ly dashboards.

  • Work from home or anywhere else in our industry-leading distributed team.

  • Earn a competitive salary and benefits (health/dental/401k).

  • Splurge with a generous equipment budget.

  • Work with one of the brightest teams in tech.

  • Speak at and attend conferences like PyData on Parse.ly's dime.

Parse.ly Tech

  • Python for both backend and frontend -- 2.7, some systems in 3.x, and we're going full-on 3.x soon.

  • Amazon Web Services used for most systems.

  • Modern databases like Cassandra, Elasticsearch, Redis, and Postgres.

  • Frameworks like Django, Tornado and the PyData stack (e.g. Pandas).

  • Running Kafka, Storm, Spark in production atop massive data sets.

  • Easy system management with Fabric and Chef.

Fully distributed team

  • Parse.ly is a fully distributed team, with engineers working from across the world. Candidates with past remote work experience, and those in US/Eastern time zones, will be prioritized.

Apply

  • Send a cover letter, a CV/resume, and optionally links to projects or code to [email protected]. Make sure to indicate you are applying for the "Python Data Engineer" role.

Please mention that you come from Remotive when applying for this job.


Similar jobs

  • Doximity (Data Analyst)
    4 weeks ago

    Doximity is transforming the healthcare industry. Our mission is to help doctors be more productive, informed, and connected. As a Data Analyst, you'll work within cross-functional delivery teams alongside other analysts, engineers, and product managers in discovering data insights to help improve healthcare.

    Our team brings a diverse set of technical and cultural backgrounds and we like to think pragmatically in choosing the tools most appropriate for the job at hand.

    About Us

    Here are some of the ways we bring value to doctors:

    • Our data stack runs on Python, Snowflake, Spark, and Airflow

    • Our web applications are built primarily using Ruby, Rails, JavaScript (Vue.js), and a bit of Golang

    • We have over 350 private repositories on GitHub containing our applications, forks of gems, our own internal gems, and open-source projects

    • We have worked as a distributed team for a long time; we're currently about 65% distributed

    Find out more information on the Doximity engineering blog.

    Here's How You Will Make an Impact

    • Play a key role in creating client-facing analytics by working closely with our Client Service Managers and Sales teams.

    • Collaborate with a team of product managers, analysts and other developers to define and complete data projects from analysis to reporting.

    • Show off your engineering skills by creating data products from scratch and automating your code so it can be reused continually.

    • Leverage Doximity's extensive datasets to identify and classify behavioral patterns of medical professionals on our platform.

    • Grow into a presentation/communication-focused role or dive deeper into more-involved technical challenges; the choice is yours.

    About you

    • B.S. or M.S. in a quantitative field with 2-4 years of experience.

    • Working knowledge of statistics and visualization.

    • Expert SQL skills, with a proven ability to create and evaluate complex SQL statements involving numerous tables and complex relationships.

    • Fluent in Python and experience using common modules (numpy, pandas, statsmodels, matplotlib) for EDA.

    • Comfortable with UNIX command line interface and standard programming tools (vim/emacs, git, etc.)

    • Excellent problem solving skills and a strong attention to detail.

    • Ability to manage time well and prioritize incoming tasks from different stakeholders.

    • Fast learner; curiosity about and passion for data.

    Preferred Qualifications:

    • Experience with Amazon Web Services products (EC2, S3, Snowflake).

    • Prior exposure to workflow management tools (Airflow).

    • Experience leveraging Apache Spark to perform analyses or process data.

    Fun facts about the Insights team:

    • We have access to one of the richest healthcare datasets in the world, with deep information on hundreds of thousands of healthcare professionals and their connections

    • The members of our team bring a diverse set of technical and cultural backgrounds

    • Business decisions at Doximity are driven by our data, analyses, and insights

    • We like to have fun: company outings, team lunches, and happy hours

    Benefits

    Doximity has industry-leading benefits. For an updated list, see our careers page.

    More info on Doximity

    We’re thrilled to be named the Fastest Growing Company in the Bay Area, and one of Fast Company’s Most Innovative Companies. Joining Doximity means being part of an incredibly talented and humble team. We work on amazing products that over 70% of US doctors (and over one million healthcare professionals) use to make their busy lives a little easier. We’re driven by the goal of improving inefficiencies in our $3.5 trillion U.S. healthcare system and love creating technology that has a real, meaningful impact on people’s lives. To learn more about our team, culture, and users, check out our careers page, company blog, and engineering blog. We’re growing steadily, and there’s plenty of opportunity for you to make an impact.

    Doximity is proud to be an equal opportunity employer, and committed to providing employment opportunities regardless of race, religious creed, color, national origin, ancestry, physical disability, mental disability, medical condition, genetic information, marital status, sex, gender, gender identity, gender expression, pregnancy, childbirth and breastfeeding, age, sexual orientation, military or veteran status, or any other protected classification. We also consider qualified applicants with criminal histories, consistent with applicable federal, state and local law.

  • 5CA

    Job description

    Lead the evolution of our next-gen Data Warehouse to enable our Data Science & Analytics teams in developing insights that genuinely drive our business, and that of our international clients. What is waiting for you here is a company that moves fast and where data is a top priority. No red tape to cut through. If you have a vision and you want to use your business intelligence skills to apply the newest technologies, this is your chance! 

    Your job

    You will deal with plenty of complexity, since we work with a multitude of clients whose data we need to integrate in very distinct ways. Along with two colleagues, you make sure that all of this data is effectively fed into our data warehouse to enable state-of-the-art reporting & analytics. As an architect, you will be further challenged as we bring everything we do to the cloud. You also ensure that our data warehouse can keep up with the rapid growth of our business. And you will strategize on the inclusion of non-operational data sources that will enable exciting insights for our clients; imagine combining our data with game usage data to predict in-game spending, for example.

    Our current platform is modern and mostly built in the cloud (Azure). It is a large, microservices-based distributed system. We believe in continuously evaluating our stack, and we have a smooth process for suggesting and adopting new technologies. Our set of technologies:

    • SQL Server, Analysis Services Tabular, Power BI

    • R, Python 

    • Azure Functions, Azure ML, Azure VMs

    What you bring!

    Requirements

    To do this job, you should be able to: 

    • Architect and own the technological innovation;

    • Advise on and implement new technologies and future improvements;

    • Conceive, design, develop and deploy the data architecture and data models from scratch;

    • Collaborate across business units (Operations, Reporting and Analysis, Data Science) to craft a vision, strategy, and roadmap;

    • Train and mentor the team in data modeling and data quality related processes and standards;

    • Communicate fluently in English, allowing you to operate comfortably in a highly international organization;

    • Adapt easily within an environment where things move fast.

    How to apply

    Feel eager to apply? Let's get to know each other! Please help us understand how you see yourself matching up. A list of technologies is great, yet what we are especially excited to learn about is your ability to lead with vision and fulfil the role of an actual architect. So don't hold back; tell us more about that! For most vacancies, an online assessment is part of the application process.

    About us

    We are 5CA. For the past 20 years, we've used our expertise to help our clients build their CX & support strategy. Focused on three industries: video games, consumer electronics, and eCommerce, we provide omnichannel support in a wide variety of languages, always using the latest technological innovations.

    We’re headquartered in Utrecht, The Netherlands, with offices in Los Angeles, Buenos Aires, and Hong Kong. For our contact services, we use a mix of onsite and remote support specialists, a highly flexible and dynamic model by which we help our clients deal with challenging situations.

    5CA offers a fast-paced, dynamic workplace where every day is different, and developments take place in days, not months. Our culture is shaped by a spirited workforce hailing from all corners of the globe. We all share a thirst for new and exciting technology, with gaming as the binding factor. 5CA has a flat hierarchy, where you are encouraged to think big, dream big, and live up to your full potential.

  • Mammoth Growth (US or Canada)
    2 months ago

    Mammoth Growth is seeking a Data Engineer with extensive experience in building data pipelines, ETL scripts, and data warehouses in a modern cloud environment. We are a fast-paced, rapidly growing growth and data analytics consultancy helping businesses build cutting-edge analytics environments.

    As a data engineer in a rapidly growing team, you will work with a variety of exciting high growth businesses building their future data environment. This is an excellent opportunity to sharpen and broaden your skills in a fast-paced, challenging environment.

    This is a remote position.

    Responsibilities

    • Build custom integrations with 3rd party APIs

    • Build ETLs to move and transform data

    • Put together an end-to-end data pipeline using cutting edge tools and techniques

    • Design data warehouses and data lakes

    • Use your knowledge and experience to help shape new processes

    Skills

    • Python

    • AWS Lambda

    • SQL

    • Spark / Databricks / AWS Glue

    • Database experience (Redshift, Snowflake or Data Lake a plus)

    Qualities

    • Independently organized; self-starter; ability to work with minimal direction

    • Enjoy learning new tools to better solve new challenges

    • Attention to detail and ability to ask the right questions

    • Good communication / client facing skills

    • Can switch between simultaneous projects easily

    If you think you are a good fit for the role, send us a quick note on why, and include the sum of 18 and 22 (bonus points for creativity).
