From Idea to Model: Productionizing Data Pipelines with Apache Airflow
Offered By: Databricks via YouTube
Course Description
Overview
Explore how a data science idea becomes a production-ready model with Apache Airflow in this 22-minute conference talk from Databricks. Learn how data engineers can build a flexible platform that meets the needs of data scientists, infrastructure engineers, and product owners. See how Apache Airflow serves as a collaborative tool between data scientists and infrastructure engineers, offering a Pythonic interface that abstracts away system complexity. Follow a single-machine notebook as it evolves into a cross-service Spark + TensorFlow pipeline, culminating in a canary-tested, hyperparameter-tuned model deployed on Google Cloud Functions. Gain insight into how Airflow connects the different layers of a data team so they can deliver results quickly and collaborate efficiently. Understand the benefits for both data engineers and analysts, including custom operator creation, job submission, and pipeline building. Additional topics include the data ecosystem, the bumper rail model, and the advantages of using established tools over building from scratch.
Syllabus
Intro
Data Ecosystem
Data Scientists
Data Infrastructure
Data Analysts
Bumper Rail Model
Don't Build Your Own!!
What's in it for the Data Engineers?
Submitting a Spark Job
Can Abstract Many Spark System Configurations
Data Engineers Can Create Custom Operators
What's in it for the Analysts?
Building a Data Science Pipeline
Experiment
Jupyter Notebooks + Airflow
Parameterize
Getting involved with Apache Airflow
Taught by
Databricks
Related Courses
Introduction to Artificial Intelligence
Stanford University via Udacity
Natural Language Processing
Columbia University via Coursera
Probabilistic Graphical Models 1: Representation
Stanford University via Coursera
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Learning from Data (Introductory Machine Learning course)
California Institute of Technology via Independent