YoVDO

From Idea to Model: Productionizing Data Pipelines with Apache Airflow

Offered By: Databricks via YouTube

Tags

Apache Airflow Courses
Data Science Courses
Machine Learning Courses
TensorFlow Courses
Jupyter Notebooks Courses
Data Engineering Courses
Data Pipelines Courses

Course Description

Overview

Explore how a data science idea becomes a production-ready model with Apache Airflow in this 22-minute conference talk from Databricks. Learn how data engineers can build a flexible platform that meets the needs of data scientists, infrastructure engineers, and product owners. See how Apache Airflow serves as a collaboration layer between data scientists and infrastructure engineers, offering a Pythonic interface that abstracts away system complexity. Follow a single-machine notebook as it evolves into a cross-service Spark + TensorFlow pipeline, ending with a canary-tested, hyperparameter-tuned model deployed on Google Cloud Functions. Understand what Airflow offers both data engineers and analysts, including job submission, custom operator creation, and pipeline building. Additional topics include the data ecosystem, the bumper rail model, and the advantages of using established tools over building your own.

Syllabus

Intro
Data Ecosystem
Data Scientists
Data Infrastructure
Data Analysts
Bumper Rail Model
Don't Build Your Own!!
What's in it for the Data Engineers?
Submitting a Spark Job
Can Abstract Many Spark System Configurations
Data Engineers Can Create Custom Operators
What's in it for the Analysts?
Building a Data Science Pipeline
Experiment
Jupyter Notebooks + Airflow
Parameterize
Getting involved with Apache Airflow


Taught by

Databricks

Related Courses

Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Data Analysis with Python
IBM via Coursera
Intro to TensorFlow 日本語版
Google Cloud via Coursera
TensorFlow on Google Cloud - Français
Google Cloud via Coursera
Freedom of Data with SAP Data Hub
SAP Learning