Orchestrating Data Assets Instead of Tasks, With Dagster - Sandy Ryza
Offered By: Open Data Science via YouTube
Course Description
Overview
In this talk, Sandy Ryza, lead of the Dagster project at Elementl, describes how orchestrators serve as the backbone for keeping data assets, from datasets to ML models, up to date and in sync. The talk covers what a data pipeline is, looks at Apache Airflow, and walks through building and deploying a pipeline. It also follows the development lifecycle: local development, unit and regression testing, review and staging, and debugging and monitoring. Suited to data engineers, machine learning enthusiasts, and professionals interested in data synchronization and advanced analytics.
Syllabus
- Introductions
- What is a data pipeline?
- Apache Airflow
- Building a pipeline
- The development lifecycle
- Local development
- Unit/regression testing
- Review and staging
- Deploying
- Debugging/monitoring
- To sum up
- Q&A
Taught by
Open Data Science
Related Courses
- In-Memory Database Management (内存数据库管理) (openHPI)
- CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
- Processing Big Data with Azure Data Lake Analytics (Microsoft via edX)
- Google Cloud Big Data and Machine Learning Fundamentals en Español (Google Cloud via Coursera)
- Google Cloud Big Data and Machine Learning Fundamentals 日本語版 (Google Cloud via Coursera)