Delta Lake for Polyglot Data and Machine Learning Workloads
Offered By: Databricks via YouTube
Course Description
Overview
Explore Delta Lake in this 18-minute conference talk by Micha Kunze, Lead Data Engineer at Maersk. Discover how Delta Lake serves as an open table format for managing over 1,000 datasets with 20,000+ daily job runs. Learn how its integration with Apache Spark™ provides exactly-once semantics for micro-batch data transport and transformation. See how Delta Lake can be used from Python and Rust for machine learning applications. Gain insights into using Delta as the backbone for operational data streams, exploratory data analysis, and reproducible ML workloads. Understand techniques for monitoring data quality and streams using table metadata. Explore practical applications of Delta, including implementing write-audit-publish (WAP) patterns to keep bad data out of published tables, optimizing compute job efficiency, building auto-ML models, and automating monitoring and alerting for Spark Structured Streaming jobs.
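As a rough illustration of the WAP (write-audit-publish) idea mentioned in the overview, the sketch below stages a batch, validates it, and only then makes it visible to readers. It is a minimal sketch that uses only the Python standard library and local JSON files, not Delta Lake's API; the function names, the `id` and `weight_kg` fields, and the validation rules are all hypothetical, and the listing does not describe how the talk itself implements the pattern.

```python
import json
import os
import tempfile


def audit(rows):
    """Return a list of problems found in the staged rows (empty means pass)."""
    problems = []
    for i, row in enumerate(rows):
        # Hypothetical rules: every row needs an id and a non-negative weight.
        if row.get("id") is None:
            problems.append(f"row {i}: missing id")
        if not isinstance(row.get("weight_kg"), (int, float)) or row["weight_kg"] < 0:
            problems.append(f"row {i}: invalid weight_kg")
    return problems


def write_audit_publish(rows, published_path):
    """Stage rows, audit them, and publish only if the audit is clean."""
    directory = os.path.dirname(os.path.abspath(published_path))

    # Write: stage the batch in a file that readers of published_path never open.
    fd, staging_path = tempfile.mkstemp(dir=directory, suffix=".staging")
    with os.fdopen(fd, "w") as f:
        json.dump(rows, f)

    # Audit: re-read what was actually staged and validate it.
    with open(staging_path) as f:
        problems = audit(json.load(f))
    if problems:
        os.remove(staging_path)  # discard the bad batch; published data is untouched
        return False, problems

    # Publish: swap the staged file into place in a single rename.
    os.replace(staging_path, published_path)
    return True, []
```

Calling `write_audit_publish(rows, "shipments.json")` returns `(True, [])` when the batch passes, or `(False, problems)` with the previously published file left unchanged when it does not. In a table format with transactional commits, the publish step would presumably be a commit rather than a file rename.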
Syllabus
The Beauty of Delta for Polyglot Data and ML Workloads
Taught by
Databricks
Related Courses
Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Big Data Emerging Technologies
Yonsei University via Coursera
Building Resilient Streaming Systems on GCP em Português Brasileiro
Google Cloud via Coursera
Building Resilient Streaming Systems on Google Cloud Platform en Español
Google Cloud via Coursera
AWS Certified Data Analytics Specialty 2024 - Hands On!
Udemy