Delta Lake for Polyglot Data and Machine Learning Workloads

Offered By: Databricks via YouTube

Tags

Delta Lake Courses, Machine Learning Courses, Python Courses, Apache Spark Courses, Rust Courses, Data Engineering Courses, Data Streaming Courses

Course Description

Overview

Explore the power of Delta Lake in this 18-minute conference talk by Micha Kunze, Lead Data Engineer at Maersk. Discover how Delta Lake serves as a robust open table format for managing over 1,000 datasets with 20,000+ daily job runs. Learn about its seamless integration with Apache Spark™, providing exactly-once semantics for micro-batching data transport and transformation. Delve into Delta Lake's versatility, supporting Python and Rust for machine learning applications. Gain insights into leveraging Delta as the backbone for operational data streams, exploratory data analysis, and reproducible ML workloads. Understand techniques for monitoring data quality and streams using table metadata. Explore practical applications of Delta, including implementing write-audit-publish (WAP) patterns to prevent bad data, optimizing compute job efficiency, building auto-ML models, and automating Spark Structured Streaming job monitoring and alerting.

Syllabus

The Beauty of Delta for Polyglot Data and ML Workloads


Taught by

Databricks

Related Courses

CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera