
Fugue: Unifying Big Data Analytics Ecosystems for ETL and Machine Learning

Offered By: Databricks via YouTube

Tags

Big Data Analytics, Machine Learning, Python, SQL, Kubernetes, Apache Spark, Data Processing, Distributed Computing, Data Pipelines

Course Description

Overview

Explore the Fugue framework, an abstraction layer that unifies big data analytics solutions such as Apache Spark, TensorFlow, Druid, Dask, and Flink. Learn how Fugue's SQL-like language, extensible with Python, represents end-to-end pipelines and helps you build reliable, performant, and maintainable data processing workflows. Discover the benefits of a unified Spark-on-Kubernetes (K8s) environment for interactive development, batch processing, and near-real-time streaming jobs. See demonstrations of instant dependency updates, on-demand Spark cluster management on K8s, and Fugue extensions for Kinesis and Kafka. Understand how Fugue abstracts machine learning pipelines, enabling distributed training, hyperparameter tuning, and inference across various ML libraries. Finally, gain insights into extensive testing on Spark 3.0 and the resulting performance improvements in this 22-minute talk from Databricks.
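The core idea behind an abstraction layer like Fugue is that you write the logic once as plain Python and let an execution engine decide how to run it. That idea can be sketched with the standard library alone. This is not Fugue's API: `run_transform`, `SequentialEngine`, and `PartitionedEngine` are hypothetical names, and the thread pool only stands in for a cluster backend such as Spark or Dask.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Optional, Protocol

Row = Dict[str, Any]
Transformer = Callable[[List[Row]], List[Row]]


def add_total(rows: List[Row]) -> List[Row]:
    """Engine-agnostic business logic: plain Python, no framework imports."""
    return [{**r, "total": r["price"] * r["qty"]} for r in rows]


class Engine(Protocol):
    def run(self, rows: List[Row], fn: Transformer) -> List[Row]: ...


class SequentialEngine:
    """Hypothetical local engine: applies the transformer to all rows at once."""

    def run(self, rows: List[Row], fn: Transformer) -> List[Row]:
        return fn(rows)


class PartitionedEngine:
    """Hypothetical 'distributed' engine: splits rows into contiguous
    partitions and applies the same transformer to each one in a thread pool."""

    def __init__(self, num_partitions: int = 2):
        self.num_partitions = num_partitions

    def run(self, rows: List[Row], fn: Transformer) -> List[Row]:
        size = max(1, -(-len(rows) // self.num_partitions))  # ceiling division
        parts = [rows[i:i + size] for i in range(0, len(rows), size)]
        with ThreadPoolExecutor(max_workers=self.num_partitions) as pool:
            results = list(pool.map(fn, parts))  # map preserves partition order
        return [row for part in results for row in part]


def run_transform(rows: List[Row], fn: Transformer,
                  engine: Optional[Engine] = None) -> List[Row]:
    """Run user logic on whichever engine is supplied (local by default)."""
    return (engine or SequentialEngine()).run(rows, fn)


data = [{"id": i, "price": 2.5, "qty": i} for i in range(5)]
local = run_transform(data, add_total)  # default: SequentialEngine
dist = run_transform(data, add_total, PartitionedEngine(num_partitions=3))
print(local == dist)  # True
```

The design choice being illustrated is that `add_total` never changes. Only the engine argument does, which mirrors the "write once, run on any backend" claim in the overview.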

Syllabus

Intro
Motivation of Fugue
Node2Vec: Fugue Code
Fugue Programming Model
A Workflow Example
The Fugue Extensions
Fugue SQL vs Spark SQL
Fugue Programming Interface vs SQL
Fugue ML Components
Model & Parameter Sweeping model
Benchmark Test
An Interactive On-demand Spark Ecosystem
Summary
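The "Model & Parameter Sweeping" topic in the syllabus comes down to three steps. First, expand a search space of model choices and hyperparameters into independent tasks. Second, evaluate those tasks in parallel. Third, keep the best result. The sketch below shows those steps under stated assumptions. It uses a toy ridge-through-the-origin model and a mean baseline, both hypothetical and chosen only for illustration, and a thread pool as a stand-in for distributed workers. It does not reproduce Fugue's ML components or their API.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List, Tuple

Data = List[Tuple[float, float]]
Params = Dict[str, Any]

# Hypothetical toy data: y = 2x, split into training and validation sets.
TRAIN: Data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
VALID: Data = [(5.0, 10.0), (6.0, 12.0)]


def fit_predict(params: Params, train: Data, xs: List[float]) -> List[float]:
    """Toy model zoo: ridge regression through the origin, or a mean baseline."""
    if params["model"] == "ridge":
        sxy = sum(x * y for x, y in train)
        sxx = sum(x * x for x, _ in train)
        w = sxy / (sxx + params["alpha"])  # closed-form 1-D ridge weight
        return [w * x for x in xs]
    mean_y = sum(y for _, y in train) / len(train)
    return [mean_y for _ in xs]


def evaluate(params: Params) -> Tuple[float, Params]:
    """One independent sweep task: train on TRAIN, return validation MSE."""
    preds = fit_predict(params, TRAIN, [x for x, _ in VALID])
    mse = sum((p - y) ** 2 for p, (_, y) in zip(preds, VALID)) / len(VALID)
    return mse, params


def build_space() -> List[Params]:
    """Expand model choices and their hyperparameters into a flat task list."""
    ridge = [{"model": "ridge", "alpha": a} for a in (0.0, 1.0, 10.0)]
    baseline = [{"model": "mean"}]
    return ridge + baseline


def sweep(space: List[Params], workers: int = 4) -> Tuple[float, Params]:
    """Run every task concurrently and keep the lowest-error configuration."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(evaluate, space))
    return min(results, key=lambda r: r[0])


best_mse, best_params = sweep(build_space())
print(best_params, best_mse)  # {'model': 'ridge', 'alpha': 0.0} 0.0
```

Because each task depends only on its own parameter dictionary, the pool could be swapped for any backend that maps a function over a list of inputs. That independence is what makes sweeps a natural fit for distributed execution.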


Taught by

Databricks
