YoVDO

Scaling Machine Learning Workflows to Big Data with Fugue

Offered By: CNCF [Cloud Native Computing Foundation] via YouTube

Tags

Conference Talks, Big Data, Machine Learning, Python, pandas, Data Transformation, Lazy Evaluation, Dask

Course Description

Overview

Explore how to scale machine learning workflows to big data with Fugue in this 29-minute conference talk from KubeCon + CloudNativeCon Europe 2022. Learn how to move from pandas to distributed computing frameworks such as Spark or Dask without reimplementing code. Fugue is an open-source abstraction layer that lets data scientists write framework-agnostic, scale-agnostic code. The speakers port native Python code to Spark or Dask with minimal changes, then scale the computation from a single machine to a Spark cluster on Kubernetes. The talk also covers lazy evaluation, partitioning, testing, and decoupling logic from execution in big data workflows.
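
The idea of decoupling logic from execution can be illustrated with a small sketch. This is not Fugue's API and is not taken from the talk: it uses only the Python standard library, and every name in it (add_prediction, partition_by, run_local, run_parallel) is hypothetical. The point it tries to show is that the transformation function never mentions where it runs, so the same logic can be unit-tested locally and then handed to a different executor.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Partition = List[Row]


def add_prediction(rows: Partition) -> Partition:
    """Logic only: a toy 'model' that adds a predicted column to each row."""
    return [{**r, "predicted": 2.0 * r["x"] + 1.0} for r in rows]


def partition_by(rows: List[Row], key: str) -> List[Partition]:
    """Group rows by a key, keeping first-seen order of the groups."""
    groups: Dict[Any, Partition] = {}
    for r in rows:
        groups.setdefault(r[key], []).append(r)
    return list(groups.values())


def run_local(fn: Callable[[Partition], Partition],
              partitions: List[Partition]) -> List[Row]:
    """Execution choice 1: apply fn to each partition sequentially."""
    return [row for part in partitions for row in fn(part)]


def run_parallel(fn: Callable[[Partition], Partition],
                 partitions: List[Partition],
                 workers: int = 2) -> List[Row]:
    """Execution choice 2: apply fn to partitions on a thread pool.

    A stand-in for a distributed engine; pool.map preserves input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(fn, partitions))
    return [row for part in results for row in part]


data = [
    {"group": "a", "x": 1.0},
    {"group": "b", "x": 2.0},
    {"group": "a", "x": 3.0},
]
parts = partition_by(data, "group")

local_result = run_local(add_prediction, parts)
parallel_result = run_parallel(add_prediction, parts)
```

Because add_prediction is an ordinary function over plain rows, it can be tested without any cluster, and swapping run_local for run_parallel changes only how the work is executed, not what is computed.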

Syllabus

Introduction
Demo Overview
Han Wang Introduction
First Example
Spark
Transformation
Fugue Code
Model
Field Workflow
Results
Physical
Prediction
Pandas vs Spark
Lazy evaluation of Spark
Partitioning
Testing
Fugue
Decouple logic and execution
Demo
Notebook extension
Conclusion
Recap


Taught by

CNCF [Cloud Native Computing Foundation]

Related Courses

ETL and ELT Basics
A Cloud Guru
Programming Use Cases with Python
A Cloud Guru
Microsoft Power BI: Advanced Data Analysis and Visualisation
Cloudswyft via FutureLearn
Amazon Connect Data Streaming Intermediate
Amazon Web Services via AWS Skill Builder
Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR (Portuguese (Brazil))
Amazon Web Services via AWS Skill Builder