Parallelizing Your ETL with Dask on Kubeflow
Offered By: MLOps World: Machine Learning in Production via YouTube
Course Description
Overview
Learn how to parallelize ETL processes with Dask on Kubeflow in this conference talk. Dask is a Python library for parallel computing, and Kubeflow is an MLOps platform built on Kubernetes. The talk covers how to use Dask's parallelism from within Kubeflow's notebook service and pipeline workflows. It introduces the new Dask Operator for Kubernetes, which lets users launch Dask clusters from Jupyter sessions and pipeline steps, and explains how to use Dask's distributed computing to process larger-than-memory datasets and improve performance in machine learning pipelines. The speaker demonstrates installation, works through practical examples, and shows the benefits of combining Dask and Kubeflow for data processing and ML workflows.
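As a rough illustration of the workflow described above, the following Python sketch fans a small transform step out over data partitions and, optionally, launches a Dask cluster through the Dask Kubernetes Operator. It is not taken from the talk: the function names (clean_partition, run_etl, make_cluster_client), the cluster name "etl-demo", and the record fields are hypothetical, and the cluster helper assumes the dask-kubernetes package and the operator are already installed in the Kubernetes cluster that hosts the notebook.

```python
def clean_partition(rows):
    """Transform step: drop rows with a missing amount and convert amounts to cents."""
    return [
        {"id": r["id"], "amount_cents": round(float(r["amount"]) * 100)}
        for r in rows
        if r.get("amount") not in (None, "")
    ]


def run_etl(partitions, client=None):
    """Apply clean_partition to every partition, in parallel when a Dask client is given."""
    if client is None:
        # Local fallback so the logic can be checked without any cluster.
        cleaned = [clean_partition(p) for p in partitions]
    else:
        # One task per partition; Dask schedules them across the workers.
        futures = client.map(clean_partition, partitions)
        cleaned = client.gather(futures)
    # Flatten the per-partition results into a single list of rows.
    return [row for part in cleaned for row in part]


def make_cluster_client(name="etl-demo", n_workers=3):
    """Launch a Dask cluster via the Dask Kubernetes Operator (assumes cluster access)."""
    # Imported lazily so the rest of the sketch runs without dask-kubernetes installed.
    from dask.distributed import Client
    from dask_kubernetes.operator import KubeCluster

    cluster = KubeCluster(name=name, n_workers=n_workers)  # hypothetical name and size
    return cluster, Client(cluster)
```

From a Kubeflow notebook with the operator available, one would call make_cluster_client(), pass the returned client to run_etl(partitions, client), and close both the client and the cluster afterwards. Without a client, run_etl runs the same transform locally, which is convenient for checking the logic before scaling out.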
Syllabus
Parallelizing Your ETL with Dask on Kubeflow
Taught by
MLOps World: Machine Learning in Production
Related Courses
Introduction to Cloud Infrastructure Technologies (Linux Foundation via edX)
Scalable Microservices with Kubernetes (Google via Udacity)
Google Cloud Fundamentals: Core Infrastructure (Google via Coursera)
Introduction to Kubernetes (Linux Foundation via edX)
Fundamentals of Containers, Kubernetes, and Red Hat OpenShift (Red Hat via edX)