YoVDO

The Daft Distributed Python Data Engine: Multimodal Data Curation at Any Scale

Offered By: MLOps.community via YouTube

Tags

Python Courses
Artificial Intelligence Courses
Machine Learning Courses
Data Engineering Courses
Distributed Computing Courses
ETL Courses

Course Description

Overview

Explore the Daft distributed Python data engine for multimodal data curation at any scale in this 27-minute talk by Jay Chia. Discover how Daft addresses three fundamental needs of ML/AI data platforms: terabyte-scale ETL with complex model batch inference, SQL analytics over multimodal datatypes, and performant data loading for model training and inference. Learn why other tools fall short of these requirements, and see a full example of building a highly performant data platform with the Daft DataFrame and open file formats such as JSON and Parquet. Gain insights from Jay's experience in ML infrastructure across the biotech and autonomous driving industries, and understand how Daft can change your approach to data curation for ML/AI projects in 2024 and beyond.

Syllabus

The Daft distributed Python data engine: multimodal data curation at any scale // Jay Chia // DE4AI


Taught by

MLOps.community

Related Courses

Building Batch Data Pipelines on GCP auf Deutsch
Google Cloud via Coursera
Building Batch Data Pipelines on GCP en Français
Google Cloud via Coursera
Mastering Azure Data Factory: From Basics to Advanced Level
Udemy
Data Science de A a Z - Extraçao e Exibição dos Dados
Udemy
Building Batch Data Processing Solutions in Microsoft Azure
Pluralsight