Building a Multimodal Data Lakehouse with the Daft Distributed Python Dataframe
Offered By: Databricks via YouTube
Course Description
Overview
Explore how to build a multimodal data lakehouse using Daft, a next-generation distributed query engine, in this 18-minute conference talk. Learn about processing diverse data types, including numbers, strings, JSON, images, and PDFs, at scale using a familiar dataframe interface. Discover how Daft simplifies large-scale ETL, eliminating the need for bespoke data pipelines and custom tooling. See a demonstration of integrating Daft with existing infrastructure such as S3, Delta Lake, Databricks, and Spark to create a powerful and flexible data processing solution. Gain insights from Jay Chia, Co-Founder of Eventual Computing, on leveraging Daft's Python- and Rust-based architecture for efficient multimodal data handling in modern data workloads.
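To make the dataframe-centric workflow concrete, here is a minimal sketch of the kind of multimodal pipeline the talk describes, using Daft's Python API to read tabular metadata from S3 and turn image URLs into decoded image columns. The bucket path and the image_url column name are illustrative placeholders, not taken from the talk itself.

import daft
from daft import col

# Read a table of image metadata from Parquet files on S3
# (the bucket, path, and column names here are hypothetical).
df = daft.read_parquet("s3://example-bucket/image-metadata/*.parquet")

# Download the bytes behind each URL, decode them into images,
# and derive thumbnails, all as ordinary dataframe columns.
df = df.with_column("image", col("image_url").url.download().image.decode())
df = df.with_column("thumbnail", col("image").image.resize(64, 64))

# Execution is lazy; the plan runs when results are materialized.
df.show(5)

Since Daft also provides Delta Lake readers and writers (for example, daft.read_deltalake), the same pipeline can be sourced from or written back to lakehouse tables instead of raw Parquet files.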
Syllabus
Building a Multimodal Data Lakehouse with the Daft Distributed Python Dataframe
Taught by
Databricks
Related Courses
Artificial Intelligence for Robotics (Stanford University via Udacity)
Intro to Computer Science (University of Virginia via Udacity)
Design of Computer Programs (Stanford University via Udacity)
Web Development (Udacity)
Programming Languages (University of Virginia via Udacity)