YoVDO

Reproducible Data Science Over Data Lakes

Offered By: MLOps.community via YouTube

Tags

Data Lakes Courses, MLOps Courses, Data Engineering Courses, Time Travel Courses, Data Pipelines Courses

Course Description

Overview

Explore reproducible data science techniques for data lakes in this 12-minute conference talk by Ciro Greco, presented at DE4AI. The talk examines why reproducibility is hard to achieve in Lakehouse architectures and presents recent work at Bauplan to address the problem. Learn about a system that decouples compute from data management by pairing a cloud runtime with Nessie, an open-source catalog with Git semantics. See how this design provides time-travel and branching semantics on top of object storage, so that an entire pipeline can be reproduced with simple CLI commands. Understand why this matters for large data pipelines, where slow testing, complex debugging, and a high susceptibility to errors are common obstacles.

Ciro Greco, former VP of AI at Coveo and founder of Tooso.ai, brings a background in linguistics and cognitive neuroscience, along with experience in Information Retrieval and Natural Language Processing, to this presentation on replayable data pipelines with Bauplan and Nessie.
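The overview describes a catalog with Git semantics that adds branching and time travel on top of files in object storage. Below is a minimal, hypothetical sketch of those semantics in plain Python. `ToyCatalog`, `Commit`, and all of their methods are invented for illustration and are not the Nessie or Bauplan API. The sketch assumes that each table version is an immutable snapshot identified by a string (standing in for files in object storage), so a commit records only pointers, and a branch is only a named pointer to a commit.

```python
from __future__ import annotations

import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Commit:
    """Immutable catalog state: table name -> snapshot id, plus a parent link."""
    commit_id: str
    parent: str | None
    tables: dict[str, str]
    message: str


class ToyCatalog:
    """Hypothetical in-memory catalog with Git-like refs (not the Nessie API)."""

    def __init__(self) -> None:
        root = self._make_commit(None, {}, "init")
        self.commits: dict[str, Commit] = {root.commit_id: root}
        # Branches are just named pointers to commit ids.
        self.branches: dict[str, str] = {"main": root.commit_id}

    @staticmethod
    def _make_commit(parent: str | None, tables: dict[str, str], message: str) -> Commit:
        # Content-address the commit: same parent, tables, and message give the same id.
        payload = json.dumps([parent, sorted(tables.items()), message])
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        return Commit(commit_id, parent, dict(tables), message)

    def create_branch(self, name: str, from_ref: str = "main") -> None:
        # Zero-copy branch: only the pointer is duplicated, never the data files.
        self.branches[name] = self.branches[from_ref]

    def commit(self, branch: str, table: str, snapshot_id: str, message: str) -> str:
        # Record a new snapshot pointer for one table on top of the branch head.
        head = self.commits[self.branches[branch]]
        new = self._make_commit(head.commit_id, {**head.tables, table: snapshot_id}, message)
        self.commits[new.commit_id] = new
        self.branches[branch] = new.commit_id
        return new.commit_id

    def tables_at(self, ref: str) -> dict[str, str]:
        # A ref is either a branch name or a commit id (time travel).
        return dict(self.commits[self.branches.get(ref, ref)].tables)

    def read(self, table: str, ref: str) -> str:
        return self.tables_at(ref)[table]

    def merge(self, source: str, target: str = "main") -> None:
        # Fast-forward only: succeed if target's head is an ancestor of source's head.
        target_head = self.branches[target]
        cursor: str | None = self.branches[source]
        while cursor is not None:
            if cursor == target_head:
                self.branches[target] = self.branches[source]
                return
            cursor = self.commits[cursor].parent
        raise ValueError("non-fast-forward merge; conflict handling is out of scope")


catalog = ToyCatalog()
v1 = catalog.commit("main", "raw_events", "snap-001", "ingest day 1")
catalog.create_branch("dev")  # experiment in isolation from main
catalog.commit("dev", "features", "snap-002", "build features")
catalog.merge("dev")          # publish the dev work to main
catalog.commit("main", "raw_events", "snap-003", "ingest day 2")
print(catalog.read("raw_events", "main"))  # snap-003
print(catalog.read("raw_events", v1))      # snap-001 (the input as of v1)
```

In this model, commits are content-addressed and never mutated. Re-running a pipeline against a recorded commit id therefore reads exactly the same inputs, which is what makes replay possible here. A production catalog must also handle concurrent writers and non-fast-forward merges, and this sketch deliberately omits both.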

Syllabus

Reproducible data science over data lakes // Ciro Greco // DE4AI


Taught by

MLOps.community

Related Courses

Google Cloud Big Data and Machine Learning Fundamentals en Español
Google Cloud via Coursera
Data Analysis with Python
IBM via Coursera
Intro to TensorFlow 日本語版
Google Cloud via Coursera
TensorFlow on Google Cloud - Français
Google Cloud via Coursera
Freedom of Data with SAP Data Hub
SAP Learning