
Data Lake - Design for Schema Evolution

Offered By: EuroPython Conference via YouTube

Tags

EuroPython Courses
Scalability Courses

Course Description

Overview

Explore the challenges of managing schema evolution in data lakes, and the solutions to them, in this EuroPython 2021 conference talk. Learn best practices for storage, control, scalability, and availability in data lake design. Discover how Episource tackled the task of storing and searching evolving nested JSON data produced by its NLP engine, which processes millions of medical documents. Gain insights into a solution that uses the Avro format for schema evolution, a schema registry for version control, and Amazon Athena for distributed SQL queries. Understand how the "schema-on-write" and "schema-on-read" approaches each help maintain data integrity and compatibility across schema changes.
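
To make "schema evolution" concrete, here is a minimal sketch in plain Python of the idea behind Avro-style schema resolution: a newer reader schema can still read older records when every added field declares a default. It is not code from the talk and does not use the Avro library. The schemas, the field names (`doc_id`, `codes`, `confidence`), and the `resolve` helper are hypothetical, and field types are not validated.

```python
# Illustrative only: mimics Avro-style schema resolution with plain dicts.
# All schema and field names are hypothetical, not taken from the talk.

SCHEMA_V1 = {
    "name": "Extraction",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "codes", "type": "array"},
    ],
}

# v2 adds a field WITH a default, so records written under v1 stay readable.
SCHEMA_V2 = {
    "name": "Extraction",
    "fields": [
        {"name": "doc_id", "type": "string"},
        {"name": "codes", "type": "array"},
        {"name": "confidence", "type": "double", "default": 0.0},
    ],
}


def resolve(record, reader_schema):
    """Project a record onto the reader schema.

    Fields missing from the record take the reader's default; fields the
    reader does not know about are dropped. Types are not checked.
    """
    resolved = {}
    for field in reader_schema["fields"]:
        name = field["name"]
        if name in record:
            resolved[name] = record[name]
        elif "default" in field:
            resolved[name] = field["default"]
        else:
            raise ValueError(f"no value or default for field {name!r}")
    return resolved


# A record written under v1, read with the newer v2 schema.
old_record = {"doc_id": "a1", "codes": ["E11.9"]}
new_record = resolve(old_record, SCHEMA_V2)
```

In this sketch, adding a field without a default would make `resolve` raise for old records, which is the kind of incompatible change a schema registry is typically configured to reject before it reaches the lake.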

Syllabus

Prakshi Yadav - Data lake: Design for schema evolution


Taught by

EuroPython Conference

Related Courses

A Brief History of Data Storage
EuroPython Conference via YouTube
Breaking the Stereotype - Evolution & Persistence of Gender Bias in Tech
EuroPython Conference via YouTube
We Can Get More from Spatial, GIS, and Public Domain Datasets
EuroPython Conference via YouTube
Using NLP to Detect Knots in Protein Structures
EuroPython Conference via YouTube
The Challenges of Doing Infra-As-Code Without "The Cloud"
EuroPython Conference via YouTube