YoVDO

Caching Framework for Exabyte-Scale Data Lakes

Offered By: The ASF via YouTube

Tags

Data Lakes Courses, Hadoop Courses, Cloud Migration Courses, Parquet Courses, Alluxio Courses

Course Description

Overview

Discover how to design and implement an open-source caching framework for exabyte-scale data lakes in this 40-minute conference talk. Learn about the data access performance and cost challenges of large-scale data lakes, and explore solutions that improve performance by 1.5x while cutting storage costs by millions per year. Gain insights into architecting caching systems that accelerate queries, maximize cache hit rates, and reduce data storage costs.

Explore the open-source stack, including Hadoop, Parquet, Hudi, and Alluxio, used to balance performance and cost efficiency. Dig into advanced techniques for improving cache hit rates, such as segmented data file caching, soft-affinity scheduler policies, and cache filtering, and learn how to monitor cache usage and working set size with comprehensive trace and JMX metrics. By the end of the session, you will be better equipped to manage and optimize exabyte-scale data lakes in both on-premises and cloud environments.
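The cache-hit-rate techniques named above can be illustrated with a small sketch. The Python below is not the framework presented in the talk and does not use Alluxio's or Presto's APIs; every name in it is hypothetical. It assumes a fixed 1 MiB segment size, a list of named workers with a simple per-worker load count, and an allow-list of cacheable tables, and it shows one common way to realize segmented file caching (cache keys per file segment rather than per whole file), soft-affinity scheduling (hash a file path to a preferred worker, falling back to the next worker when the preferred one is saturated), and cache filtering (admit only selected tables).

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical segment size (1 MiB); real systems make this configurable.
SEGMENT_SIZE = 1 << 20


def segment_keys(path: str, offset: int, length: int) -> list[tuple[str, int]]:
    """Return the (path, segment_index) cache keys covering bytes [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // SEGMENT_SIZE
    last = (offset + length - 1) // SEGMENT_SIZE
    return [(path, i) for i in range(first, last + 1)]


def stable_hash(key: str) -> int:
    # Python's built-in hash() is salted per process, so use a digest
    # to get the same placement on every run and every node.
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


def pick_worker(path: str, workers: list[str], load: dict[str, int], max_load: int) -> str:
    """Soft affinity: prefer the worker chosen by hashing the file path,
    but walk to the next worker in ring order if the preferred one is saturated."""
    start = stable_hash(path) % len(workers)
    for step in range(len(workers)):
        candidate = workers[(start + step) % len(workers)]
        if load.get(candidate, 0) < max_load:
            return candidate
    # Every worker is saturated: drop affinity and use the least-loaded worker.
    return min(workers, key=lambda w: load.get(w, 0))


@dataclass
class CacheFilter:
    """Hypothetical admission policy: only reads from allow-listed tables are cached."""
    cacheable_tables: set[str] = field(default_factory=set)

    def should_cache(self, table: str) -> bool:
        return table in self.cacheable_tables
```

Because the same file path always hashes to the same preferred worker, repeated reads of that file tend to land where its segments are already cached, which raises the hit rate; the fallback keeps a hot file from overloading a single node. Caching by segment means a query that reads only a few Parquet column chunks does not pull the whole file into the cache, and the filter keeps rarely reused tables from evicting the working set.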

Syllabus

Caching Framework for Exabyte-Scale Data Lakes


Taught by

The ASF

Related Courses

Amazon Redshift Getting Started
Amazon Web Services via AWS Skill Builder
Amazon Security Lake Getting Started
Amazon Web Services via AWS Skill Builder
Introduction to Designing Data Lakes on AWS
Amazon Web Services via edX
Aspectos básicos del análisis en AWS: parte 2 (Español LATAM) | Fundamentals of Analytics on AWS – Part 2 (LATAM Spanish)
Amazon Web Services via AWS Skill Builder
Automate Validation using the Data Validation Tool (DVT)
Google via Google Cloud Skills Boost