Data Storage and Queries
Offered By: DeepLearning.AI via Coursera
Course Description
Overview
          In this course, you will learn about the raw ingredients and processes that are used to physically store data on disk and in memory. You’ll explore different storage systems, including object, block, and file storage, as well as databases, that are built on top of these raw ingredients. You’ll also get a chance to use the Cypher language to query a Neo4j graph database, and perform vector similarity search, a key feature behind generative AI and large language models. You will explore the evolution of data storage abstractions, from data warehouses, to data lakes, and data lakehouses, while comparing the advantages and drawbacks of each architectural paradigm. With hands-on practice, you will design a simple data lake using Amazon Glue, and build a data lakehouse using AWS LakeFormation and Apache Iceberg. In the last week of this course, you’ll see how queries work behind the scenes, practice writing more advanced SQL queries, compare the query performance in row vs column-oriented storage, and perform streaming queries using Apache Flink.
        
Syllabus
- Storage Ingredients and Storage Systems
- Storage Abstractions
- Queries
Taught by
Joe Reis
Tags
Related Courses
Building Modern Data Streaming Apps with Open SourceLinux Foundation via YouTube How to Stabilize a GenAI-First Modern Data LakeHouse - Provisioning 20,000 Ephemeral Data Lakes per Year
CNCF [Cloud Native Computing Foundation] via YouTube Delivering Portability to Open Data Lakes with Delta Lake UniForm
Databricks via YouTube Fast Copy-On-Write in Apache Parquet for Data Lakehouse Upserts
Databricks via YouTube Capital One's Data Innovation Strategy - You Build, Your Data (YBYD)
Databricks via YouTube
