Feeding Data to AWS Redshift with Airflow
Offered By: EuroPython Conference via YouTube
Course Description
Overview
This EuroPython 2017 talk shows how to use Airflow to build data pipelines that load into AWS Redshift. It covers the fundamentals of Airflow, including scheduling and workflow management, along with pipeline-specific concepts such as backfills and retries, illustrated with practical integration examples. It also covers structuring data in Redshift, performing basic pre-loading transformations, and managing schemas with SQLAlchemy and Alembic. The speaker shares lessons learned and addresses common Redshift challenges. The talk is aimed at data engineers and analysts who want to improve their ETL processes by using Airflow with AWS Redshift.
Syllabus
Introduction
Federico's background
Product problem
Is it simple?
Data pipelines
Scale
Stages
Archive
Airflow
Python
Database
UI
Workflow
Operators
Airflow UI
Airflow scheduling
Tracking state
Downtime
Scrapers
Batch IDs
Timestamp
Formats
Redshift copy command
JSON path flattening
Schema conversion
Migration framework
Redshift annoyances
Futureproof
Thank you
Taught by
EuroPython Conference
Related Courses
A Brief History of Data Storage (EuroPython Conference via YouTube)
Breaking the Stereotype - Evolution & Persistence of Gender Bias in Tech (EuroPython Conference via YouTube)
We Can Get More from Spatial, GIS, and Public Domain Datasets (EuroPython Conference via YouTube)
Using NLP to Detect Knots in Protein Structures (EuroPython Conference via YouTube)
The Challenges of Doing Infra-As-Code Without "The Cloud" (EuroPython Conference via YouTube)