ETL Pipeline to Achieve Reliability at Scale
Offered By: EuroPython Conference via YouTube
Course Description
Overview
Explore the design and implementation of a scalable ETL pipeline that processes high-volume financial transactions for an online betting exchange. Learn how the pipeline produces reliable, accurate daily accounting reports while meeting three requirements: fault tolerance, fast data retrieval, and efficient computation. Discover the motivations behind the chosen tech stack (Python 3, Luigi, and Spark) and see how it addresses the key technical problems: identifying and rerunning faulty steps, optimizing input/output operations, and speeding up computation. The talk also covers Spark's key concepts and execution model, launching Spark jobs from Luigi, and running Spark on Amazon EMR for better performance and scalability.
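To make the idea of identifying and rerunning faulty steps more concrete, here is a minimal sketch in plain Python. It is not the talk's code and does not use Luigi's API; it only imitates the principle Luigi relies on, where a task counts as complete once its output exists, so a rerun skips finished steps and resumes at the one that failed. All names (Step, build, demo, the step names) are hypothetical and chosen for illustration.

```python
import tempfile
from pathlib import Path


class Step:
    """One pipeline step; it counts as done once its output file exists."""

    def __init__(self, name, out_dir, action, requires=()):
        self.name = name
        self.output = Path(out_dir) / f"{name}.out"
        self.action = action            # callable returning the text to store
        self.requires = tuple(requires)

    def complete(self):
        return self.output.exists()

    def run(self):
        # Write to a temporary file, then rename it, so a crash mid-write
        # never leaves a partial file that looks like a finished step.
        tmp = self.output.with_name(self.output.name + ".tmp")
        tmp.write_text(self.action())
        tmp.replace(self.output)


def build(step, log):
    """Run `step` after its dependencies, skipping anything already complete."""
    if step.complete():
        return
    for dep in step.requires:
        build(dep, log)
    step.run()
    log.append(step.name)


def demo():
    """Simulate a failure in the middle step, then rerun the pipeline."""
    with tempfile.TemporaryDirectory() as out_dir:
        attempts = {"count": 0}

        def flaky_aggregate():
            attempts["count"] += 1
            if attempts["count"] == 1:
                raise RuntimeError("simulated failure")
            return "totals"

        extract = Step("extract", out_dir, lambda: "raw transactions")
        aggregate = Step("aggregate", out_dir, flaky_aggregate, requires=[extract])
        report = Step("report", out_dir, lambda: "daily report", requires=[aggregate])

        first_run, second_run = [], []
        try:
            build(report, first_run)
        except RuntimeError:
            pass                        # the first attempt stops at the faulty step
        build(report, second_run)       # the rerun resumes where it failed
        return first_run, second_run
```

Calling demo() returns the step names executed in each attempt: the first attempt finishes only the extract step before the simulated failure, and the rerun executes the aggregate and report steps without repeating the extract step.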
Syllabus
Intro
Accounting at markets
Fault tolerance and reliability
Efficient storage
Good performance
Spark key concepts
Execution on Spark
Spark job from Luigi
Spark on EMR
Shutdown EMR cluster
Taught by
EuroPython Conference
Related Courses
MongoDB for DBAs (MongoDB University)
MongoDB Advanced Deployment and Operations (MongoDB University)
Building Cloud Apps with Microsoft Azure - Part 3 (Microsoft via edX)
Implementing Microsoft Windows Server Disks and Volumes (Microsoft via edX)
Cloud Computing and Distributed Systems (Indian Institute of Technology Patna via Swayam)