Auditing Your Data and Answering the Question - Is It the End of the Day Yet?
Offered By: NDC Conferences via YouTube
Course Description
Overview
Explore data auditing and workflow optimization in this 37-minute conference talk from NDC Conferences. Dive into Nielsen's Kafka architecture and ETL processes, and learn strategies to track, analyze, and store auditing information. Learn about implementing an Avro audit header, designing metadata for auditing heartbeats, and optimizing auditing tables. Discover how to create an alert-based monitoring system using technologies like Kafka, Avro, Spark, Lambda functions, and complex SQL queries. Gain insights into managing partitions with Apache Airflow, offloading data to history, and scheduling Spark jobs. Understand how to detect duplications and implement an effective alerting system. Finally, tackle the age-old question: "Is it the end of the day yet?" through the lens of data processing and legacy problem-solving.
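To make the audit-header idea above more concrete, here is a minimal sketch in Python. It is not taken from the talk and does not reflect Nielsen's actual design: the `AuditHeader` fields, the 10-minute window size, and all function names are assumptions chosen for illustration. It shows one plausible way to bucket messages into audit windows, compare producer and consumer counts per window, and flag duplicate message IDs.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed audit window size; the talk's real value is not stated in this listing.
WINDOW = timedelta(minutes=10)
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


@dataclass(frozen=True)
class AuditHeader:
    """Hypothetical audit metadata carried alongside each message."""
    message_id: str
    producer: str
    event_time: datetime


def window_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its audit window."""
    return EPOCH + ((ts - EPOCH) // WINDOW) * WINDOW


def count_by_window(headers):
    """Count messages per (producer, window start)."""
    counts = Counter()
    for h in headers:
        counts[(h.producer, window_start(h.event_time))] += 1
    return counts


def find_duplicates(headers):
    """Return message IDs that appear more than once, sorted."""
    seen = Counter(h.message_id for h in headers)
    return sorted(mid for mid, n in seen.items() if n > 1)


def missing_windows(produced, consumed):
    """Map each window to the shortfall where fewer messages arrived than were sent."""
    return {
        key: sent - consumed.get(key, 0)
        for key, sent in produced.items()
        if consumed.get(key, 0) < sent
    }
```

Under these assumptions, a window could be treated as complete once `missing_windows` reports no shortfall for it, which is one way to approach the "is it the end of the day yet?" question; an alert would fire when a shortfall persists or when `find_duplicates` returns anything.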
Syllabus
Intro
Nielsen's Architecture (AT THE TIME)
Data Lake
Data Arrival Pain Points
Recovering from failures
Is it the end of the day yet? When do we process data?
Is it the end of the day yet? Legacy answers to a legacy problem
Little Fires Everywhere
Auditing window? Let's design our metadata
Auditing Header Injection
Shipping Audit Window to Collection Point
Consuming Audit Data
In Context
Storing Data and Querying to Optimum
Designing Our Output Table
Shout out to my dad....
Optimizing PostgreSQL for Audit Queries
Managing Partitions with Apache Airflow
Offloading Data to History
Scheduling your Spark job
It is not the end of the day
Alerts and add-ons
Alerting system
Detecting duplications
Taught by
NDC Conferences
Related Courses
Introduction to Databases (Meta via Coursera)
Web Development (Udacity)
Introduction to Data Science (University of Washington via Coursera)
Datenmanagement mit SQL (openHPI)
Sabermetrics 101: Introduction to Baseball Analytics (Boston University via edX)