Auditing Your Data and Answering the Question - Is It the End of the Day Yet?
Offered By: NDC Conferences via YouTube
Course Description
Overview
Explore data auditing and workflow optimization in this 37-minute conference talk from NDC Conferences. The talk walks through Nielsen's Kafka architecture and ETL processes and shows how to track, analyze, and store auditing information. Topics include implementing an Avro audit header, designing metadata for auditing heartbeats, and optimizing auditing tables, as well as building an alert-based monitoring system with Kafka, Avro, Spark, Lambda functions, and complex SQL queries. It also covers managing partitions with Apache Airflow, offloading data to history, scheduling Spark jobs, detecting duplications, and setting up an alerting system. Finally, it tackles the age-old question, "Is it the end of the day yet?", from the perspective of data processing and legacy problem-solving.
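To make the audit header and heartbeat idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the design presented in the talk: the `AuditHeader` fields, the 10-minute `WINDOW_SECONDS`, and the helper names are all hypothetical. Each message is stamped with the audit window it belongs to, per-window counts act as heartbeats, and comparing producer-side counts with consumer-side counts surfaces missing or duplicated messages.

```python
from collections import Counter
from dataclasses import dataclass

WINDOW_SECONDS = 600  # hypothetical 10-minute audit window


@dataclass(frozen=True)
class AuditHeader:
    """Hypothetical audit header attached to every produced message."""
    source: str        # name of the producing service
    window_start: int  # epoch seconds at which the audit window starts


def make_header(source: str, event_ts: int) -> AuditHeader:
    """Assign a message to the audit window that contains its timestamp."""
    return AuditHeader(source, event_ts - event_ts % WINDOW_SECONDS)


def heartbeats(headers) -> Counter:
    """Summarise headers into message counts per (source, window_start)."""
    return Counter((h.source, h.window_start) for h in headers)


def compare(produced: Counter, consumed: Counter) -> dict:
    """Return the windows whose consumed count differs from the produced count."""
    issues = {}
    for key in produced.keys() | consumed.keys():
        p, c = produced.get(key, 0), consumed.get(key, 0)
        if c < p:
            issues[key] = f"missing {p - c}"
        elif c > p:
            issues[key] = f"duplicated {c - p}"
    return issues
```

In a real pipeline the header would travel with each Kafka record and the counts would be aggregated by a streaming job. The in-memory `Counter` only stands in for that aggregation.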
Syllabus
Intro
Nielsen's Architecture (AT THE TIME)
Data Lake
Data Arrival Pain Points
Recovering from failures
Is it the end of the day yet? When do we process data?
Is it the end of day yet? Legacy answers to a legacy problem
Little Fires Everywhere
Auditing window? Let's design our metadata
Auditing Header Injection
Shipping Audit Window to Collection Point
Consuming Audit Data
In Context
Storing Data and Querying to Optimum
Designing Our Output Table
Shout out to my dad....
Optimizing PostgreSQL for Audit Queries
Managing Partitions with Apache Airflow
Offloading Data to History
Scheduling your Spark job
It is not the end of the day
Alerts and add-ons
Alerting system
Detecting duplications
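One possible way to read the "Is it the end of the day yet?" question in audit terms is sketched below in Python: a day counts as complete once every expected audit window from every source has been reconciled. This is an assumption made for illustration, not necessarily the rule used in the talk. The window length, the `audited` set of reconciled `(source, window_start)` pairs, and the function names are hypothetical.

```python
DAY_SECONDS = 86_400
WINDOW_SECONDS = 600  # hypothetical audit window length


def expected_windows(day_start: int) -> range:
    """Start times (epoch seconds) of every audit window in the given day."""
    return range(day_start, day_start + DAY_SECONDS, WINDOW_SECONDS)


def missing_windows(day_start: int, sources, audited) -> list:
    """List the (source, window_start) pairs that are not yet reconciled."""
    return sorted(
        (source, window)
        for source in sources
        for window in expected_windows(day_start)
        if (source, window) not in audited
    )


def is_end_of_day(day_start: int, sources, audited) -> bool:
    """The day is treated as complete only when no window is outstanding."""
    return not missing_windows(day_start, sources, audited)
```

A scheduled job could call `is_end_of_day` before starting daily processing and raise an alert with the output of `missing_windows` when the check keeps failing past a deadline.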
Taught by
NDC Conferences
Related Courses
Coding the Matrix: Linear Algebra through Computer Science Applications
Brown University via Coursera
How Machines Think - An Introduction to Computing Technologies (كيف تفكر الآلات - مقدمة في تقنيات الحوسبة)
King Fahd University of Petroleum and Minerals via Rwaq (رواق)
Data Science and Situational Analysis: Behind the Scenes of Big Data (Datascience et Analyse situationnelle : dans les coulisses du Big Data)
IONIS via IONIS
Data Lakes for Big Data
EdCast
Statistics I: Fundamentals of Data Analysis (統計学Ⅰ:データ分析の基礎) (ga014)
University of Tokyo via gacco