3rd Party Data Burns
Offered By: YouTube
Course Description
Overview
Explore a conference talk that delves into the complexities of handling third-party data sets, focusing on normalization techniques, wildcard usage versus domain:key approaches, and lessons learned from red-teaming experiences. Learn about the growth of a data analysis tool, measured using Chris Roberts' metric, and discover intriguing insights about data quality issues, including the prevalence of unusual entries in large-scale SQL dumps. Gain valuable knowledge on inheriting and managing complex data sets, cleaning processes, and the importance of data integrity in cybersecurity and information analysis.
Syllabus
3RD PARTY DATA BURNS
the traditional about me moment!
you inherit the complexity of the data-set by virtue. so the more of a mess it is, the more cleaning up you need to do.
normalization, wild-cards vs. domain:key, and the 'Big Lesson' learned
when we've been used in red-teaming, the tool has kicked ass!
I've been using the Chris Roberts' metric to track the growth of the tool.
the elephant in the room
just check the email-field of any recent SQL dump, to verify that.
731,308,683 Unique Documents Indexed
of course 42 people used the zip code for the State of Michigan Department of Treasury.