Analyzing Pwned Passwords with Apache Spark
Offered By: GOTO Conferences via YouTube
Course Description
Overview
Explore Apache Spark's capabilities for analyzing large-scale distributed data in this GOTO Chicago 2018 conference talk. Dive into the world of password security as Kelley Robinson, Developer Evangelist at Twilio, demonstrates how to process and analyze over 500 million leaked passwords using Spark. Learn about Spark's API advancements for Scala, Python, and SQL, and discover techniques for efficient data processing. Gain insights into password trends, popular choices, and security implications. Understand the challenges and benefits of working with Spark, including nested error messages and documentation. Discuss data privacy concerns and practical steps for improving password security. Conclude with audience questions and valuable takeaways for implementing Spark in your own projects.
Syllabus
Introduction
What is Twilio
Agenda
What is Spark
We don't need Spark
Data Science Data Engineering
RDD
GroupByKey
Datasets
State of Password
Have I Been Pwned
The Data
Schema Check
Most Popular Passwords
Length Column
Run Raw SQL
Filtering Passwords
Password Data
Schema Inference
User-Defined Functions
Results
Dog Rights
Benefits of Spark
Challenges
Nested Error Messages
Apache Spark Documentation
Security Implications
Data Privacy
Security
What can you do
Thank you
Conclusion
Closing
Audience Questions
Taught by
GOTO Conferences
Related Courses
CS115x: Advanced Apache Spark for Data Science and Data Engineering
University of California, Berkeley via edX
Big Data Analytics
University of Adelaide via edX
Big Data Essentials: HDFS, MapReduce and Spark RDD
Yandex via Coursera
Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames
Yandex via Coursera
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera