Introduction to Big Data with PySpark
Offered By: Codecademy
Course Description
Overview
Learn how to work with big data using PySpark!
This course introduces the concepts behind big data through a practical, hands-on approach with PySpark. Big data is everywhere: it touches data science, data engineering, and machine learning, and it is becoming central to marketing, strategy, and research. This course covers the applications and implications of big data in finance, social media, health, and medicine. PySpark makes it easy to start analyzing big data, putting its potential within reach of anyone who knows Python.
### Take-Away Skills
In this course, you will learn how to handle big data with PySpark. In addition to learning how to manage the data, you will also be exposed to the conceptual underpinnings that make working with big data possible.
Syllabus
- Introduction to Big Data: Learn how big data is defined, how it is stored and processed, and what ethical considerations to keep in mind.
- Article: What is Big Data?
- Article: Bias in Data
- Article: Big Data Storage and Computing
- Quiz: Introduction to Big Data
- Spark RDDs with PySpark: Learn one way that Spark handles big data: Resilient Distributed Datasets (RDDs).
- Article: What is Spark?
- Lesson: RDDs with PySpark
- Quiz: Introduction to PySpark RDDs
- Spark DataFrames with PySpark SQL: Learn how PySpark lets you run SQL-like queries on large datasets.
- Lesson: PySpark SQL
- Project: Analyzing Wikipedia Clickstreams with PySpark
- Quiz: PySpark SQL
- Putting It All Together: Combine everything you've learned about PySpark to work with a large dataset!
- Project: Analyze Common Crawl Data with PySpark
Taught by
Nitya Mandyam