
PySpark - Data Processing in Python on Top of Apache Spark

Offered By: EuroPython Conference via YouTube

Tags

EuroPython Courses, Python Courses, Apache Spark Courses, PySpark Courses, Data Processing Courses

Course Description

Overview

Explore PySpark for large-scale data processing in Python on top of Apache Spark in this 24-minute EuroPython 2015 conference talk. The talk gives an overview of Resilient Distributed Datasets (RDDs) and the DataFrame API, and shows how PySpark exposes Spark's programming model to Python. RDDs are presented as immutable, partitioned collections of objects: transformations on them are lazy and only extend a directed acyclic graph (DAG) of operations, which Spark executes when an action requests a result. The DataFrame API, introduced in Spark 1.3, simplifies operations on large datasets and supports a variety of data sources. Related topics include cluster computing, fault-tolerant abstractions, and in-memory computation across large clusters. Additional resources on Spark architecture, analytics, and cluster computing are available for further reading.
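The split between transformations and actions described above can be illustrated with a small pure-Python toy model. This is only a sketch and not PySpark itself: the class name ToyRDD, its internals, and the partitioning scheme are hypothetical and chosen for illustration, although the method names (parallelize, map, filter, collect, reduce) mirror the names of real RDD operations. It runs in a single process and shows only that transformations record a lineage without computing anything, while actions trigger the computation partition by partition.

```python
import functools


class ToyRDD:
    """Toy stand-in for an RDD: immutable partitions plus a recorded lineage."""

    def __init__(self, partitions, lineage=()):
        # Tuples keep the partitions and the lineage immutable.
        self._partitions = tuple(tuple(p) for p in partitions)
        self._lineage = tuple(lineage)

    @classmethod
    def parallelize(cls, data, num_partitions=2):
        # Split the input into contiguous chunks, one per partition.
        data = list(data)
        size = max(1, -(-len(data) // num_partitions))  # ceiling division
        return cls([data[i:i + size] for i in range(0, len(data), size)])

    # Transformations: return a new ToyRDD and compute nothing.
    def map(self, fn):
        return ToyRDD(self._partitions, self._lineage + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._partitions, self._lineage + (("filter", pred),))

    def _compute(self, partition):
        # Replay the recorded lineage over a single partition.
        items = iter(partition)
        for kind, fn in self._lineage:
            items = map(fn, items) if kind == "map" else filter(fn, items)
        return list(items)

    # Actions: run the lineage and return a result to the caller.
    def collect(self):
        return [x for p in self._partitions for x in self._compute(p)]

    def reduce(self, fn):
        return functools.reduce(fn, self.collect())


calls = []  # records every invocation of square()


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD.parallelize(range(1, 7), num_partitions=2)
even_squares = numbers.map(square).filter(lambda x: x % 2 == 0)

ran_before_action = len(calls)   # still 0: nothing has been computed yet
result = even_squares.collect()  # the action triggers the computation
total = even_squares.reduce(lambda a, b: a + b)
```

In this toy, each action recomputes the lineage from the source partitions because nothing is cached. The original numbers object is never modified, which mirrors the immutability of RDDs mentioned in the description.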

Syllabus

Introduction
RDD
Transformations
MapReduce
Partitions
What is PySpark
How it works
User-defined functions
Data Source
PySpark Data Format
Prediction Projection
DataFrame
Schema
Summary


Taught by

EuroPython Conference

Related Courses

A Brief History of Data Storage
EuroPython Conference via YouTube
Breaking the Stereotype - Evolution & Persistence of Gender Bias in Tech
EuroPython Conference via YouTube
We Can Get More from Spatial, GIS, and Public Domain Datasets
EuroPython Conference via YouTube
Using NLP to Detect Knots in Protein Structures
EuroPython Conference via YouTube
The Challenges of Doing Infra-As-Code Without "The Cloud"
EuroPython Conference via YouTube