Azure Databricks Using Python With PySpark
Offered By: Bryan Cafferky via YouTube
Course Description
Overview
Explore Python on Spark with PySpark in Azure Databricks through this comprehensive 52-minute tutorial. Dive into basic concepts and follow extensive demonstrations in a Databricks notebook. Learn about scale-out, the DataFrame API, RDDs vs DataFrames, the PySpark API, and scaling out ML. Follow along with notebook setup, data importing, Python vs SQL comparisons, and creating persistent tables. Master techniques for renaming columns, using Pandas, exploring data, persistence, and visualizations. Delve into case statements, Spark SQL, Matplotlib, and user-defined functions. Conclude with hands-on experience in building and writing ML models. Access the accompanying notebook on GitHub for a complete learning experience.
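The DataFrame operations the overview mentions (importing data, renaming columns, creating a persistent table, and pulling a sample into Pandas) follow the standard PySpark DataFrame API. The sketch below is not taken from the video's notebook; the file path, table name, and column names are hypothetical placeholders.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# In a Databricks notebook, `spark` is already defined; creating it here
# keeps the sketch self-contained for local experimentation.
spark = SparkSession.builder.appName("pyspark-demo").getOrCreate()

# Import data into a Spark DataFrame (hypothetical CSV path)
df = spark.read.csv("/databricks-datasets/example.csv", header=True, inferSchema=True)

# Rename a column and derive a new one (hypothetical column names)
df = (df.withColumnRenamed("old_name", "new_name")
        .withColumn("value_doubled", F.col("value") * 2))

# Create a persistent table so the data can also be queried with Spark SQL
df.write.mode("overwrite").saveAsTable("demo_table")

# Pull a small sample back to a local pandas DataFrame, e.g. for plotting
pdf = df.limit(100).toPandas()
```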
Syllabus
Introduction
Background
Scale-out
DataFrame API
RDD vs DataFrame
PySpark API
PySpark
Scaling out ML
Notebook setup
Importing data
Python vs SQL
Creating persistent tables
Renaming columns
Pandas
Display
dropna
Exploring the Data
Persistence
Visualizations
More Data
Case Statements
Spark SQL
Matplotlib
Adding a new column
Adding a new dataframe
User-defined functions
Local Python dataframe
ML Live
Building the Model
Writing the Model
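The closing syllabus topics cover user-defined functions and building and writing an ML model. The following sketch shows the general shape of those steps using PySpark's udf helper and Spark MLlib's LinearRegression; it assumes a small toy DataFrame with hypothetical column names and is not the notebook used in the video.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import StringType
from pyspark.ml.feature import VectorAssembler
from pyspark.ml.regression import LinearRegression

spark = SparkSession.builder.appName("pyspark-ml-demo").getOrCreate()

# Toy data with hypothetical feature and label columns
df = spark.createDataFrame(
    [(1.0, 2.0, 3.5), (2.0, 1.0, 4.0), (3.0, 0.5, 5.5)],
    ["feature_a", "feature_b", "label"],
)

# Register a Python user-defined function and apply it as a new column
bucket = udf(lambda x: "high" if x > 2.0 else "low", StringType())
df = df.withColumn("feature_a_bucket", bucket(df["feature_a"]))

# Assemble features and fit a linear regression model with Spark MLlib
assembler = VectorAssembler(inputCols=["feature_a", "feature_b"], outputCol="features")
model = LinearRegression(featuresCol="features", labelCol="label").fit(assembler.transform(df))

# Write (persist) the fitted model to storage
model.write().overwrite().save("/tmp/demo_lr_model")
```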
Taught by
Bryan Cafferky
Related Courses
Fundamentals of Scalable Data Science (IBM via Coursera)
Data Science and Engineering with Spark (Berkeley University of California via edX)
Master of Machine Learning and Data Science (Imperial College London via Coursera)
Data Analysis Using PySpark (Coursera Project Network via Coursera)
Building Machine Learning Pipelines in PySpark MLlib (Coursera Project Network via Coursera)