YoVDO

Azure Databricks Using Python With PySpark

Offered By: Bryan Cafferky via YouTube

Course Description

Overview

Explore Python on Spark with PySpark in Azure Databricks in this 52-minute tutorial. It introduces the basic concepts and then works through extensive demonstrations in a Databricks notebook. Topics include scale-out, the DataFrame API, RDDs vs. DataFrames, the PySpark API, and scaling out machine learning (ML). Follow along with notebook setup, importing data, comparing Python with SQL, and creating persistent tables. Learn techniques for renaming columns, using pandas, exploring data, persistence, and visualizations. The tutorial also covers CASE statements, Spark SQL, Matplotlib, and user-defined functions, and concludes with hands-on work building and writing ML models. The accompanying notebook is available on GitHub.

Syllabus

Introduction
Background
Scale-out
DataFrame API
RDD vs DataFrame
PySpark API
PySpark
Scaling out ML
Notebook setup
Importing data
Python vs SQL
Creating persistent tables
Renaming columns
Pandas
Display
DropNA
Exploring the Data
Persistence
Visualizations
More Data
Case Statements
Spark SQL
Matplotlib
Adding a new column
Adding a new dataframe
User-defined functions
Local Python dataframe
ML Live
Building the Model
Writing the Model


Taught by

Bryan Cafferky
