Perform data science with Azure Databricks
Offered By: Microsoft via Microsoft Learn
Course Description
Overview
- Module 1: Describe Azure Databricks
- Understand the Azure Databricks platform
- Create your own Azure Databricks workspace
- Create a notebook inside your home folder in Databricks
- Understand the fundamentals of Apache Spark notebooks
- Create, or attach to, a Spark cluster
- Identify the types of tasks well-suited to the unified analytics engine Apache Spark
- Module 2: Spark architecture fundamentals
- Understand the architecture of an Azure Databricks Spark Cluster
- Understand the architecture of a Spark Job
- Module 3: Read and write data in Azure Databricks
- Use Azure Databricks to read multiple file types, both with and without a schema.
- Combine inputs from files and data stores, such as Azure SQL Database.
- Transform and store that data for advanced analytics.
- Module 4: Work with DataFrames in Azure Databricks
- Use the count() method to count rows in a DataFrame
- Use the display() function to display a DataFrame in the Notebook
- Cache a DataFrame for quicker operations if the data is needed a second time
- Use the limit function to display a small set of rows from a larger DataFrame
- Use select() to select a subset of columns from a DataFrame
- Use distinct() and dropDuplicates() to remove duplicate data
- Use drop() to remove columns from a DataFrame
- Module 5: Work with user-defined functions
- Write User-Defined Functions
- Perform ETL operations using User-Defined Functions
- Module 6: Build and query a Delta Lake
- Learn about the key features and use cases of Delta Lake.
- Use Delta Lake to create, append, and upsert tables.
- Perform optimizations in Delta Lake.
- Compare different versions of a Delta table using Time Machine.
- Module 7: Perform machine learning with Azure Databricks
- Understand machine learning
- Train a model and create predictions
- Perform exploratory data analysis
- Describe machine learning workflows
- Build and evaluate machine learning models
- Module 8: Train a machine learning model
- Perform featurization of the dataset
- Finish featurization of the dataset
- Understand Regression modeling
- Build and interpret a regression model
- Module 9: Work with MLflow in Azure Databricks
- Use MLflow to track experiment metrics, parameters, artifacts, and models, and to compare runs
- Module 10: Perform model selection with hyperparameter tuning
- Describe Model selection and Hyperparameter Tuning
- Select the optimal model by tuning Hyperparameters
- Module 11: Deep learning with Horovod for distributed training
- Use Horovod to train a deep learning model
- Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
- Work with Horovod and Petastorm for training a deep learning model
- Module 12: Work with Azure Machine Learning to deploy serving models
- Use Azure Machine Learning to deploy serving models
Syllabus
- Module 1: Describe Azure Databricks
- Introduction
- Explain Azure Databricks
- Create an Azure Databricks workspace and cluster
- Understand Azure Databricks Notebooks
- Exercise: Work with Notebooks
- Knowledge check
- Summary
- Module 2: Spark architecture fundamentals
- Introduction
- Understand the architecture of an Azure Databricks Spark cluster
- Understand the architecture of a Spark job
- Knowledge check
- Summary
- Module 3: Read and write data in Azure Databricks
- Introduction
- Read data in CSV format
- Read data in JSON format
- Read data in Parquet format
- Read data stored in tables and views
- Write data
- Exercises: Read and write data
- Knowledge check
- Summary
- Module 4: Work with DataFrames in Azure Databricks
- Introduction
- Describe a DataFrame
- Use common DataFrame methods
- Use the display function
- Exercise: Distinct articles
- Knowledge check
- Summary
- Module 5: Work with user-defined functions
- Introduction
- Write user-defined functions
- Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
- Knowledge check
- Summary
- Module 6: Build and query a Delta Lake
- Introduction
- Describe the open-source Delta Lake
- Exercise: Work with basic Delta Lake functionality
- Describe how Azure Databricks manages Delta Lake
- Exercise: Use the Delta Lake Time Machine and perform optimization
- Knowledge check
- Summary
- Module 7: Perform machine learning with Azure Databricks
- Introduction
- Understand machine learning
- Exercise: Train a model and create predictions
- Understand data using exploratory data analysis
- Exercise: Perform exploratory data analysis
- Describe machine learning workflows
- Exercise: Build and evaluate a baseline machine learning model
- Knowledge check
- Summary
- Module 8: Train a machine learning model
- Introduction
- Perform featurization of the dataset
- Exercise: Finish featurization of the dataset
- Understand regression modeling
- Exercise: Build and interpret a regression model
- Knowledge check
- Summary
- Module 9: Work with MLflow in Azure Databricks
- Introduction
- Use MLflow to track experiments, log metrics, and compare runs
- Exercise: Work with MLflow to track experiment metrics, parameters, artifacts and models
- Knowledge check
- Summary
- Module 10: Perform model selection with hyperparameter tuning
- Introduction
- Describe model selection and hyperparameter tuning
- Exercise: Select optimal model by tuning hyperparameters
- Knowledge check
- Summary
- Module 11: Deep learning with Horovod for distributed training
- Introduction
- Use Horovod to train a deep learning model
- Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
- Exercise: Work with Horovod and Petastorm for training a deep learning model
- Knowledge check
- Summary
- Module 12: Work with Azure Machine Learning to deploy serving models
- Introduction
- Use Azure Machine Learning to deploy serving models
- Knowledge check
- Summary
Related Courses
- Distributed Computing with Spark SQL (University of California, Davis via Coursera)
- Apache Spark (TM) SQL for Data Analysts (Databricks via Coursera)
- Building Your First ETL Pipeline Using Azure Databricks (Pluralsight)
- Implement a data lakehouse analytics solution with Azure Databricks (Microsoft via Microsoft Learn)
- Optimizing Apache Spark on Databricks (Pluralsight)