YoVDO

Perform data science with Azure Databricks

Offered By: Microsoft via Microsoft Learn

Tags

Microsoft Azure Courses, Data Science Courses, Machine Learning Courses, Databricks Courses, Model Selection Courses, DataFrames Courses, User-Defined Functions Courses, Model Training Courses, Delta Lake Courses, MLflow Courses, Azure Databricks Courses

Course Description

Overview

  • Module 1: Describe Azure Databricks
  • In this module, you will:

    • Understand the Azure Databricks platform
    • Create your own Azure Databricks workspace
    • Create a notebook inside your home folder in Databricks
    • Understand the fundamentals of Apache Spark notebooks
    • Create, or attach to, a Spark cluster
    • Identify the types of tasks well-suited to the unified analytics engine Apache Spark
  • Module 2: Spark architecture fundamentals
  • In this module, you will:

    • Understand the architecture of an Azure Databricks Spark Cluster
    • Understand the architecture of a Spark Job
  • Module 3: Read and write data in Azure Databricks
  • In this module, you will:

    • Use Azure Databricks to read multiple file types, both with and without a schema.
    • Combine inputs from files and data stores, such as Azure SQL Database.
    • Transform and store that data for advanced analytics.
  • Module 4: Work with DataFrames in Azure Databricks
  • In this module, you will:

    • Use the count() method to count rows in a DataFrame
    • Use the display() function to display a DataFrame in the notebook
    • Cache a DataFrame for quicker operations if the data is needed a second time
    • Use the limit() function to display a small set of rows from a larger DataFrame
    • Use select() to select a subset of columns from a DataFrame
    • Use distinct() and dropDuplicates() to remove duplicate data
    • Use drop() to remove columns from a DataFrame
  • Module 5: Work with user-defined functions
  • In this module, you will learn how to:

    • Write user-defined functions (UDFs)
    • Perform ETL operations using user-defined functions
  • Module 6: Build and query a Delta Lake
  • In this module, you will:

    • Learn about the key features and use cases of Delta Lake.
    • Use Delta Lake to create, append, and upsert tables.
    • Perform optimizations in Delta Lake.
    • Compare different versions of a Delta table using Delta Lake time travel.
  • Module 7: Perform machine learning with Azure Databricks
  • In this module, you will learn how to:

    • Perform machine learning
    • Train a model and create predictions
    • Perform exploratory data analysis
    • Describe machine learning workflows
    • Build and evaluate machine learning models
  • Module 8: Train a machine learning model
  • In this module, you will learn how to:

    • Perform featurization of the dataset
    • Finish featurization of the dataset
    • Understand regression modeling
    • Build and interpret a regression model
  • Module 9: Work with MLflow in Azure Databricks
  • In this module, you will learn how to:

    • Use MLflow to track experiments, log metrics, and compare runs
    • Work with MLflow to track experiment metrics, parameters, artifacts and models.
  • Module 10: Perform model selection with hyperparameter tuning
  • In this module, you will learn how to:

    • Describe model selection and hyperparameter tuning
    • Select the optimal model by tuning hyperparameters
  • Module 11: Deep learning with Horovod for distributed training
  • In this module, you will learn how to:

    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Work with Horovod and Petastorm for training a deep learning model
  • Module 12: Work with Azure Machine Learning to deploy serving models
  • In this module, you will learn how to:

    • Use Azure Machine Learning to deploy serving models

Syllabus

  • Module 1: Describe Azure Databricks
    • Introduction
    • Explain Azure Databricks
    • Create an Azure Databricks workspace and cluster
    • Understand Azure Databricks Notebooks
    • Exercise: Work with Notebooks
    • Knowledge check
    • Summary
  • Module 2: Spark architecture fundamentals
    • Introduction
    • Understand the architecture of an Azure Databricks Spark cluster
    • Understand the architecture of a Spark job
    • Knowledge check
    • Summary
  • Module 3: Read and write data in Azure Databricks
    • Introduction
    • Read data in CSV format
    • Read data in JSON format
    • Read data in Parquet format
    • Read data stored in tables and views
    • Write data
    • Exercises: Read and write data
    • Knowledge check
    • Summary
  • Module 4: Work with DataFrames in Azure Databricks
    • Introduction
    • Describe a DataFrame
    • Use common DataFrame methods
    • Use the display function
    • Exercise: Distinct articles
    • Knowledge check
    • Summary
  • Module 5: Work with user-defined functions
    • Introduction
    • Write user-defined functions
    • Exercise: Perform Extract, Transform, Load (ETL) operations using user-defined functions
    • Knowledge check
    • Summary
  • Module 6: Build and query a Delta Lake
    • Introduction
    • Describe the open source Delta Lake
    • Exercise: Work with basic Delta Lake functionality
    • Describe how Azure Databricks manages Delta Lake
    • Exercise: Use Delta Lake time travel and perform optimization
    • Knowledge check
    • Summary
  • Module 7: Perform machine learning with Azure Databricks
    • Introduction
    • Understand machine learning
    • Exercise: Train a model and create predictions
    • Understand data using exploratory data analysis
    • Exercise: Perform exploratory data analysis
    • Describe machine learning workflows
    • Exercise: Build and evaluate a baseline machine learning model
    • Knowledge check
    • Summary
  • Module 8: Train a machine learning model
    • Introduction
    • Perform featurization of the dataset
    • Exercise: Finish featurization of the dataset
    • Understand regression modeling
    • Exercise: Build and interpret a regression model
    • Knowledge check
    • Summary
  • Module 9: Work with MLflow in Azure Databricks
    • Introduction
    • Use MLflow to track experiments, log metrics, and compare runs
    • Exercise: Work with MLflow to track experiment metrics, parameters, artifacts and models
    • Knowledge check
    • Summary
  • Module 10: Perform model selection with hyperparameter tuning
    • Introduction
    • Describe model selection and hyperparameter tuning
    • Exercise: Select optimal model by tuning hyperparameters
    • Knowledge check
    • Summary
  • Module 11: Deep learning with Horovod for distributed training
    • Introduction
    • Use Horovod to train a deep learning model
    • Use Petastorm to read datasets in Apache Parquet format with Horovod for distributed model training
    • Exercise: Work with Horovod and Petastorm for training a deep learning model
    • Knowledge check
    • Summary
  • Module 12: Work with Azure Machine Learning to deploy serving models
    • Introduction
    • Use Azure Machine Learning to deploy serving models
    • Knowledge check
    • Summary

Related Courses

  • Distributed Computing with Spark SQL (University of California, Davis via Coursera)
  • Apache Spark (TM) SQL for Data Analysts (Databricks via Coursera)
  • Building Your First ETL Pipeline Using Azure Databricks (Pluralsight)
  • Implement a data lakehouse analytics solution with Azure Databricks (Microsoft via Microsoft Learn)
  • Optimizing Apache Spark on Databricks (Pluralsight)