Building Open Source Software (OSS) Analytics Solutions with Azure HDInsight
Offered By: Microsoft via Microsoft Learn
Course Description
Overview
- Module 1: Introduction to the Open source Analytics Offering
- What HDInsight is
- How HDInsight works
- When to use HDInsight
- Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
- The correct HDInsight configuration options
- Decision criteria for selecting the correct HDInsight configuration option
- Analyze a scenario and map it to an HDInsight configuration option
- Cost Optimization strategies for HDInsight clusters
- Module 3: Creating and configuring an HDInsight cluster
- Create an HDInsight Spark Cluster
- Execute queries on an HDInsight Spark Cluster
- Monitor an HDInsight Spark Cluster
- Learn how to fix common provisioning issues
- Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
- Introduction
- Use HDInsight HBase clusters
- Describe HBase Architecture Patterns
- Exercise - Provisioning an HDInsight HBase cluster
- Exercise - Run benchmarks in HBase
- Understand HBase Best Practices
- Summary
- Knowledge Check
- Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
- When to use Apache Spark and Kafka with HDInsight
- How Spark Structured Streaming works
- The architecture of a Kafka and Spark solution
- How to provision HDInsight, create a Kafka producer, and stream Kafka data to a Jupyter notebook
- How to replicate data to a secondary cluster
- Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
- Appropriate scenarios to deploy HDInsight Interactive Query clusters
- Learn about architectural patterns
- Deploy a cluster for your real-estate app and query the data
- Learn how to integrate Apache Spark and Hive LLAP queries using the Hive Warehouse Connector
- Create a large-scale interactive query dashboard to evaluate real estate values and locations
- Module 7: Manage enterprise security in HDInsight
- Introduction
- Describe HDInsight security areas
- Implement Network Security
- Understand Operating system security
- Manage Application/Middleware security
- Implement Data Access security
- Knowledge Check
- Summary
Syllabus
- Module 1: Introduction to the Open source Analytics Offering
- Introduction
- What is HDInsight?
- How does HDInsight work?
- When to use HDInsight
- Knowledge check
- Summary
- Module 2: Choose the correct HDInsight Configuration to build open source analytics solutions
- Introduction
- HDInsight configuration options
- Decision criteria for selecting the correct HDInsight configuration option
- Analyze a scenario and map it to an HDInsight configuration option
- Cost optimization strategies for HDInsight clusters
- Knowledge check
- Summary
- Module 3: Creating and configuring an HDInsight cluster
- Introduction
- Creating an HDInsight cluster
- Exercise - Create an HDInsight cluster via the Azure portal
- Opening a Jupyter Notebook on HDInsight Spark cluster
- Exercise - Execute queries on HDInsight Spark cluster
- Enable monitoring of HDInsight jobs
- Common provisioning issues
- Exercise - Monitor an HDInsight cluster
- Summary
- Knowledge check
- Module 4: Run Petabyte level OSS NoSQL databases with HDInsight HBase
- Introduction
- Describe Apache HBase
- Explain HDInsight HBase clusters architecture and application patterns
- Improve the write and read performance of HBase clusters
- Determine migration and high availability strategies in HDInsight HBase
- Use Apache Phoenix on HDInsight HBase
- Determine HDInsight HBase cluster performance
- Perform benchmarking in HBase
- Knowledge check
- Summary
- Module 5: Perform advanced streaming data transformations with Apache Spark and Kafka in Azure HDInsight
- Introduction
- Use HDInsight Spark and Kafka
- Stream data with Apache Kafka
- Describe Spark structured streaming
- Create a Kafka and Spark architecture
- Exercise - Provision HDInsight to perform advanced streaming data transformations
- Exercise - Create the Kafka producer
- Exercise - Stream Kafka data to a Jupyter notebook and window the data
- Replicate data to a secondary cluster
- Knowledge check
- Summary
- Module 6: Perform Zero ETL analytics with HDInsight Interactive Query
- Introduction
- When should you use HDInsight Interactive Query?
- HDInsight interactive queries
- Exercise - Provision HDInsight to perform ad hoc analytics
- Exercise - Upload and query data in HDInsight
- Integrate Apache Spark and Hive LLAP queries
- Create a large-scale interactive query dashboard for evaluating real estate trends
- Summary
- Knowledge check
- Module 7: Manage enterprise security in HDInsight
- Introduction
- Describe HDInsight security areas
- Implement Network security
- Understand operating system security
- Manage application/middleware security
- Implement data access security
- Knowledge check
- Summary