YoVDO

Learning Hadoop

Offered By: LinkedIn Learning

Tags

Hadoop Courses, Big Data Courses, Cloud Computing Courses, Apache Spark Courses, Dataproc Courses, MapReduce Courses, HDFS Courses

Course Description

Overview

Learn all the essentials of Hadoop, a key tool for processing and understanding big data.

Syllabus

Introduction
  • What is Hadoop, and why use it?
  • What you should know
  • Use cloud services
1. Set Up Cloud Hadoop
  • What is Hadoop?
  • Review Hadoop distributions and cloud services
  • Set up GCP Dataproc Metastore and VM cluster
  • Verify GCP Dataproc VM cluster
2. Understand Hadoop Core Components
  • Understand Hadoop components
  • Understand Java virtual machines (JVMs)
  • Explore Hadoop file systems: HDFS
  • Explore Hadoop file systems: AWS S3
  • Review Hadoop cluster components
3. Set Up and Verify Development Environment
  • Review test jobs
  • Review job output
  • Verify Hadoop web interfaces in your test environment
  • Verify Hadoop Spark web interfaces in your test environment
  • Use the Jupyter interface for Hadoop
4. Understand MapReduce
  • What is MapReduce?
  • What is MapReduce word count?
  • Review MapReduce word count job
  • Prepare for MapReduce Java coding
  • Review MapReduce WordCount job code
5. Tune MapReduce
  • Tune by physical methods
  • Tune a Mapper
  • Understand data types
  • Tune a Reducer
  • Use MR 2.0 and 3.0
  • Review MR optimization examples
6. Scale Cloud Hadoop
  • Migrate to Cloud Hadoop
  • Scale VM-based clusters
  • Use autoscale policies
  • Scale Kubernetes Spark clusters
7. Use Hive, Pig, and Spark
  • Understand Hive and HBase
  • Create and query tables with Hive
  • Understand Pig
  • Run WordCount using Pig
  • Review Spark architecture
  • Scale a Spark job to calculate Pi
Conclusion
  • Learn more about using Hadoop

Taught by

Lynn Langit

Related Courses

Intro to Hadoop and MapReduce (Cloudera via Udacity)
Processing Big Data with Hadoop in Azure HDInsight (Microsoft via edX)
Implementing Real-Time Analytics with Hadoop in Azure HDInsight (Microsoft via edX)
Hadoop Platform and Application Framework (University of California, San Diego via Coursera)
Data Manipulation at Scale: Systems and Algorithms (University of Washington via Coursera)