Learning Hadoop
Offered By: LinkedIn Learning
Course Description
Overview
Learn all the essentials of Hadoop, a key tool for processing and understanding big data.
Syllabus
Introduction
- What and why Hadoop?
- What you should know
- Use cloud services
- What is Hadoop?
- Review Hadoop distributions and cloud services
- Set up GCP Dataproc Metastore and VM cluster
- Verify GCP Dataproc VM cluster
- Understand Hadoop components
- Understand Java virtual machines (JVMs)
- Explore Hadoop file systems: HDFS
- Explore Hadoop file systems: AWS S3
- Review Hadoop cluster components
- Review test jobs
- Review job output
- Verify Hadoop web interfaces in your test environment
- Verify Hadoop Spark web interfaces in your test environment
- Use the Jupyter interface for Hadoop
- What is MapReduce?
- What is MapReduce word count?
- Review MapReduce word count job
- Prepare for MapReduce Java coding
- Review MapReduce WordCount job code
- Tune by physical methods
- Tune a Mapper
- Understanding data types
- Tune a Reducer
- Use MR 2.0 and 3.0
- Review MR optimization examples
- Migrate to Cloud Hadoop
- Scale VM-based clusters
- Use autoscale policies
- Scale Kubernetes Spark clusters
- Understand Hive and HBase
- Create and query tables with Hive
- Understand Pig
- Run WordCount using Pig
- Review Spark architecture
- Scale a Spark job to calculate Pi
- Learn more about using Hadoop
Taught by
Lynn Langit
Related Courses
- Intro to Hadoop and MapReduce, Cloudera via Udacity
- Processing Big Data with Hadoop in Azure HDInsight, Microsoft via edX
- Implementing Real-Time Analytics with Hadoop in Azure HDInsight, Microsoft via edX
- Hadoop Platform and Application Framework, University of California, San Diego via Coursera
- Data Manipulation at Scale: Systems and Algorithms, University of Washington via Coursera