Learning Hadoop
Offered By: LinkedIn Learning
Course Description
Overview
Learn all the essentials of Hadoop, a key tool for processing and understanding big data.
Syllabus
Introduction
- What and why Hadoop?
- What you should know
- Use cloud services
- What is Hadoop?
- Review Hadoop distributions and cloud services
- Set up GCP Dataproc Metastore and VM cluster
- Verify GCP Dataproc VM cluster
- Understand Hadoop components
- Understand Java virtual machines (JVMs)
- Explore Hadoop file systems: HDFS
- Explore Hadoop file systems: AWS S3
- Review Hadoop cluster components
- Review test jobs
- Review job output
- Verify Hadoop web interfaces in your test environment
- Verify Hadoop Spark web interfaces in your test environment
- Use the Jupyter interface for Hadoop
- What is MapReduce?
- What is MapReduce word count?
- Review MapReduce word count job
- Prepare for MapReduce Java coding
- Review MapReduce WordCount job code
- Tune by physical methods
- Tune a Mapper
- Understanding data types
- Tune a Reducer
- Use MR 2.0 and 3.0
- Review MR optimization examples
- Migrate to Cloud Hadoop
- Scale VM-based Clusters
- Use autoscale policies
- Scale Kubernetes Spark clusters
- Understand Hive and HBase
- Create and query tables with Hive
- Understand Pig
- Run WordCount using Pig
- Review Spark architecture
- Scale a Spark job to calculate Pi
- Learn more about using Hadoop
Taught by
Lynn Langit
Related Courses
- CS115x: Advanced Apache Spark for Data Science and Data Engineering (University of California, Berkeley via edX)
- Big Data Analytics (University of Adelaide via edX)
- Big Data Essentials: HDFS, MapReduce and Spark RDD (Yandex via Coursera)
- Big Data Analysis: Hive, Spark SQL, DataFrames and GraphFrames (Yandex via Coursera)
- Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)