Learning Hadoop
Offered By: LinkedIn Learning
Course Description
Overview
Learn all the essentials of Hadoop, a key tool for processing and understanding big data.
Syllabus
Introduction
- What and why Hadoop?
- What you should know
- Use cloud services
- What is Hadoop?
- Review Hadoop distributions and cloud services
- Set up GCP Dataproc Metastore and VM cluster
- Verify GCP Dataproc VM cluster
- Understand Hadoop components
- Understand Java virtual machines (JVMs)
- Explore Hadoop file systems: HDFS
- Explore Hadoop file systems: AWS S3
- Review Hadoop cluster components
- Review test jobs
- Review job output
- Verify Hadoop web interfaces in your test environment
- Verify Hadoop Spark web interfaces in your test environment
- Use the Jupyter interface for Hadoop
- What is MapReduce?
- What is MapReduce word count?
- Review MapReduce word count job
- Prepare for MapReduce Java coding
- Review MapReduce WordCount job code
- Tune by physical methods
- Tune a Mapper
- Understanding data types
- Tune a Reducer
- Use MR 2.0 and 3.0
- Review MR optimization examples
- Migrate to Cloud Hadoop
- Scale VM-based clusters
- Use autoscale policies
- Scale Kubernetes Spark clusters
- Understand Hive and HBase
- Create and query tables with Hive
- Understand Pig
- Run WordCount using Pig
- Review Spark architecture
- Scale a Spark job to calculate Pi
- Learn more about using Hadoop
Taught by
Lynn Langit