The Ultimate Hands-On Hadoop: Tame your Big Data!

Offered By: Skillshare

Course Description

Overview

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go well beyond Hadoop itself and dive into the many distributed systems you may need to integrate with.

Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
Manage big data on a cluster with HDFS and MapReduce
Write programs to analyze data on Hadoop with Pig and Spark
Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
Design real-world systems using the Hadoop ecosystem
Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue
Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Syllabus

Introduction
Install Hadoop on your Desktop
Hadoop Overview and History
Overview of the Hadoop Ecosystem
HDFS: What it is, and how it works
[Activity] Install the MovieLens dataset into HDFS using the Ambari UI
[Activity] Install the MovieLens dataset into HDFS using the command line
MapReduce: What it is, and how it works
How MapReduce distributes processing
MapReduce example: Break down movie ratings by rating score
[Activity] Installing Python, MRJob, and nano
[Activity] Code up the ratings histogram MapReduce job and run it
[Exercise] Rank movies by their popularity
[Activity] Check your results against mine!
Introducing Ambari
Introducing Pig
Example: Find the oldest movie with a 5-star rating using Pig
[Activity] Find old 5-star movies with Pig
More Pig Latin
[Exercise] Find the most-rated one-star movie
Pig Challenge: Compare your results to mine!
Why Spark?
The Resilient Distributed Dataset (RDD)
[Activity] Find the movie with the lowest average rating - with RDDs
Datasets and Spark 2.0
[Activity] Find the movie with the lowest average rating - with DataFrames
[Activity] Movie recommendations with MLlib
[Exercise] Filter the lowest-rated movies by number of ratings
[Activity] Check your results against mine!
What is Hive?
[Activity] Use Hive to find the most popular movie
How Hive works
[Exercise] Use Hive to find the movie with the highest average rating
Compare your solution to mine.
Integrating MySQL with Hadoop
[Activity] Install MySQL and import our movie data
[Activity] Use Sqoop to import data from MySQL to HDFS/Hive
[Activity] Use Sqoop to export data from Hadoop to MySQL
Why NoSQL?
What is HBase?
[Activity] Import movie ratings into HBase
[Activity] Use HBase with Pig to import data at scale
Cassandra overview
[Activity] Installing Cassandra
[Activity] Write Spark output into Cassandra
MongoDB Overview
[Activity] Install MongoDB, and integrate Spark with MongoDB
[Activity] Using the MongoDB shell
Choosing a database technology
[Exercise] Choose a database for a given problem
Overview of Drill
[Activity] Setting Up Drill
[Activity] Querying across multiple databases with Drill
Overview of Phoenix
[Activity] Install Phoenix and query HBase with it
[Activity] Integrate Phoenix with Pig
Overview of Presto
[Activity] Install Presto, and query Hive with it
[Activity] Query both Cassandra and Hive using Presto
YARN explained
Tez explained
[Activity] Use Hive on Tez and measure the performance benefit
Mesos explained
ZooKeeper explained
[Activity] Simulating a failing master with ZooKeeper
Oozie explained
[Activity] Set up a simple Oozie workflow
Zeppelin overview
[Activity] Use Zeppelin to analyze movie ratings, part 1
[Activity] Use Zeppelin to analyze movie ratings, part 2
Hue overview
Other technologies worth mentioning
Kafka explained
[Activity] Setting up Kafka, and publishing some data
[Activity] Publishing web logs with Kafka
Flume explained
[Activity] Set up Flume and publish logs with Spark
[Activity] Set up Flume to monitor a directory and store its data in HDFS
Spark Streaming: Introduction
[Activity] Analyze web logs published with Flume using Spark Streaming
[Exercise] Monitor Flume-published logs for errors in real time
Exercise solution: Aggregating HTTP access codes with Spark Streaming
Apache Storm: Introduction
[Activity] Count words with Storm
Flink: An Overview
[Activity] Counting words with Flink
The Best of the Rest
Review: How the pieces fit together
Understanding your requirements
Sample application: consume webserver logs and keep track of top-sellers
Sample application: serving movie recommendations to a website
[Exercise] Design a system to report web sessions per day
Exercise solution: Design a system to count daily sessions
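As a rough illustration of the MapReduce model behind the "break down movie ratings by rating score" example, here is a minimal plain-Python sketch. It is not the course's MRJob code: it runs the map, shuffle/sort, and reduce phases in a single process to show what Hadoop does across a cluster. It assumes the MovieLens ratings layout of tab-separated userID, movieID, rating, and timestamp; the sample lines and the helper names (`mapper`, `shuffle`, `reducer`, `ratings_histogram`) are illustrative only.

```python
from collections import defaultdict

# Illustrative sample lines in the assumed MovieLens layout:
# userID <tab> movieID <tab> rating <tab> timestamp
SAMPLE_LINES = [
    "196\t242\t3\t881250949",
    "186\t302\t3\t891717742",
    "22\t377\t1\t878887116",
    "244\t51\t2\t880606923",
    "166\t346\t1\t886397596",
]


def mapper(line):
    """Map phase: emit a (rating, 1) pair for each input line."""
    _, _, rating, _ = line.split("\t")
    yield rating, 1


def shuffle(pairs):
    """Shuffle/sort phase: group mapped values by key, sorted by key,
    as the framework does between the map and reduce phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reducer(key, values):
    """Reduce phase: sum the 1s to count how many ratings had this score."""
    return key, sum(values)


def ratings_histogram(lines):
    """Run all three phases in-process and return {rating: count}."""
    mapped = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    for rating, count in ratings_histogram(SAMPLE_LINES).items():
        print(rating, count)
```

In a real Hadoop job, you write only the mapper and reducer; the framework distributes the input splits, performs the shuffle and sort, and runs reducers in parallel across the cluster.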

Taught by

Frank Kane
