YoVDO

The Ultimate Hands-On Hadoop: Tame your Big Data!

Offered By: Skillshare

Tags

Hadoop Courses, Big Data Courses, Data Analysis Courses, Database Management Courses, Spark Streaming Courses

Course Description

Overview

Learn and master the most popular big data technologies in this comprehensive course, taught by a former engineer and senior manager from Amazon and IMDb. We'll go way beyond Hadoop itself, and dive into all sorts of distributed systems you may need to integrate with.

  • Install and work with a real Hadoop installation right on your desktop with Hortonworks and the Ambari UI
  • Manage big data on a cluster with HDFS and MapReduce
  • Write programs to analyze data on Hadoop with Pig and Spark
  • Store and query your data with Sqoop, Hive, MySQL, HBase, Cassandra, MongoDB, Drill, Phoenix, and Presto
  • Design real-world systems using the Hadoop ecosystem
  • Learn how your cluster is managed with YARN, Mesos, ZooKeeper, Oozie, Zeppelin, and Hue
  • Handle streaming data in real time with Kafka, Flume, Spark Streaming, Flink, and Storm

Syllabus

  • Introduction
  • Install Hadoop on your Desktop
  • Hadoop Overview and History
  • Overview of the Hadoop Ecosystem
  • HDFS: What it is, and how it works
  • [Activity] Install the MovieLens dataset into HDFS using the Ambari UI
  • [Activity] Install the MovieLens dataset into HDFS using the command line
  • MapReduce: What it is, and how it works
  • How MapReduce distributes processing
  • MapReduce example: Break down movie ratings by rating score
  • [Activity] Installing Python, MRJob, and nano
  • [Activity] Code up the ratings histogram MapReduce job and run it
  • [Exercise] Rank movies by their popularity
  • [Activity] Check your results against mine!
  • Introducing Ambari
  • Introducing Pig
  • Example: Find the oldest movie with a 5-star rating using Pig
  • [Activity] Find old 5-star movies with Pig
  • More Pig Latin
  • [Exercise] Find the most-rated one-star movie
  • Pig Challenge: Compare Your Results to Mine!
  • Why Spark?
  • The Resilient Distributed Dataset (RDD)
  • [Activity] Find the movie with the lowest average rating - with RDDs
  • Datasets and Spark 2.0
  • [Activity] Find the movie with the lowest average rating - with DataFrames
  • [Activity] Movie recommendations with MLlib
  • [Exercise] Filter the lowest-rated movies by number of ratings
  • [Activity] Check your results against mine!
  • What is Hive?
  • [Activity] Use Hive to find the most popular movie
  • How Hive works
  • [Exercise] Use Hive to find the movie with the highest average rating
  • Compare your solution to mine.
  • Integrating MySQL with Hadoop
  • [Activity] Install MySQL and import our movie data
  • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
  • [Activity] Use Sqoop to export data from Hadoop to MySQL
  • Why NoSQL?
  • What is HBase?
  • [Activity] Import movie ratings into HBase
  • [Activity] Use HBase with Pig to import data at scale.
  • Cassandra overview
  • [Activity] Installing Cassandra
  • [Activity] Write Spark output into Cassandra
  • MongoDB Overview
  • [Activity] Install MongoDB, and integrate Spark with MongoDB
  • [Activity] Using the MongoDB shell
  • Choosing a database technology
  • [Exercise] Choose a database for a given problem
  • Overview of Drill
  • [Activity] Setting Up Drill
  • [Activity] Querying across multiple databases with Drill
  • Overview of Phoenix
  • [Activity] Install Phoenix and query HBase with it
  • [Activity] Integrate Phoenix with Pig
  • Overview of Presto
  • [Activity] Install Presto, and query Hive with it.
  • [Activity] Query both Cassandra and Hive using Presto.
  • YARN explained
  • Tez explained
  • [Activity] Use Hive on Tez and measure the performance benefit
  • Mesos explained
  • ZooKeeper explained
  • [Activity] Simulating a failing master with ZooKeeper
  • Oozie explained
  • [Activity] Set up a simple Oozie workflow
  • Zeppelin overview
  • [Activity] Use Zeppelin to analyze movie ratings, part 1
  • [Activity] Use Zeppelin to analyze movie ratings, part 2
  • Hue overview
  • Other technologies worth mentioning
  • Kafka explained
  • [Activity] Setting up Kafka, and publishing some data.
  • [Activity] Publishing web logs with Kafka
  • Flume explained
  • [Activity] Set Up Flume and publish logs with Spark
  • [Activity] Set up Flume to monitor a directory and store its data in HDFS
  • Spark Streaming: Introduction
  • [Activity] Analyze web logs published with Flume using Spark Streaming
  • [Exercise] Monitor Flume-published logs for errors in real time
  • Exercise solution: Aggregating HTTP access codes with Spark Streaming
  • Apache Storm: Introduction
  • [Activity] Count words with Storm
  • Flink: An Overview
  • [Activity] Counting words with Flink
  • The Best of the Rest
  • Review: How the pieces fit together
  • Understanding your requirements
  • Sample application: consume webserver logs and keep track of top-sellers
  • Sample application: serving movie recommendations to a website
  • [Exercise] Design a system to report web sessions per day
  • Exercise solution: Design a system to count daily sessions

Taught by

Frank Kane

Related Courses

Big Data Computing with Spark
The Hong Kong University of Science and Technology via edX
Advanced Big Data Systems | 高级大数据系统
Tsinghua University via edX
Apache Spark Essential Training
LinkedIn Learning
数据科学 | Data Science
Tsinghua University via edX
Data Streaming
Udacity