Analyze Big Data with Hadoop
Offered By: Amazon Web Services via AWS Skill Builder
Course Description
Overview
Languages Available: Spanish (Latin America) | Spanish (Spain) | French | Indonesian | Italian | Japanese | Korean | Portuguese (Brazil) | Chinese (Simplified)
In this lab, you will deploy a fully functional Hadoop cluster, ready to analyze log data, in just a few minutes. You will start by launching an Amazon EMR cluster and then use a HiveQL script to process sample log data stored in an Amazon S3 bucket. HiveQL is a SQL-like query language for data warehousing and analysis. You can then use a similar setup to analyze your own log files.
Level
Fundamental
Duration
1 hour
Course Objectives
In this course, you will learn how to:
- Launch a fully functional Hadoop cluster using **Amazon EMR**
- Define the schema and create a table for sample log data stored in Amazon S3
- Analyze the data using a **HiveQL** script and write the results back to Amazon S3
- Download and view the results on your computer
- Connect to the Hive CLI and run a **HiveQL** query script to view the results
Intended Audience
This course is intended for:
- Data Engineers
Prerequisites
We recommend that attendees of this course have the following prerequisites:
- IT Experience: Prior experience with Hadoop is recommended, but not required, to complete this lab
- AWS Experience: Basic familiarity with Amazon S3 and Amazon EC2 key pairs is suggested, but not required, to complete this lab
Course Outline
- Task 1: Create an Amazon S3 bucket
- Task 2: Launch an Amazon EMR cluster
- Task 3: Process your sample data by running a Hive script
- Task 4: View the results
- Task 5: Connect to the EMR cluster CLI and run queries using HiveQL
- Task 6: Terminate your Amazon EMR cluster