YoVDO

Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR

Offered By: Amazon Web Services via AWS Skill Builder

Tags

Amazon EMR Courses, Data Analysis Courses, Data Visualization Courses, Data Cleaning Courses, Data Transformation Courses, Data Ingestion Courses

Course Description

Overview

In this lab, you learn how to visualize, prepare, and transform a dataset in Amazon SageMaker Data Wrangler. You also use Amazon S3 and SageMaker Studio to interact with Apache Hive using Apache Spark.

Objectives

  • Understand effective methods for visualizing data
  • Explore methods for data cleaning and transformation, including how to handle missing values, outliers, and duplicated data
  • Learn how to ingest data into Amazon SageMaker Data Wrangler and transform it there
  • Experiment with how to transform data using Spark on Amazon EMR
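
The cleaning objectives above (missing values, outliers, duplicated data) can be illustrated with a minimal standard-library Python sketch. In the lab these steps are performed in SageMaker Data Wrangler, so this is not the lab's implementation. The function name `clean_records`, the `price` field, and the sample rows are hypothetical, and median imputation with IQR-based clipping is only one of several reasonable choices.

```python
from statistics import median, quantiles


def clean_records(rows, field):
    """Return a cleaned copy of rows: deduplicated, imputed, and clipped on `field`."""
    # Step 1: drop exact duplicate rows, keeping the first occurrence.
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(dict(row))  # copy so the input rows are not mutated

    # Step 2: fill missing values (None) with the median of the observed values.
    observed = [r[field] for r in unique if r[field] is not None]
    fill_value = median(observed)
    for r in unique:
        if r[field] is None:
            r[field] = fill_value

    # Step 3: clip outliers to the Tukey fences (1.5 * IQR beyond the quartiles).
    q1, _, q3 = quantiles([r[field] for r in unique], n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    for r in unique:
        r[field] = min(max(r[field], low), high)
    return unique


# Hypothetical sample data, invented for this sketch.
rows = [
    {"id": 1, "price": 10.0},
    {"id": 2, "price": 12.0},
    {"id": 2, "price": 12.0},   # exact duplicate
    {"id": 3, "price": None},   # missing value
    {"id": 4, "price": 11.0},
    {"id": 5, "price": 13.0},
    {"id": 6, "price": 500.0},  # outlier
]
cleaned = clean_records(rows, "price")
for record in cleaned:
    print(record)
```

Clipping keeps every row and only bounds extreme values. Dropping outlier rows instead is an equally valid choice when the extreme values are known to be errors.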

Prerequisites

  • Basic navigation of the AWS Management Console.
  • An understanding of database concepts, MySQL, and database availability.

Outline

  • Task 1: Import, visualize, and perform a preliminary analysis of the data with SageMaker Data Wrangler
  • Task 2: Analyze and visualize the data
  • Task 3: Perform data transformations and export the datasets
  • Task 4: Set up the environment
  • Task 5: Connect to an EMR cluster
  • Task 6: Explore and query data from the SparkMagic PySpark kernel

Related Courses

  • Data Wrangling with MongoDB (MongoDB via Udacity)
  • Getting and Cleaning Data (Johns Hopkins University via Coursera)
  • 软件包在流行病学研究中的应用 / Using software apps in epidemiological research (Peking University via Coursera)
  • Creating an Analytical Dataset (Udacity)
  • Implementing ETL with SQL Server Integration Services (Microsoft via edX)