Lab - Analyze and Prepare Data with Amazon SageMaker Data Wrangler and Amazon EMR
Offered By: Amazon Web Services via AWS Skill Builder
Course Description
Overview
In this lab, you learn how to visualize, prepare, and transform a dataset in SageMaker Data Wrangler. You also use Amazon S3 and SageMaker Studio to interact with Apache Hive using Apache Spark.
Objectives
- Understand effective methods for visualizing data
- Explore methods for cleaning and transforming data, including how to handle missing values, outliers, and duplicated data
- Learn how to ingest and transform data in Amazon SageMaker Data Wrangler
- Experiment with how to transform data using Spark on Amazon EMR
Prerequisites
- Basic navigation of the AWS Management Console.
- An understanding of database concepts, MySQL, and database availability.
Outline
- Task 1: Import, visualize, and perform a preliminary analysis of the data with SageMaker Data Wrangler
- Task 2: Analyze and visualize the data
- Task 3: Perform data transformations and export the datasets
- Task 4: Set up the environment
- Task 5: Connect to an EMR cluster
- Task 6: Explore and query data from the SparkMagic PySpark kernel
Related Courses
- Deep Dive into Amazon Glacier (Amazon via Independent)
- Preparing for your Professional Data Engineer Journey (Google Cloud via Coursera)
- Building Resilient Streaming Systems on Google Cloud Platform en Français (Google Cloud via Coursera)
- IBM AI Enterprise Workflow (IBM via Coursera)
- Introduction to Designing Data Lakes on AWS (Amazon Web Services via edX)