YoVDO

Beginning Data Exploration and Analysis with Apache Spark

Offered By: Pluralsight

Tags

  • Big Data Courses
  • Data Analysis Courses
  • Apache Spark Courses
  • Data Cleaning Courses
  • Functional Programming Courses
  • Data Summarization Courses
  • Data Transformation Courses
  • Data Preparation Courses
  • Data Exploration Courses
  • RDDs Courses

Course Description

Overview

Data preparation is often said to make up 80% of a data scientist's job. This course is all about data preparation, i.e., cleaning, transforming, and summarizing data using Spark.

Data preparation is a staple task for any data professional, whether you just want to explore data or develop sophisticated machine learning models. Spark is an engine that makes this work intuitive by offering functional constructs that shield the user from the messiness of working with large datasets. In this course, Beginning Data Exploration and Analysis with Apache Spark, you'll go through exploratory data analysis and data munging with Spark, step by step. First, you'll explore RDDs (Resilient Distributed Datasets) and the functional constructs that make processing in Spark intuitive. Next, you'll discover how to transform and clean unstructured data. Finally, you'll learn how to summarize data along dimensions and how to model relationships to build co-occurrence networks. By the end of this course, you'll be able to use Spark to transform data in any way that you would like.
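
To give a rough feel for the functional style described above, here is a minimal sketch in plain Python. It is not taken from the course and does not use Spark's API; it only mimics, on a small in-memory list, the kind of pipeline that Spark RDD transformations such as map, filter, and reduceByKey express on distributed data. The sample records, the "comic,hero" line layout, and all variable names are hypothetical.

```python
from collections import Counter
from functools import reduce
from itertools import combinations

# Hypothetical raw records in a "comic_id,hero_name" layout, with messy
# whitespace, an empty line, and a malformed line (all invented for illustration).
raw_lines = [
    "c1, Spider-Man ",
    "c1,Iron Man",
    "",
    "c2,Iron Man",
    "c2, Thor",
    "c2,Spider-Man",
    "bad line without comma",
]


def parse(line):
    """Split a well-formed line into a cleaned (comic, hero) pair."""
    comic, hero = line.split(",", 1)
    return comic.strip(), hero.strip()


# Transform and clean: keep only lines with exactly one comma, then parse them.
well_formed = filter(lambda line: line.count(",") == 1, raw_lines)
pairs = list(map(parse, well_formed))

# Summarize along a dimension: appearances per hero, folded with reduce
# (a local stand-in for a keyed reduction).
appearances = reduce(
    lambda acc, pair: acc + Counter({pair[1]: 1}), pairs, Counter()
)

# Model relationships: group heroes by comic, then count how often each
# pair of heroes shares a comic (a small co-occurrence network).
heroes_by_comic = {}
for comic, hero in pairs:
    heroes_by_comic.setdefault(comic, set()).add(hero)

co_occurrence = Counter(
    edge
    for heroes in heroes_by_comic.values()
    for edge in combinations(sorted(heroes), 2)
)

print(appearances.most_common())
print(co_occurrence.most_common())
```

Sorting each comic's heroes before pairing them keeps every edge in one canonical order, so the same two heroes are never counted under two different keys.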

Syllabus

  • Course Overview 1min
  • Getting Started with Spark's Resilient Distributed Datasets 27mins
  • Transforming and Cleaning Unstructured Data 32mins
  • Summarizing Data Along Dimensions 30mins
  • Modeling Relationships in the Marvel Social Universe 25mins

Taught by

Swetha Kolalapudi

Related Courses

Social Network Analysis
University of Michigan via Coursera
Intro to Algorithms
Udacity
Data Analysis
Johns Hopkins University via Coursera
Computing for Data Analysis
Johns Hopkins University via Coursera
Health in Numbers: Quantitative Methods in Clinical & Public Health Research
Harvard University via edX