YoVDO

Big Data Analytics with Hadoop and Apache Spark

Offered By: LinkedIn Learning

Tags

Apache Spark Courses, Data Extraction Courses, Data Modeling Courses, Big Data Analytics Courses, Distributed Computing Courses, HDFS Courses, Data Ingestion Courses

Course Description

Overview

Discover how to build scalable, optimized data analytics pipelines by combining the strengths of Apache Hadoop and Apache Spark.

Syllabus

Introduction
  • The combined power of Spark and Hadoop Distributed File System (HDFS)
1. Introduction and Setup
  • Apache Hadoop overview
  • Apache Spark overview
  • Integrating Hadoop and Spark
  • Setting up the environment
  • Using exercise files
2. HDFS Data Modeling for Analytics
  • Storage formats
  • Compression
  • Partitioning
  • Bucketing
  • Best practices for data storage
3. Data Ingestion with Spark
  • Reading external files into Spark
  • Writing to HDFS
  • Parallel writes with partitioning
  • Parallel writes with bucketing
  • Best practices for ingestion
4. Data Extraction with Spark
  • How Spark works
  • Reading HDFS files with schema
  • Reading partitioned data
  • Reading bucketed data
  • Best practices for data extraction
5. Optimizing Spark Processing
  • Pushing down projections
  • Pushing down filters
  • Managing partitions
  • Managing shuffling
  • Improving joins
  • Storing intermediate results
  • Best practices for data processing
6. Use Case Project
  • Problem definition
  • Data loading
  • Total score analytics
  • Average score analytics
  • Top student analytics
Conclusion
  • Next steps

Taught by

Kumaran Ponnambalam

Related Courses

Cloud Computing Concepts, Part 1
University of Illinois at Urbana-Champaign via Coursera
Cloud Computing Concepts: Part 2
University of Illinois at Urbana-Champaign via Coursera
Reliable Distributed Algorithms - Part 1
KTH Royal Institute of Technology via edX
Introduction to Apache Spark and AWS
University of London International Programmes via Coursera
Réalisez des calculs distribués sur des données massives (Perform Distributed Computations on Massive Data)
CentraleSupélec via OpenClassrooms