Cloud Hadoop: Scaling Apache Spark
Offered By: LinkedIn Learning
Course Description
Overview
Generate genuine business insights from big data. Learn to implement Apache Hadoop and Spark workflows on AWS.
Syllabus
Introduction
- Scaling Apache Hadoop and Spark
- What you should know
- Using cloud services
- Modern Hadoop and Spark
- File systems used with Hadoop and Spark
- Apache or commercial Hadoop distros
- Hadoop and Spark libraries
- Hadoop on Google Cloud Platform
- Spark Job on Google Cloud Platform
- Sign up for Databricks Community Edition
- Add Hadoop libraries
- Databricks AWS Community Edition
- Load data into tables
- Hadoop and Spark cluster on AWS EMR
- Run Spark job on AWS EMR
- Review batch architecture for ETL on AWS
- Apache Spark libraries
- Spark data interfaces
- Select your programming language
- Spark session objects
- Spark shell
- Tour the Databricks Environment
- Tour the notebook
- Import and export notebooks
- Calculate Pi on Spark
- Run WordCount on Spark with Scala
- Import data
- Transformations and actions
- Caching and the DAG
- Architecture: Streaming for prediction
- Spark SQL
- SparkR
- Spark ML: Preparing data
- Spark ML: Building the model
- Spark ML: Evaluating the model
- Advanced machine learning on Spark
- MXNet
- Spark with ADAM for genomics
- Spark architecture for genomics
- Reexamine streaming pipelines
- Spark Streaming
- Streaming ingest services
- Advanced Spark Streaming with MLeap
- Scale Spark on the cloud by example
- Build a quick start with Databricks AWS
- Scale Spark cloud compute with VMs
- Optimize cloud Spark virtual machines
- Use AWS EKS containers and data lake
- Optimize Spark cloud data tiers on Kubernetes
- Build reproducible cloud infrastructure
- Scale on GCP Dataproc or on Terra.bio
- Continue learning for scaling
Taught by
Lynn Langit
Related Courses
- Communicating Data Science Results (University of Washington via Coursera)
- Cloud Computing Applications, Part 2: Big Data and Applications in the Cloud (University of Illinois at Urbana-Champaign via Coursera)
- Cloud Computing Infrastructure (University System of Maryland via edX)
- Google Cloud Platform for AWS Professionals (Google via Coursera)
- Introduction to Apache Spark and AWS (University of London International Programmes via Coursera)