YoVDO

Setup Big Data Development Environment for Spark and Hadoop

Offered By: Udemy

Tags

Hadoop Courses

Course Description

Overview

Set up a Big Data Development Environment to develop Spark and Hadoop Applications using EMR and Databricks

What you'll learn:
  • Understand how to set up a development environment to learn big data technologies.
  • Set up an EMR Cluster using AWS for Development
  • Connect to the AWS EMR Cluster and validate the AWS CLI and Spark CLIs such as spark-shell, pyspark and spark-sql
  • Set up Visual Studio Code and install the Remote Development Extension Pack
  • Set up a Project Workspace using Visual Studio Code leveraging the AWS EMR Cluster
  • Understand the Spark Application Development and Deployment Life Cycle using Spark on AWS EMR
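
The CLI validation step above can be sketched as a small check run on the master node. This is only an illustration, not the course's own script: it assumes the AWS CLI executable is named aws and that all four tools are expected on the PATH, and the function names are hypothetical.

```python
import shutil

# CLI names taken from the list above; "aws" is assumed to be the AWS CLI executable.
EXPECTED_CLIS = ("aws", "spark-shell", "pyspark", "spark-sql")


def find_clis(names=EXPECTED_CLIS):
    """Map each CLI name to its resolved path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


def missing_clis(found):
    """Return the CLI names that could not be resolved."""
    return [name for name, path in found.items() if path is None]


if __name__ == "__main__":
    found = find_clis()
    for name, path in found.items():
        print(f"{name}: {path or 'NOT FOUND'}")
```

A non-empty result from missing_clis(find_clis()) would suggest the session is not on the EMR master node or that the PATH is incomplete.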

One of the key aspects of working on Big Data projects using technologies such as Spark and Hadoop is having an appropriate development environment. By the end of the course, you will have a development environment ready to build Spark-based applications leveraging the power of multi-node clusters such as EMR, Databricks, etc.

Even though interactive CLIs are effective for learning, they are not good enough for the collaborative development of Spark Applications. Here is what you will be doing to set up an Environment for Application Development using Big Data Technologies such as Hadoop and Spark.

  • Overview of IDEs (Integrated Development Environments) such as VS Code, PyCharm, etc.

  • Set up Visual Studio Code on Windows or Mac along with the Remote Development Extension Pack

  • Set up a Multi-Node Big Data Cluster using AWS Elastic MapReduce, aka AWS EMR.

  • Validate Connectivity to the Master Node of the AWS EMR Cluster

  • Set up a Workspace on the Master Node of the AWS EMR Cluster using the Visual Studio Code Remote Development Extension Pack.

  • Understand Application Development Life Cycle using Spark.

  • Validate the Application locally using the spark-submit command.

  • Set up the Required Data Sets in AWS S3

  • Build the Spark Application Bundle as a zip file and deploy it using both client and cluster mode.

  • Run the Spark Application using the CLI on the Master Node of the cluster.

  • Deploy the Spark Application as a Step on the EMR Cluster
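
The bundling and deployment steps above can be sketched roughly as follows. This is a hedged illustration, not the course's actual tooling: it assumes a PySpark application, YARN as the cluster manager, and the helper names build_bundle, spark_submit_args and emr_step are hypothetical. The step dictionary follows the shape that boto3's EMR add_job_flow_steps call accepts, with command-runner.jar wrapping the spark-submit command; nothing here contacts AWS.

```python
import zipfile
from pathlib import Path


def build_bundle(src_dir, zip_path):
    """Zip every .py file under src_dir, keeping paths relative to src_dir."""
    src = Path(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for py_file in sorted(src.rglob("*.py")):
            zf.write(py_file, py_file.relative_to(src).as_posix())
    return Path(zip_path)


def spark_submit_args(driver, bundle, deploy_mode):
    """Build a spark-submit argument list for YARN in client or cluster mode."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError(f"unsupported deploy mode: {deploy_mode}")
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", deploy_mode,
        "--py-files", str(bundle),  # the zipped application modules
        str(driver),                # the driver script that starts the job
    ]


def emr_step(name, driver, bundle):
    """Describe the same submission as an EMR Step (cluster mode assumed)."""
    return {
        "Name": name,
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": spark_submit_args(driver, bundle, "cluster"),
        },
    }
```

On the master node, the list from spark_submit_args("app.py", "app.zip", "client") could be passed to subprocess.run to run the application from the CLI. For a Step, the driver and bundle would typically be S3 URIs; any bucket or file names used there are placeholders, not values from the course.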


Taught by

Durga Viswanatha Raju Gadiraju and Asasri Manthena

Related Courses

Intro to Hadoop and MapReduce
Cloudera via Udacity
Processing Big Data with Hadoop in Azure HDInsight
Microsoft via edX
Implementing Real-Time Analytics with Hadoop in Azure HDInsight
Microsoft via edX
Hadoop Platform and Application Framework
University of California, San Diego via Coursera
Data Manipulation at Scale: Systems and Algorithms
University of Washington via Coursera