Data Engineering and Machine Learning using Spark
Offered By: IBM via Coursera
Course Description
Overview
NOTE: This course has been replaced by IBM Machine Learning with Apache Spark.
Further your data engineering career with this self-paced course about machine learning with Apache Spark! Organizations need skilled, forward-thinking Big Data practitioners who can apply their business and technical skills to unstructured data, such as tweets, posts, pictures, audio files, videos, sensor data, and satellite imagery, to identify the behaviors and preferences of prospects, clients, competitors, and others.
In this short course you'll gain these practical skills as you learn how to work with Apache Spark for Data Engineering and Machine Learning (ML) applications. You will work hands-on with Spark MLlib, Spark Structured Streaming, and more to perform extract, transform, and load (ETL) tasks as well as regression, classification, and clustering.
You will learn about streaming data sources, streaming output modes, and supported data destinations. You will gain insight into the advantages of Apache Spark GraphFrames and complete a number of hands-on labs to apply your knowledge.
You will then move on to learning about machine learning using SparkML, the Spark Machine Learning library. You will gain an understanding of both supervised and unsupervised machine learning, classification and regression tasks, as well as clustering.
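As an illustration of the clustering idea mentioned above, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the Spark MLlib API and is not taken from the course materials; the function name `kmeans`, the sample points, and the choice to seed centroids from evenly spaced input points are all assumptions made for this example.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda i: squared_distance(point, centroids[i]))


def kmeans(points, k, max_iterations=20):
    """Cluster 2-D points into k groups; returns (centroids, labels)."""
    # Assumption for determinism: seed centroids from evenly spaced input points.
    centroids = [points[(i * len(points)) // k] for i in range(k)]
    for _ in range(max_iterations):
        # Assignment step: group each point with its nearest centroid.
        clusters = [[] for _ in range(k)]
        for point in points:
            clusters[nearest_centroid(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean_x = sum(p[0] for p in cluster) / len(cluster)
                mean_y = sum(p[1] for p in cluster) / len(cluster)
                new_centroids.append((mean_x, mean_y))
            else:
                # Keep the old centroid if no point was assigned to it.
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: centroids stopped moving
        centroids = new_centroids
    labels = [nearest_centroid(point, centroids) for point in points]
    return centroids, labels


# Hypothetical sample data: two well-separated groups of points.
sample_points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(sample_points, k=2)
```

With this sample data, the first three points end up in one cluster and the last three in the other. A library implementation would typically use a more careful initialization and run distributed across a cluster, which this sketch does not attempt.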
The course ends with a final project where you will create your own Apache Spark application for performing Extract, Transform, and Load (ETL) processes.
NOTE: This course requires foundational skills for working with Apache Spark and Jupyter Notebooks. The Introduction to Big Data with Spark and Hadoop course from IBM will equip you with these skills; it is recommended that you complete that course first or have skills equivalent to those it teaches.
Syllabus
- Spark for Data Engineering
- In this first of two modules, learn what streaming data is and get the essential knowledge to use Spark Structured Streaming. Learn about data sources, streaming output modes, and supported data destinations. Learn about data operations considerations and discover how Spark Structured Streaming listeners and checkpointing benefit streaming data processing. Discover how graph theory works with streaming data. You’ll gain insights into the advantages that Apache Spark GraphFrames offers and learn what qualities make data suitable for GraphFrames processing. Then, explore ETL and learn how to use Apache Spark for data extraction, transformation, and loading. Finally, put your newfound knowledge into practice and gain practical, real-world skills in the ETL for Machine Learning Pipelines hands-on lab.
- SparkML
- This module demystifies the concepts and practices of machine learning using SparkML, the Spark machine learning library. Explore both supervised and unsupervised machine learning. Explore classification and regression tasks and learn how SparkML supports them. Gain insights into unsupervised learning, with a focus on clustering, and discover how to apply the k-means clustering algorithm using Spark MLlib. Finish with a lab that solidifies what you have learned and gives you real-world experience with SparkML.
- Final Project
- This final project provides real-world experience in which you'll create your own Apache Spark application. You will build the application as an end-to-end use case that follows the Extract, Transform, and Load (ETL) process, including data acquisition, transformation, model training, and deployment using IBM Watson Machine Learning.
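The extract, transform, and load stages named in the syllabus can be sketched in a few lines of plain Python. This is only an illustration of the ETL pattern using the standard library, not a Spark application and not the course's project; the CSV contents, the column names `sensor_id` and `temperature_c`, and the `readings` table are hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw input: one row has a value that cannot be parsed as a number.
RAW_CSV = """sensor_id,temperature_c
a1,21.5
a2,not_available
a3,19.0
"""


def extract(text):
    """Extract: parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: cast temperatures to float and drop rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            clean.append((row["sensor_id"], float(row["temperature_c"])))
        except ValueError:
            continue  # skip malformed readings
    return clean


def load(records, connection):
    """Load: write the cleaned records into a SQLite table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS readings (sensor_id TEXT, temperature_c REAL)"
    )
    connection.executemany("INSERT INTO readings VALUES (?, ?)", records)
    connection.commit()


# Run the three stages in order against an in-memory database.
connection = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), connection)
loaded = connection.execute(
    "SELECT sensor_id, temperature_c FROM readings ORDER BY sensor_id"
).fetchall()
```

Keeping each stage in its own function makes it possible to test the transformation logic separately from the data source and destination. The same separation would apply if the stages were implemented with Spark DataFrames instead.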
Taught by
Karthik Muthuraman and Romeo Kienzler