Master Apache Spark - Hands On!
Offered By: Udemy
Course Description
Overview
What you'll learn:
- Utilize the most powerful big data batch and stream processing engine to solve big data problems
- Master the new Spark Java Datasets API to slice and dice big data efficiently
- Build, deploy and run Spark jobs on the cloud and benchmark performance on various hardware configurations
- Optimize Spark clusters to process big data efficiently and understand performance tuning
- Transform structured and semi-structured data using Spark SQL, DataFrames and Datasets
- Implement popular Machine Learning algorithms in Spark such as Linear Regression, Logistic Regression, and K-Means Clustering
Last updated: November 2020
Apache Spark is the next-generation batch and stream processing engine. It can run some in-memory workloads up to 100 times faster than Hadoop MapReduce, and it makes distributed big data applications much easier to develop. Demand for it has skyrocketed in recent years, and having this technology on your resume is truly a game changer. Over 3000 companies are using Spark in production right now, and the list is growing very quickly! Some of the big names include Oracle, Hortonworks, Cisco, Verizon, Visa, Microsoft and Amazon, as well as most of the world's big banks and financial institutions!
In this course you'll learn everything you need to know about using Apache Spark in your organization with its latest and greatest Java Datasets API. Below are some of the things you'll learn:
- How to develop Spark Java applications using Spark SQL DataFrames
- Understand how the Spark Standalone cluster works behind the scenes
- How to use various transformations to slice and dice your data in Spark Java
- How to marshal/unmarshal Java domain objects (POJOs) while working with Spark Datasets
- Master joins, filters and aggregations, and ingest data of various sizes and file formats (TXT, CSV, JSON, etc.)
- Analyze over 18 million real-world Reddit comments to find the most trending words
- Develop programs using Spark Streaming to process streaming stock market index files
- Stream data from network sockets and messages queued on a Kafka cluster
- Learn how to develop the most popular machine learning algorithms using Spark MLlib, including Linear Regression, Logistic Regression and K-Means Clustering
You'll be developing over 15 practical Spark Java applications that crunch through real-world data and slice and dice it in various ways using several data transformation techniques. This course is especially valuable for people who would like to be hired as a Java developer or data engineer, because Spark is a hugely sought-after skill. We'll even go over how to set up a live cluster and configure Spark jobs to run on the cloud. You'll also learn about the practical implications of performance tuning and scaling out a cluster to work with big data, so you'll definitely be learning a ton in this course. This course has a 30-day money-back guarantee. You will have access to all of the code used in this course.
Taught by
Imtiaz Ahmad and Job Ready Programmer Inc.