Introduction to Apache Spark and AWS
Offered By: University of London International Programmes via Coursera
Course Description
Overview
Learn to analyze big data using Apache Spark's distributed computing framework.
In a series of focused, practical tasks, you will start by launching a Spark cluster on Amazon's EC2 cloud computing platform. As you progress to working with real data, you will gain exposure to a variety of useful tools, including RDFLib and SPARQL.
The practical tasks on this course make use of Project Gutenberg data, the world's largest open collection of ebooks. This offers no end of opportunity for highly engaging and novel analyses.
As the taught material and example code are given in Python, it is strongly recommended that all students have previous Python programming experience. Furthermore, launching and interacting with a cluster on EC2 requires basic knowledge of the Unix command line, and some experience with a command-line editor such as vim or nano would also be advantageous.
With these minimal prerequisites, this course is designed to get you up and running in Spark as quickly and painlessly as possible, so that by the end, you will be comfortable and competent enough to start engineering your own big data solutions.
Taught by
Dr Sorrel Harriet and Christophe Rhodes
Related Courses
Blockchain Scalability and its Foundations in Distributed Systems (The University of Sydney via Coursera)
Cloud Computing Concepts, Part 1 (University of Illinois at Urbana-Champaign via Coursera)
Cloud Systems Software (Georgia Institute of Technology via Coursera)
Cloud Computing Concepts: Part 2 (University of Illinois at Urbana-Champaign via Coursera)
Using GPUs to Scale and Speed-up Deep Learning (IBM via edX)