YoVDO

Managing Big Data with R and Hadoop

Offered By: Partnership for Advanced Computing in Europe via FutureLearn

Tags

Hadoop Courses, Data Science Courses, Big Data Courses

Course Description

Overview

You will learn how to use the RHadoop tools to manage and analyse big data.

This course will give you access to a virtual environment with installations of Hadoop, R and RStudio, so that you can get hands-on experience with big data management. Several unique examples from statistical learning, together with the related R code for map-reduce operations, will be available for testing and learning.

Those with basic knowledge of statistical learning and R will gain a better understanding of the underlying methods and of how to run them in parallel using map-reduce functions and Hadoop data storage. At the end of the course you will get access to RHadoop on a supercomputer at the University of Ljubljana.
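To give a feel for the map-reduce pattern mentioned above, here is a minimal sketch in plain Python. It is not RHadoop code and not taken from the course materials: it runs in memory on a single machine, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `group_means`) and the sample records are hypothetical, chosen only for illustration. It computes a mean per group, a basic statistical operation that splits naturally into a map step and a reduce step.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records):
    """Map step: emit (key, (value, 1)) so sums and counts can be combined."""
    for group, value in records:
        yield group, (value, 1)


def shuffle(pairs):
    """Shuffle step: collect all emitted values under their key."""
    grouped = defaultdict(list)
    for key, val in pairs:
        grouped[key].append(val)
    return grouped


def reduce_phase(grouped):
    """Reduce step: add up (sum, count) pairs per key, then divide."""
    result = {}
    for key, vals in grouped.items():
        total, count = reduce(lambda a, b: (a[0] + b[0], a[1] + b[1]), vals)
        result[key] = total / count
    return result


def group_means(records):
    """Run the three steps in sequence on an in-memory list of records."""
    return reduce_phase(shuffle(map_phase(records)))


# Hypothetical sample data: (group label, measurement)
records = [("a", 1.0), ("b", 4.0), ("a", 3.0), ("b", 6.0)]
means = group_means(records)
print(means)
```

Because the reduce step only ever combines (sum, count) pairs, partial results computed on separate chunks of data can be merged in any order, which is what makes this kind of computation suitable for parallel execution.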

This course is designed for people who are interested in data science, computational statistics and machine learning and have some basic experience with them. It will also be useful for advanced undergraduate students and first-year PhD students in data analysis, statistics or bioinformatics who wish to understand how to manage big data with Hadoop using the R programming language.

We expect that learners will also have basic experience with Linux and bash, and working experience with R and matrix operations. They should also be able to download and run a virtual machine.

All software needed to actively participate in the course is provided within the virtual machine, which learners are expected to download and run on their local machine. No extra software is needed. You will need a modest local machine with 15 GB of free disk space and 2 GB of RAM.


Syllabus

  • Welcome to BIG DATA
    • Welcome to the course!
    • Setting up the software
    • Hands on Linux with RHadoop
  • Working with Hadoop
    • Data Management
    • Using AWK with Hadoop
    • Map and reduce
  • First steps in R and RHadoop
    • Basic data management with R
    • RHadoop
    • Four Big Data examples: basic data operations with RHadoop
    • Summary
  • Statistical learning with RHadoop: clustering
    • Introduction to statistical learning
    • Clustering analysis
    • Summary
  • Statistical learning with RHadoop: regression and classification
    • Linear regression
    • Classification
    • Summary
    • Really big data examples

Related Courses

Intro to Hadoop and MapReduce
Cloudera via Udacity
Processing Big Data with Hadoop in Azure HDInsight
Microsoft via edX
Implementing Real-Time Analytics with Hadoop in Azure HDInsight
Microsoft via edX
Hadoop Platform and Application Framework
University of California, San Diego via Coursera
Data Manipulation at Scale: Systems and Algorithms
University of Washington via Coursera