R Programming in Data Science: High Volume Data
Offered By: LinkedIn Learning
Course Description
Overview
Analyze high-volume data using R. Learn how to produce visualizations, implement parallel processing, and integrate R with SQL and Apache Spark.
Syllabus
Introduction
- Wrangling high-volume data with R
- Sample data set
- Perspectives on high-volume data
- Big data and available memory
- Code: Finding available memory
- Big data and CPU cycles
- Code: How fast is your computer?
- High-volume data and visualizations
- Code: Graphs for high-volume data
- Code: rug() and jitter()
- Code: Applying statistics to plots
- Code: Subsampled graphs for high-volume data
- Code: Trellising data across multiple charts
- R programming tools for high-volume data
- Downsampling
- Profile R code to find inefficiencies
- Code: Profile R code to find inefficiencies
- Avoid the copy-on-modify problem with R
- Code: Avoid copy-on-modify with data.table
- Optimization versus readability
- Compile R functions
- Parallel processing with R
- Code: Parallel R functions
- bigmemory, LaF, and ff packages
- Store high-volume data in a database
- Code: R with databases
- Cloud computing with R
- Sparklyr with R
- Code: R with Sparklyr
- Summary of high-volume data with R
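The syllabus items above are lesson titles only. The short R sketches below illustrate a few of the techniques those titles name; they are minimal illustrations under stated assumptions, not the course's own code. For "Big data and available memory" and "Code: Finding available memory", a sketch of checking how much memory an object uses; the benchmarkme package for reporting installed RAM is an assumption, not necessarily what the course uses:

  # Size of a single object in memory
  x <- rnorm(1e6)                        # one million doubles, roughly 8 MB
  print(object.size(x), units = "MB")

  # Memory currently in use by R (also triggers a garbage collection)
  gc()

  # Installed RAM (assumes the benchmarkme package is installed)
  benchmarkme::get_ram()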
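For "Big data and CPU cycles" and "Code: How fast is your computer?", a rough timing sketch with base R's system.time(); the matrix size is arbitrary:

  # Time a CPU-bound operation: multiplying two 1000 x 1000 matrices
  n <- 1000
  a <- matrix(rnorm(n * n), nrow = n)
  b <- matrix(rnorm(n * n), nrow = n)
  system.time(a %*% b)                   # elapsed time is a crude speed check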
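For the visualization lessons (rug(), jitter(), subsampled graphs), a sketch that plots a random subsample of a large data frame instead of every row; the data frame and its columns are invented for illustration:

  set.seed(42)
  big <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

  # Plot a 10,000-row subsample rather than all one million points
  idx <- sample(nrow(big), 1e4)
  plot(jitter(big$x[idx]), big$y[idx], pch = ".",
       xlab = "x (subsampled)", ylab = "y")
  rug(big$x[idx])                        # marginal rug of the subsampled x values

Trellising the same data across multiple charts is commonly done with lattice (xyplot(y ~ x | group)) or with ggplot2 facets.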
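For "Profile R code to find inefficiencies", a minimal sketch with the base Rprof()/summaryRprof() pair; the deliberately inefficient function is made up for illustration:

  slow_running_mean <- function(n) {
    x <- rnorm(n)
    sapply(seq_along(x), function(i) mean(x[1:i]))   # deliberately wasteful
  }

  Rprof("profile.out")                   # start the sampling profiler
  invisible(slow_running_mean(5000))
  Rprof(NULL)                            # stop profiling
  summaryRprof("profile.out")$by.self    # where the time was actually spent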
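For "Avoid copy-on-modify with data.table", a sketch that updates a column by reference with := instead of triggering the copy a plain data.frame assignment would make:

  library(data.table)

  dt <- data.table(id = 1:1e6, value = rnorm(1e6))

  # := modifies the column in place, by reference; no copy of dt is made
  dt[, value := value * 2]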
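For "Compile R functions", a sketch using the base compiler package's cmpfun(); note that recent versions of R byte-compile functions automatically, so the gain may be small:

  library(compiler)

  slow_sum <- function(v) {
    total <- 0
    for (x in v) total <- total + x
    total
  }

  fast_sum <- cmpfun(slow_sum)           # byte-compile the function
  system.time(slow_sum(1:1e7))
  system.time(fast_sum(1:1e7))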
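For "Parallel processing with R" and "Code: Parallel R functions", a sketch with the base parallel package; the cluster approach (makeCluster/parLapply) works on Windows as well as macOS and Linux:

  library(parallel)

  cores <- max(1, detectCores() - 1)     # leave one core free
  cl <- makeCluster(cores)

  # Apply a toy "expensive" function across inputs in parallel
  results <- parLapply(cl, 1:8, function(i) {
    Sys.sleep(0.5)                       # stand-in for real work
    i^2
  })

  stopCluster(cl)
  unlist(results)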
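For "Store high-volume data in a database" and "Code: R with databases", a sketch using DBI with an in-memory SQLite database; RSQLite stands in here for whatever database engine the course actually uses:

  library(DBI)

  con <- dbConnect(RSQLite::SQLite(), ":memory:")

  dbWriteTable(con, "measurements", data.frame(id = 1:1e5, value = rnorm(1e5)))

  # Let the database do the aggregation; pull back only the small result
  dbGetQuery(con, "SELECT COUNT(*) AS n, AVG(value) AS mean_value FROM measurements")

  dbDisconnect(con)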
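For "Sparklyr with R" and "Code: R with Sparklyr", a sketch connecting to a local Spark instance via sparklyr; it assumes Spark has already been installed (for example with sparklyr::spark_install()):

  library(sparklyr)
  library(dplyr)

  sc <- spark_connect(master = "local")  # local Spark, fine for experimentation

  # Copy a small data frame into Spark and run dplyr verbs there
  mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

  mtcars_tbl %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg)) %>%
    collect()                            # bring the small result back into R

  spark_disconnect(sc)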
Taught by
Mark Niemann-Ross
Related Courses
- Computation Structures 3: Computer Organization (Massachusetts Institute of Technology via edX)
- Parallel Computing in R (DataCamp)
- A Crash Course in Unity's Entity Component System (Udemy)
- High-performance Data Warehousing with Amazon Redshift (Pluralsight)
- Productivity for Creators: Systems, Organization & Workflow (Skillshare)