R Programming in Data Science: High Volume Data
Offered By: LinkedIn Learning
Course Description
Overview
Analyze high-volume data using R, the language optimized for big data. Learn how to produce visualizations, implement parallel processing, and integrate with SQL and Apache Spark.
Syllabus
Introduction
- Wrangling high-volume data with R
- Sample data set
- Perspectives on high-volume data
- Big data and available memory
- Code: Finding available memory
- Big data and CPU cycles
- Code: How fast is your computer?
- High-volume data and visualizations
- Code: Graphs for high-volume data
- Code: rug() and jitter()
- Code: Applying statistics to plots
- Code: Subsampled graphs for high-volume data
- Code: Trellising data across multiple charts
- R programming tools for high-volume data
- Downsampling
- Profile R code to find inefficiencies
- Code: Profile R code to find inefficiencies
- Avoid the copy-on-modify problem with R
- Code: Avoid copy-on-modify with data.table
- Optimization versus readability
- Compile R functions
- Parallel processing with R
- Code: Parallel R functions
- bigmemory, LaF, and ff packages
- Store high-volume data in a database
- Code: R with databases
- Cloud computing with R
- Sparklyr with R
- Code: R with Sparklyr
- Summary of high-volume data with R
Taught by
Mark Niemann-Ross
Related Courses
- Address Business Issues with Data Science (CertNexus via Coursera)
- Advanced Clinical Data Science (University of Colorado System via Coursera)
- Advanced Data Science Capstone (IBM via Coursera)
- Advanced Data Science with IBM (IBM via Coursera)
- Advanced Deep Learning Methods for Healthcare (University of Illinois at Urbana-Champaign via Coursera)