R Programming in Data Science: High Volume Data
Offered By: LinkedIn Learning
Course Description
Overview
Analyze high-volume data using R. Learn how to produce visualizations, implement parallel processing, and integrate R with SQL and Apache Spark.
Syllabus
Introduction
- Wrangling high-volume data with R
- Sample data set
- Perspectives on high-volume data
- Big data and available memory
- Code: Finding available memory
- Big data and CPU cycles
- Code: How fast is your computer?
- High-volume data and visualizations
- Code: Graphs for high-volume data
- Code: rug() and jitter()
- Code: Applying statistics to plots
- Code: Subsampled graphs for high-volume data
- Code: Trellising data across multiple charts
- R programming tools for high-volume data
- Downsampling
- Profile R code to find inefficiencies
- Code: Profile R code to find inefficiencies
- Avoid the copy-on-modify problem with R
- Code: Avoid copy-on-modify with data.table
- Optimization versus readability
- Compile R functions
- Parallel processing with R
- Code: Parallel R functions
- bigmemory, LaF, and ff packages
- Store high-volume data in a database
- Code: R with databases
- Cloud computing with R
- Sparklyr with R
- Code: R with Sparklyr
- Summary of high-volume data with R
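The syllabus items above are lesson titles only. The short R sketches below illustrate a few of the techniques those titles name; they are minimal illustrations under stated assumptions, not the course's own code. For "Big data and available memory" and "Code: Finding available memory", a sketch of checking how much memory an object uses; the benchmarkme package for reporting installed RAM is an assumption, not necessarily what the course uses:

  # Size of a single object in memory
  x <- rnorm(1e6)                        # one million doubles, roughly 8 MB
  print(object.size(x), units = "MB")

  # Memory currently in use by R (also triggers a garbage collection)
  gc()

  # Installed RAM (assumes the benchmarkme package is installed)
  benchmarkme::get_ram()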
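For "Big data and CPU cycles" and "Code: How fast is your computer?", a rough timing sketch with base R's system.time(); the matrix size is arbitrary:

  # Time a CPU-bound operation: multiplying two 1000 x 1000 matrices
  n <- 1000
  a <- matrix(rnorm(n * n), nrow = n)
  b <- matrix(rnorm(n * n), nrow = n)
  system.time(a %*% b)                   # elapsed time is a crude speed check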
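For the visualization lessons (rug(), jitter(), subsampled graphs), a sketch that plots a random subsample of a large data frame instead of every row; the data frame and its columns are invented for illustration:

  set.seed(42)
  big <- data.frame(x = rnorm(1e6), y = rnorm(1e6))

  # Plot a 10,000-row subsample rather than all one million points
  idx <- sample(nrow(big), 1e4)
  plot(jitter(big$x[idx]), big$y[idx], pch = ".",
       xlab = "x (subsampled)", ylab = "y")
  rug(big$x[idx])                        # marginal rug of the subsampled x values

Trellising the same data across multiple charts is commonly done with lattice (xyplot(y ~ x | group)) or with ggplot2 facets.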
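For "Profile R code to find inefficiencies", a minimal sketch with the base Rprof()/summaryRprof() pair; the deliberately inefficient function is made up for illustration:

  slow_running_mean <- function(n) {
    x <- rnorm(n)
    sapply(seq_along(x), function(i) mean(x[1:i]))   # deliberately wasteful
  }

  Rprof("profile.out")                   # start the sampling profiler
  invisible(slow_running_mean(5000))
  Rprof(NULL)                            # stop profiling
  summaryRprof("profile.out")$by.self    # where the time was actually spent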
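For "Avoid copy-on-modify with data.table", a sketch that updates a column by reference with := instead of triggering the copy a plain data.frame assignment would make:

  library(data.table)

  dt <- data.table(id = 1:1e6, value = rnorm(1e6))

  # := modifies the column in place, by reference; no copy of dt is made
  dt[, value := value * 2]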
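For "Compile R functions", a sketch using the base compiler package's cmpfun(); note that recent versions of R byte-compile functions automatically, so the gain may be small:

  library(compiler)

  slow_sum <- function(v) {
    total <- 0
    for (x in v) total <- total + x
    total
  }

  fast_sum <- cmpfun(slow_sum)           # byte-compile the function
  system.time(slow_sum(1:1e7))
  system.time(fast_sum(1:1e7))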
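For "Parallel processing with R" and "Code: Parallel R functions", a sketch with the base parallel package; the cluster approach (makeCluster/parLapply) works on Windows as well as macOS and Linux:

  library(parallel)

  cores <- max(1, detectCores() - 1)     # leave one core free
  cl <- makeCluster(cores)

  # Apply a toy "expensive" function across inputs in parallel
  results <- parLapply(cl, 1:8, function(i) {
    Sys.sleep(0.5)                       # stand-in for real work
    i^2
  })

  stopCluster(cl)
  unlist(results)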
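For "Store high-volume data in a database" and "Code: R with databases", a sketch using DBI with an in-memory SQLite database; RSQLite stands in here for whatever database engine the course actually uses:

  library(DBI)

  con <- dbConnect(RSQLite::SQLite(), ":memory:")

  dbWriteTable(con, "measurements", data.frame(id = 1:1e5, value = rnorm(1e5)))

  # Let the database do the aggregation; pull back only the small result
  dbGetQuery(con, "SELECT COUNT(*) AS n, AVG(value) AS mean_value FROM measurements")

  dbDisconnect(con)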
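For "Sparklyr with R" and "Code: R with Sparklyr", a sketch connecting to a local Spark instance via sparklyr; it assumes Spark has already been installed (for example with sparklyr::spark_install()):

  library(sparklyr)
  library(dplyr)

  sc <- spark_connect(master = "local")  # local Spark, fine for experimentation

  # Copy a small data frame into Spark and run dplyr verbs there
  mtcars_tbl <- copy_to(sc, mtcars, overwrite = TRUE)

  mtcars_tbl %>%
    group_by(cyl) %>%
    summarise(mean_mpg = mean(mpg)) %>%
    collect()                            # bring the small result back into R

  spark_disconnect(sc)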
Taught by
Mark Niemann-Ross
Related Courses
- Computation Structures 3: Computer Organization (Massachusetts Institute of Technology via edX)
- Parallel Computing in R (DataCamp)
- A Crash Course in Unity's Entity Component System (Udemy)
- High-performance Data Warehousing with Amazon Redshift (Pluralsight)
- Productivity for Creators: Systems, Organization & Workflow (Skillshare)