Introduction to Text Mining with R

Offered By: Higher School of Economics via Coursera

Tags

Text Mining Courses, Data Analysis Courses, Machine Learning Courses, Deep Learning Courses, R Programming Courses, Regular Expressions Courses, Topic Modeling Courses, Text Classification Courses

Course Description

Overview

In this online course, you will learn about the next big thing in applied analytics – text analysis. This course is self-contained: you will learn everything from basic programming skills to advanced natural language modelling for topic discovery. This course is designed around a problem-oriented approach, meaning that we will not spend too much time learning theoretical concepts but instead focus on applying them to practical problems.
The goal of this online course is to equip students with the knowledge and skills needed to analyse text data with the R programming language.
We do not assume any specific prerequisites for this course. However, some familiarity with natural language processing or R programming may make the course materials easier to get into.
Each week of the course is accompanied by tests, graded and ungraded programming assignments, and links to additional material for those who want to dig deeper into the course material. At the end of the course, you will have to complete a project and then review your peers' projects.
Software used: R (programming language) and RStudio.
This course is heavily tilted toward practical skills. During this course, students will dive into the basics of R for text analysis, the tidy text approach, regular expressions, different algorithms for topic modelling, text classification with machine learning and deep learning approaches, and more. Various synthetic and real-world datasets will help participants see how to apply these techniques to extract insights from user reviews, social media posts, and short product descriptions. This distance learning opportunity is brought to you by HSE University, one of the top think tanks in Russia, and by instructors experienced in using text analysis for business-oriented projects.
The online course consists of short pre-recorded lectures, 5 to 15 minutes in length.
Each week will have a graded test with 10 to 15 questions. At the end of the last week, students will have to complete a project utilising the skills learned in the course, and then review and grade the projects of their peers.
The course gives students an opportunity to learn the methods of natural language processing (NLP) and then apply these methods to problems in their own areas of interest.

Syllabus

  • R and RStudio Basics
    • In this module, you will learn how to work with R and RStudio, how to use RMarkdown for literate programming, and how to work with data using basic R data types and structures.
  • Working with Tidyverse
    • In this module, you will learn how to work with data using the Tidyverse set of packages. You will learn how to use tibbles (a Tidyverse alternative to data.frames), the pipe operator from the magrittr package, and how to clean and transform data using the powerful dplyr package. You will also learn how to efficiently work with strings using the stringr package.
  • Supervised machine learning with the bag-of-words approach
    • In this module, you will learn how to obtain text data from Project Gutenberg and how to prepare it for analysis. You will also learn how to use TF-IDF to find the most distinctive words in a corpus of texts and how to build, interpret, and evaluate supervised learning models for textual data.
  • Unsupervised machine learning
    • In this module, you will learn how to preprocess text data using the preText package, which can compare many types of preprocessing for a particular corpus. You will also learn how to train, interpret, and compare topic models.
  • Final Project
    • This module in its entirety is dedicated to the final project of the course, in which you will apply all the knowledge you've gained in this course to do a real analysis of real texts all on your own. You will have to download data from the Project Gutenberg database, explore it, and then apply both supervised and unsupervised machine learning techniques. You will then have to review and grade the work of your peers.
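
The syllabus above mentions using TF-IDF to find the most distinctive words in a corpus. The course itself works in R; purely as a language-neutral illustration, the idea can be sketched in a few lines of Python. This is a minimal sketch, not course material: the function name `tf_idf`, the whitespace tokenization, and the two toy documents are all assumptions made for the example, and the weighting shown (term proportion times the natural log of inverse document frequency) is only one common definition.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: score} dict per document (hypothetical helper).

    tf  = occurrences of the term in the document / total terms in the document
    idf = ln(number of documents / number of documents containing the term)
    """
    # Naive tokenization: lowercase and split on whitespace (an assumption).
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents each term appears.
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores

# Toy corpus invented for illustration only.
docs = [
    "the whale chased the ship",
    "the detective chased the thief",
]
scores = tf_idf(docs)

# The highest-scoring term in each document.
top_terms = [max(s, key=s.get) for s in scores]
```

Words that occur in every document, such as "the" here, receive a score of zero, which is why TF-IDF surfaces the terms that set one text apart from the rest of the corpus.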

Taught by

Alexander Byzov

Related Courses

Artificial Intelligence in Social Media Analytics
Johns Hopkins University via Coursera
Introduction to Natural Language Processing in R
DataCamp
Introduction to Text Analysis in R
DataCamp
Topic Modeling in R
DataCamp
CCAI Insights
Google via Google Cloud Skills Boost