Building Features from Text Data
Offered By: Pluralsight
Course Description
Overview
This course covers extracting information from text documents and building classification models, using natural language processing techniques such as feature vectorization, locality-sensitive hashing, stopword removal, and lemmatization.
From chatbots to machine-generated literature, some of the hottest applications of ML and AI these days are for data in textual form. In this course, Building Features from Text Data, you will gain the ability to structure textual data in a manner ideal for use in ML models. First, you will learn how to represent documents as feature vectors using one-hot encoding, frequency-based, and prediction-based techniques. You will see how to improve these representations based on the meaning, or semantics, of the document. Next, you will discover how to leverage various language modeling features such as stopword removal, frequency filtering, stemming and lemmatization, and parts-of-speech tagging. Finally, you will see how locality-sensitive hashing can be used to reduce the dimensionality of documents while still keeping similar documents close together. You will round out the course by implementing a classification model on text documents using many of these modeling abstractions. When you’re finished with this course, you will have the skills and knowledge to use documents and textual data in conceptually and practically sound ways and represent such data for use in machine learning models.
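As a rough illustration of two ideas named in the description, the sketch below builds frequency-based count vectors after stopword removal, then hashes them with random hyperplanes, one common form of locality-sensitive hashing. It is not taken from the course materials. The stopword list, sample documents, and all function names are assumptions made for this example, and it uses only the Python standard library.

```python
import random
from collections import Counter

# Tiny illustrative stopword list (an assumption, not the course's list).
STOPWORDS = {"the", "a", "an", "is", "on", "and", "of"}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    tokens = (word.strip(".,!?;:").lower() for word in text.split())
    return [t for t in tokens if t and t not in STOPWORDS]


def build_vocabulary(docs):
    """Map each distinct token across all documents to a column index."""
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}


def count_vector(text, vocab):
    """Frequency-based representation: one count per vocabulary term."""
    counts = Counter(tokenize(text))
    return [counts.get(tok, 0) for tok in vocab]


def make_hyperplanes(n_bits, dim, seed=0):
    """Draw n_bits random hyperplanes (normal vectors) in dim dimensions."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def lsh_signature(vector, hyperplanes):
    """One bit per hyperplane: which side of it the vector falls on."""
    return tuple(
        1 if sum(v * h for v, h in zip(vector, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def hamming(a, b):
    """Number of positions at which two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sample documents.
docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat!",
    "Stock prices fell sharply.",
]

vocab = build_vocabulary(docs)
vectors = [count_vector(d, vocab) for d in docs]

# 8-bit signatures: shorter than the vocabulary-sized count vectors.
planes = make_hyperplanes(n_bits=8, dim=len(vocab))
signatures = [lsh_signature(v, planes) for v in vectors]

# Documents 0 and 1 reduce to the same tokens once stopwords are removed,
# so their count vectors, and therefore their signatures, are identical.
print(hamming(signatures[0], signatures[1]))
print(hamming(signatures[0], signatures[2]))
```

Vectors separated by a small angle tend to agree on most signature bits, which is the sense in which this kind of hashing keeps similar documents close together. In practice a library vectorizer and a tuned number of bits would replace these hand-rolled helpers.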
Taught by
Janani Ravi
Related Courses
Analyzing Squid Game Script with Google Cloud NLP
Coursera Project Network via Coursera
Basic Sentiment Analysis with TensorFlow
Coursera Project Network via Coursera
Build a Text Classification Model with AWS Glue and Amazon SageMaker (Simplified Chinese)
Amazon Web Services via AWS Skill Builder
Build NLP pipelines using scikit-learn
Coursera Project Network via Coursera
Convolutions for Text Classification with Keras
Coursera Project Network via Coursera