How to Build Custom Datasets for Text in PyTorch

Offered By: Aladdin Persson via YouTube

Course Description

Overview

Learn how to build custom datasets for text processing in PyTorch with this in-depth tutorial video. Work through techniques for handling text data, using the Flickr8k image captioning dataset as the running example. Discover how to implement a PyTorch Dataset that loads Flickr8k images and captions, build a vocabulary and numericalize captions, write a collate function that pads each batch, and create a helper function that returns data loaders. Apply the same principles to other text-based projects, including translation, question answering, and sentiment analysis. Follow along as the instructor implements the code, troubleshoots errors, and shares practical tips for working with text data in PyTorch.
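The vocabulary and numericalization step can be sketched in plain Python as follows. This is a minimal illustration, not the instructor's code. The class name `Vocabulary`, the special tokens `<PAD>`, `<SOS>`, `<EOS>`, `<UNK>`, the `freq_threshold` parameter, and the regex tokenizer are all assumptions chosen for this example.

```python
import re
from collections import Counter


class Vocabulary:
    """Maps tokens to integer ids; words seen fewer than freq_threshold times map to <UNK>."""

    def __init__(self, freq_threshold=2):
        # Hypothetical special tokens; ids 0-3 are reserved for them.
        self.itos = {0: "<PAD>", 1: "<SOS>", 2: "<EOS>", 3: "<UNK>"}
        self.stoi = {tok: idx for idx, tok in self.itos.items()}
        self.freq_threshold = freq_threshold

    def __len__(self):
        return len(self.itos)

    @staticmethod
    def tokenize(text):
        # Assumed tokenizer: lowercase, keep runs of word characters, drop punctuation.
        return re.findall(r"\w+", text.lower())

    def build_vocabulary(self, sentences):
        frequencies = Counter()
        idx = len(self.itos)
        for sentence in sentences:
            for word in self.tokenize(sentence):
                frequencies[word] += 1
                # Add a word exactly once, at the moment it reaches the threshold.
                if frequencies[word] == self.freq_threshold:
                    self.stoi[word] = idx
                    self.itos[idx] = word
                    idx += 1

    def numericalize(self, text):
        # Unknown or rare words fall back to the <UNK> id.
        unk = self.stoi["<UNK>"]
        return [self.stoi.get(tok, unk) for tok in self.tokenize(text)]


vocab = Vocabulary(freq_threshold=2)
vocab.build_vocabulary([
    "A dog runs on the grass.",
    "A dog jumps over a log.",
    "The cat sleeps.",
])
print(vocab.numericalize("The dog sees a cat"))  # [6, 5, 3, 4, 3]
```

Reserving fixed ids for the special tokens keeps `<PAD>` at 0, so a padding collate function can reuse that id without consulting the vocabulary.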

Syllabus

- Introduction
- Overview of what we're going to do
- Imports
- Setup of PyTorch Dataset for loading Flickr
- Setup of Vocabulary and Numericalization
- Creating a Collate Function for Batch Padding
- Function for getting the data loader
- Running code & fixing a couple of errors
- Ending
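The Dataset, padding collate, and data-loader steps in the syllabus fit together roughly as below. This is a minimal sketch, not the video's implementation. `ToyCaptionDataset`, `PadCollate`, and `get_loader` are hypothetical names, the dataset returns placeholder zero images instead of reading Flickr8k files, captions are assumed to be already numericalized, and `batch_first=True` is a choice made here for readability.

```python
import torch
from torch.nn.utils.rnn import pad_sequence
from torch.utils.data import DataLoader, Dataset


class ToyCaptionDataset(Dataset):
    """Hypothetical stand-in for a Flickr8k dataset: returns (image, caption_ids)."""

    def __init__(self, captions):
        self.captions = captions  # list of lists of token ids

    def __len__(self):
        return len(self.captions)

    def __getitem__(self, index):
        image = torch.zeros(3, 8, 8)  # placeholder image tensor
        caption = torch.tensor(self.captions[index], dtype=torch.long)
        return image, caption


class PadCollate:
    """Stacks images and pads variable-length captions to the longest one in the batch."""

    def __init__(self, pad_idx):
        self.pad_idx = pad_idx

    def __call__(self, batch):
        images = torch.stack([item[0] for item in batch])
        captions = [item[1] for item in batch]
        # Shape (batch_size, max_len); shorter captions are filled with pad_idx.
        targets = pad_sequence(captions, batch_first=True, padding_value=self.pad_idx)
        return images, targets


def get_loader(captions, batch_size=2, pad_idx=0, shuffle=False):
    dataset = ToyCaptionDataset(captions)
    loader = DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=shuffle,
        collate_fn=PadCollate(pad_idx),
    )
    return loader, dataset


loader, _ = get_loader([[1, 4, 5, 2], [1, 6, 2], [1, 4, 2]], batch_size=2)
images, targets = next(iter(loader))
print(images.shape)       # torch.Size([2, 3, 8, 8])
print(targets.tolist())   # [[1, 4, 5, 2], [1, 6, 2, 0]]
```

Writing the collate function as a small class rather than a lambda keeps it picklable, which matters if the loader is later given `num_workers > 0`.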

Taught by

Aladdin Persson

Related Courses

Text Mining and Analytics
University of Illinois at Urbana-Champaign via Coursera

Introduction to Natural Language Processing
University of Michigan via Coursera

Enabling Technologies for Data Science and Analytics: The Internet of Things
Columbia University via edX

Machine Learning Capstone: An Intelligent Application with Deep Learning
University of Washington via Coursera

moocTLH: Nuevos retos en las tecnologías del lenguaje humano
Universidad de Alicante via Miríadax