Tokenization in NLP: From Basics to Advanced Techniques

Offered By: Data Science Dojo via YouTube

Course Description

Overview

A live talk on tokenization in Natural Language Processing (NLP), led by Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services. The talk walks through the processes that let machines interpret human language, from basic concepts to advanced techniques: word embeddings, text tokenization, converting tokens into token IDs, special context tokens, Byte Pair Encoding (BPE), data sampling with a sliding window, creating token embeddings, and encoding word positions with positional encoding. It also covers how tokenization affects language models, text analysis, and the efficiency of training data, and why tokenization is a key link between human language and AI systems.
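The basic steps named above (tokenizing text, converting tokens to IDs, adding special context tokens, sliding-window sampling, and positional encoding) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the talk's own code: the regex tokenizer, the special-token names `<|unk|>` and `<|endoftext|>`, and all function names here are hypothetical choices made for the example. The positional encoding shown is the sinusoidal variant from the original Transformer paper; the talk may use a different scheme (for example, learned position embeddings).

```python
import math
import re

# Hypothetical special tokens, chosen for illustration only.
UNK, EOT = "<|unk|>", "<|endoftext|>"

def tokenize(text):
    """Split text into word and punctuation tokens with a simple regex."""
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    """Assign each distinct token an integer ID, then append the special tokens."""
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    for special in (UNK, EOT):
        vocab[special] = len(vocab)
    return vocab

def encode(tokens, vocab):
    """Convert tokens to token IDs, mapping tokens missing from the vocabulary to <|unk|>."""
    return [vocab.get(tok, vocab[UNK]) for tok in tokens]

def sliding_window(ids, context_len, stride):
    """Pair each input chunk with the same chunk shifted by one position (next-token targets)."""
    return [
        (ids[i:i + context_len], ids[i + 1:i + context_len + 1])
        for i in range(0, len(ids) - context_len, stride)
    ]

def positional_encoding(seq_len, d_model):
    """Sinusoidal position vectors: sine on even dimensions, cosine on odd dimensions."""
    pe = [[0.0] * d_model for _ in range(seq_len)]
    for pos in range(seq_len):
        for i in range(0, d_model, 2):
            angle = pos / (10000 ** (i / d_model))
            pe[pos][i] = math.sin(angle)
            if i + 1 < d_model:
                pe[pos][i + 1] = math.cos(angle)
    return pe

if __name__ == "__main__":
    corpus = "The cat sat on the mat."
    vocab = build_vocab(tokenize(corpus))
    # "dog" is not in the vocabulary, so it becomes <|unk|>; <|endoftext|> marks the end.
    print(encode(tokenize("The dog sat.") + [EOT], vocab))
    # Input/target pairs for next-token prediction with a context of 3 tokens.
    print(sliding_window(encode(tokenize(corpus), vocab), context_len=3, stride=1))
```

In a real model the token IDs would index into a learned embedding matrix, and the position vectors would be added to those token embeddings; both are omitted here to keep the sketch dependency-free.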

Syllabus

Introduction
Understanding Word Embeddings
Tokenizing Text
Converting Tokens into Token IDs
Adding Special Context Tokens
Byte Pair Encoding
Data Sampling with a Sliding Window
Creating Token Embeddings
Encoding Word Positions
Positional Encoding
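Byte Pair Encoding, one of the topics listed above, builds a subword vocabulary by repeatedly merging the most frequent adjacent pair of symbols. The following is a toy, character-level sketch of that idea, assuming a tiny hand-picked word list; the function names `learn_bpe`, `merge_pair`, and `apply_bpe` are hypothetical. Production tokenizers (for example, the byte-level BPE used by GPT-2) operate on bytes and add further details that this sketch omits.

```python
from collections import Counter

def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)

def learn_bpe(words, num_merges):
    """Learn up to `num_merges` merge rules by repeatedly merging the most frequent pair."""
    corpus = Counter(tuple(word) for word in words)  # word as character tuple -> frequency
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, freq in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # ties go to the pair counted first
        merges.append(best)
        new_corpus = Counter()
        for symbols, freq in corpus.items():
            new_corpus[merge_pair(symbols, best)] += freq
        corpus = new_corpus
    return merges

def apply_bpe(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)

if __name__ == "__main__":
    merges = learn_bpe(["low", "low", "lower", "lowest"], num_merges=2)
    print(merges)
    print(apply_bpe("lowly", merges))
```

Because merges are replayed in the order they were learned, a word never seen during training ("lowly" above) still splits into known subword units rather than falling back to an unknown-token placeholder, which is the main advantage BPE has over a fixed word-level vocabulary.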


Taught by

Data Science Dojo

Related Courses

Sequence Models
DeepLearning.AI via Coursera
Natural Language Processing in TensorFlow
DeepLearning.AI via Coursera
Applied Natural Language Processing
Chennai Mathematical Institute via Swayam
Natural Language Processing
IBM via Udacity
Natural Language Processing with Classification and Vector Spaces
DeepLearning.AI via Coursera