
Tokenization in NLP: From Basics to Advanced Techniques

Offered By: Data Science Dojo via YouTube

Tags

Machine Learning, Text Analysis, Word Embeddings, Language Models, Positional Encoding

Course Description

Overview

Dive into a live talk on tokenization in Natural Language Processing (NLP), led by Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services. Explore the steps that let machines interpret human language, from basic concepts to advanced techniques. Gain insights into word embeddings, text tokenization, converting tokens into token IDs, special context tokens, Byte Pair Encoding (BPE), data sampling with a sliding window, creating token embeddings, encoding word positions, and positional encoding. Learn how tokenization affects language models, text analysis, and training data efficiency, and why it is central to connecting human communication with machine learning systems.

Syllabus

Introduction
Understanding Word Embeddings
Tokenizing Text
Converting Tokens into Token IDs
Adding Special Context Tokens
Byte Pair Encoding (BPE)
Data Sampling with a Sliding Window
Creating Token Embeddings
Encoding Word Positions
Positional Encoding
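
As a rough illustration of the first few syllabus steps (tokenizing text, converting tokens into token IDs, adding special context tokens, and sampling with a sliding window), here is a minimal sketch in plain Python. It is not taken from the talk: the function names, the regular expression, and the special-token strings `<|unk|>` and `<|endoftext|>` are assumptions chosen for illustration, and a real pipeline would typically use a subword tokenizer such as BPE instead of this word-level split.

```python
import re

def tokenize(text):
    # Split on whitespace and common punctuation; the capturing group
    # keeps punctuation marks as separate tokens.
    parts = re.split(r'([,.:;?_!"()\']|--|\s)', text)
    return [p for p in parts if p.strip()]

def build_vocab(tokens):
    # Map each unique token to an integer ID, then append two
    # special tokens (names are illustrative assumptions).
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    for special in ("<|endoftext|>", "<|unk|>"):
        vocab[special] = len(vocab)
    return vocab

def encode(text, vocab):
    # Unknown words fall back to the <|unk|> ID.
    return [vocab.get(tok, vocab["<|unk|>"]) for tok in tokenize(text)]

def sliding_windows(ids, context_size, stride):
    # Build (input, target) pairs; each target is its input
    # shifted one token to the right.
    pairs = []
    for start in range(0, len(ids) - context_size, stride):
        x = ids[start:start + context_size]
        y = ids[start + 1:start + context_size + 1]
        pairs.append((x, y))
    return pairs

corpus = "the cat sat on the mat."
vocab = build_vocab(tokenize(corpus))
ids = encode(corpus, vocab)
pairs = sliding_windows(ids, context_size=4, stride=1)
```

With this toy corpus, a word that never appeared in the vocabulary (for example "dog") is mapped to the `<|unk|>` ID, and each sliding-window pair gives a model four input tokens plus the next-token targets for them.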


Taught by

Data Science Dojo
