YoVDO

Tokenization in NLP: From Basics to Advanced Techniques

Offered By: Data Science Dojo via YouTube

Tags

Machine Learning Courses
Text Analysis Courses
Word Embeddings Courses
Language Models Courses
Positional Encoding Courses

Course Description

Overview

A live talk on tokenization in Natural Language Processing (NLP), led by Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services. The session moves from basic concepts to advanced techniques for turning raw text into input that a language model can process. Topics include word embeddings, text tokenization, converting tokens into token IDs, special context tokens, byte pair encoding (BPE), data sampling with a sliding window, creating token embeddings, and encoding word positions with positional encoding. The talk also covers how tokenization affects language models, text analysis, and the efficiency of training data, and why it is the step that connects human language to machine learning systems.

Syllabus

Introduction
Understanding Word Embeddings
Tokenizing Text
Converting Tokens into Token IDs
Adding Special Context Tokens
BytePair Encoding
Data Sampling with a Sliding Window
Creating Token Embeddings
Encoding Word Positions
Positional Encoding
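
For orientation, the syllabus steps from "Tokenizing Text" through "Data Sampling with a Sliding Window" can be sketched in a few lines of plain Python. This is a minimal illustration and not code from the talk: the function names, the regular expression, and the special token strings `<|unk|>` and `<|endoftext|>` are all assumptions chosen for the example, and a simple word-level vocabulary stands in for byte pair encoding.

```python
import re

def tokenize(text):
    # Split into word tokens and single punctuation tokens (assumed rule).
    return re.findall(r"\w+|[^\w\s]", text)

def build_vocab(tokens):
    # Map each unique token to an integer ID, in sorted order.
    vocab = {tok: i for i, tok in enumerate(sorted(set(tokens)))}
    # Append special context tokens (names are illustrative assumptions).
    for special in ("<|unk|>", "<|endoftext|>"):
        vocab[special] = len(vocab)
    return vocab

def encode(text, vocab):
    # Convert tokens to token IDs; unknown tokens map to <|unk|>.
    unk = vocab["<|unk|>"]
    return [vocab.get(tok, unk) for tok in tokenize(text)]

def sliding_windows(ids, context_len, stride):
    # Build (input, target) pairs where the target is the input
    # shifted one position to the right (next-token prediction).
    pairs = []
    for start in range(0, len(ids) - context_len, stride):
        inputs = ids[start:start + context_len]
        targets = ids[start + 1:start + context_len + 1]
        pairs.append((inputs, targets))
    return pairs

text = "the cat sat on the mat."
vocab = build_vocab(tokenize(text))
ids = encode(text, vocab)
pairs = sliding_windows(ids, context_len=4, stride=1)
```

Each pair holds a window of token IDs and the same window shifted by one token, which is the usual shape of training data for next-token prediction. A subword tokenizer such as BPE would replace `tokenize` and `build_vocab` while leaving the windowing step unchanged.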


Taught by

Data Science Dojo

Related Courses

Microsoft Bot Framework and Conversation as a Platform
Microsoft via edX
Unlocking the Power of OpenAI for Startups - Microsoft for Startups
Microsoft via YouTube
Improving Customer Experiences with Speech to Text and Text to Speech
Microsoft via YouTube
Stanford Seminar - Deep Learning in Speech Recognition
Stanford University via YouTube
Select Topics in Python: Natural Language Processing
Codio via Coursera