Tokenization in NLP: From Basics to Advanced Techniques
Offered By: Data Science Dojo via YouTube
Course Description
Overview
A live talk on tokenization in Natural Language Processing (NLP), led by Suman Debnath, Principal Developer Advocate for Machine Learning at Amazon Web Services. The session moves from basic concepts to advanced techniques in how machines process human language: word embeddings, text tokenization, token ID conversion, special context tokens, BytePair Encoding, sliding window data sampling, token embedding creation, and positional encoding of word positions. It also covers how tokenization affects language models, supports text analysis, and improves the efficiency of training data, and why it is a key step in connecting human communication with machine learning systems.
Syllabus
Introduction
Understanding Word Embeddings
Tokenizing Text
Converting Tokens into Token IDs
Adding Special Context Tokens
BytePair Encoding
Data Sampling with a Sliding Window
Creating Token Embeddings
Encoding Word Positions
Positional Encoding
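For orientation, several of the syllabus steps (tokenizing text, converting tokens into token IDs, adding special context tokens, and sliding window sampling) can be sketched in a few lines of Python. This is an illustrative sketch, not code from the talk: the function names, the regular expression, the sample corpus, and the special tokens `<|endoftext|>` and `<|unk|>` are all assumptions chosen for the example, and it uses a simple word-level tokenizer rather than BytePair Encoding.

```python
import re

def tokenize(text):
    # Assumed rule: split on whitespace and a few punctuation marks,
    # keeping the punctuation marks as separate tokens.
    parts = re.split(r"([,.:;?!()]|--|\s)", text)
    return [p for p in parts if p.strip()]

def build_vocab(tokens):
    # Sorted unique tokens, followed by two special context tokens
    # (names are hypothetical, picked for this example).
    symbols = sorted(set(tokens)) + ["<|endoftext|>", "<|unk|>"]
    return {tok: idx for idx, tok in enumerate(symbols)}

def encode(text, vocab):
    # Map each token to its ID; unknown tokens fall back to <|unk|>.
    unk = vocab["<|unk|>"]
    return [vocab.get(tok, unk) for tok in tokenize(text)]

def sliding_windows(ids, context_size, stride):
    # Pair each input window with the same window shifted by one token,
    # which is the usual next-token prediction setup.
    pairs = []
    for start in range(0, len(ids) - context_size, stride):
        inputs = ids[start:start + context_size]
        targets = ids[start + 1:start + context_size + 1]
        pairs.append((inputs, targets))
    return pairs

corpus = "The cat sat on the mat. The dog sat on the log."
vocab = build_vocab(tokenize(corpus))
ids = encode(corpus, vocab)
for inputs, targets in sliding_windows(ids, context_size=4, stride=4):
    print(inputs, "->", targets)
```

A word-level vocabulary like this one has to map every unseen word to `<|unk|>`; subword schemes such as BytePair Encoding avoid that by breaking unknown words into smaller known pieces.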
Taught by
Data Science Dojo