How to Build a BERT WordPiece Tokenizer in Python and HuggingFace

Offered By: James Briggs via YouTube

Tags

Natural Language Processing (NLP) Courses
Machine Learning Courses
Transformer Models Courses

Course Description

Overview

Learn how to build a BERT WordPiece tokenizer from scratch using Python and HuggingFace in this tutorial video. Explore how to create a custom tokenizer for specific use cases, particularly for low-resource languages or specialized domains. Dive into the WordPiece tokenizer used by BERT, a popular transformer model for a wide range of language-based machine learning tasks. Follow along as the instructor walks through downloading datasets, using HuggingFace's tools, and implementing the tokenizer code, and see how this fundamental preprocessing step can strengthen your natural language processing projects.
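The training workflow the video covers can be sketched with the HuggingFace `tokenizers` library. The corpus file, vocabulary size, and frequency threshold below are illustrative placeholders, not the video's exact settings:

```python
# Sketch of training a BERT WordPiece tokenizer with the HuggingFace
# `tokenizers` library. The tiny generated corpus and small vocab_size
# are stand-ins; point `files` at your real text data.
from tokenizers import BertWordPieceTokenizer

# Write a tiny stand-in corpus (replace with real training files).
with open("corpus.txt", "w", encoding="utf-8") as f:
    f.write("machine learning with transformers\n" * 100)

tokenizer = BertWordPieceTokenizer(
    clean_text=True,
    handle_chinese_chars=True,
    strip_accents=True,
    lowercase=True,
)

tokenizer.train(
    files=["corpus.txt"],
    vocab_size=1000,  # BERT-base uses a vocabulary of ~30k pieces
    min_frequency=2,
    special_tokens=["[PAD]", "[UNK]", "[CLS]", "[SEP]", "[MASK]"],
)

# Inspect how a sentence is split into WordPiece tokens.
encoding = tokenizer.encode("machine learning")
print(encoding.tokens)
```

Calling `tokenizer.save_model(".")` afterwards writes a `vocab.txt` that can be loaded back into a BERT tokenizer for downstream training.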

Syllabus

Intro
WordPiece Tokenizer
Download Data Sets
HuggingFace
Dataset
Tokenizer
Tokenizer Walkthrough
Tokenizer Code
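At encode time, WordPiece splits each word greedily into the longest matching vocabulary pieces, left to right, marking non-initial pieces with a "##" prefix. A minimal pure-Python sketch of that matching rule, with a toy vocabulary and the training procedure omitted:

```python
# Toy illustration of WordPiece's greedy longest-match-first encoding.
# The real implementation lives in the HuggingFace `tokenizers` library;
# the vocabulary here is a made-up example.
def wordpiece_encode(word, vocab, unk="[UNK]"):
    """Split `word` into the longest vocabulary matches, left to right."""
    tokens, start = [], 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                candidate = "##" + candidate  # non-initial pieces get the prefix
            if candidate in vocab:
                piece = candidate  # longest match found for this position
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches -> whole word is unknown
        tokens.append(piece)
        start = end
    return tokens

vocab = {"token", "##ize", "##r", "play", "##ing"}
print(wordpiece_encode("tokenizer", vocab))  # -> ['token', '##ize', '##r']
print(wordpiece_encode("playing", vocab))    # -> ['play', '##ing']
```

Because rare words decompose into known subwords like this, a WordPiece vocabulary of ~30k pieces can cover an open-ended vocabulary without exploding in size.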


Taught by

James Briggs

Related Courses

Natural Language Processing
Columbia University via Coursera
Natural Language Processing
Stanford University via Coursera
Introduction to Natural Language Processing
University of Michigan via Coursera
moocTLH: New Challenges in Human Language Technologies
Universidad de Alicante via Miríadax
Natural Language Processing
Indian Institute of Technology, Kharagpur via Swayam