Bernice: A Multilingual Pre-trained Encoder for Twitter
Offered By: Center for Language & Speech Processing (CLSP), JHU via YouTube
Course Description
Overview
Explore Bernice, a multilingual RoBERTa language model designed specifically for Twitter data. Learn about the development of this pre-trained encoder, trained from scratch on 2.5 billion tweets spanning many languages. Discover how Bernice outperforms both models adapted to social media data and strong multilingual baselines on a range of monolingual and multilingual Twitter benchmarks. Gain insight into the unique challenges of processing Twitter's multilingual content and how Bernice addresses the significant differences between Twitter language and the domains commonly used to train large language models.
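Since Bernice is a pre-trained encoder, the typical way to use it downstream is to load it and extract tweet embeddings. Below is a minimal sketch using the Hugging Face Transformers library; the model identifier "jhu-clsp/bernice" is an assumption based on the authors' affiliation, not something confirmed by this page, so verify it on the Hub before use.

```python
# Minimal sketch: loading a pre-trained Twitter encoder like Bernice and
# producing one embedding per tweet. "jhu-clsp/bernice" is an assumed
# Hugging Face Hub ID; check the Hub for the actual identifier.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "jhu-clsp/bernice"  # assumed Hub ID; verify before use

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)
model.eval()

tweets = [
    "just landed in tokyo!! #travel",
    "qué día tan bonito ☀️",
]

# Tokenize a small multilingual batch and run it through the encoder.
batch = tokenizer(tweets, padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    outputs = model(**batch)

# Mean-pool the final hidden states (ignoring padding) to get one
# fixed-size vector per tweet, a common way to use encoder outputs.
mask = batch["attention_mask"].unsqueeze(-1)
embeddings = (outputs.last_hidden_state * mask).sum(1) / mask.sum(1)
print(embeddings.shape)  # (2, hidden_size)
```

These pooled vectors can then feed a classifier or similarity search; for sequence-labeling tasks, the per-token hidden states in outputs.last_hidden_state would be used directly instead.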
Syllabus
Bernice: A Multilingual Pre-trained Encoder for Twitter - Alexandra DeLucia - October 2022
Taught by
Center for Language & Speech Processing (CLSP), JHU
Related Courses
Multi-Label Classification on Unhealthy Comments - Finetuning RoBERTa with PyTorch - Coding Tutorial - rupert ai via YouTube
Hugging Face Transformers - The Basics - Practical Coding Guides - NLP Models (BERT/RoBERTa) - rupert ai via YouTube
Programming Language of the Future: AI in Your Native Language - Linux Foundation via YouTube
Pre-training and Pre-trained Models in Advanced NLP - Lecture 5 - Graham Neubig via YouTube
Fine-tuning LLMs Without Maxing Out Your GPU - LoRA for Parameter-Efficient Training - Data Centric via YouTube