SEA-LION - Representing Diverse Southeast Asian Languages with Large Language Models
Offered By: Databricks via YouTube
Course Description
Overview
Explore the development of SEA-LION, an open-source large language model designed to represent the diverse languages and cultural contexts of Southeast Asia, in this 36-minute conference talk. Discover how AI Singapore collaborated with Databricks MosaicML to create a localized LLM that handles multiple languages, including Thai, Indonesian, and Tamil, as well as regional linguistic phenomena such as code-switching between dialects. Learn about the design considerations, from customizing tokenizers for regional languages to keeping the model cost-effective for resource-constrained organizations. Gain insights into potential applications and the long-term vision for a model that aims to close the gap in language representation for Southeast Asian communities.
Syllabus
SEA-LION: Representing the Diverse Languages of Southeast Asia with LLMs
Taught by
Databricks
Related Courses
Data Processing with Azure
LearnQuest via Coursera
Mejores prácticas para el procesamiento de datos en Big Data
Coursera Project Network via Coursera
Data Science with Databricks for Data Analysts
Databricks via Coursera
Azure Data Engineer con Databricks y Azure Data Factory
Coursera Project Network via Coursera
Curso Completo de Spark con Databricks (Big Data)
Coursera Project Network via Coursera