YoVDO

SEA-LION - Representing Diverse Southeast Asian Languages with Large Language Models

Offered By: Databricks via YouTube

Tags

Databricks Courses

Course Description

Overview

Explore the development of SEA-LION, an open-source large language model designed to represent the diverse languages and cultural contexts of Southeast Asia, in this 36-minute conference talk. Discover how AI Singapore collaborated with Databricks MosaicML to build a localized LLM that handles multiple regional languages, including Thai, Indonesian, and Tamil, as well as linguistic phenomena such as code-switching between dialects. Learn about the design considerations, from customizing tokenizers for regional languages to keeping costs manageable for resource-constrained organizations. Gain insights into potential applications and the long-term vision for a model that aims to close the gap in language representation for Southeast Asian communities.

Syllabus

SEA-LION: Representing the Diverse Languages of Southeast Asia with LLMs


Taught by

Databricks
