RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Offered By: Montreal Robotics via YouTube

Tags

Robotics Courses, Machine Learning Courses, Computer Vision Courses, Transfer Learning Courses, Generalization Courses, Vision-Language Models Courses

Course Description

Overview

Explore the research on incorporating vision-language models trained on Internet-scale data into end-to-end robotic control, and how this integration improves generalization and enables emergent semantic reasoning in robotics. Learn about the approach of co-fine-tuning state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks. Discover the technique of expressing robotic actions as text tokens, which lets actions be handled in the same way as natural-language responses. Examine the resulting class of vision-language-action (VLA) models and its specific instantiation, RT-2. Analyze the extensive evaluation results, which show improved generalization to novel objects, interpretation of complex commands, and rudimentary reasoning abilities. Explore how chain-of-thought reasoning enables multi-stage semantic reasoning for robotic tasks, and gain insight into the future possibilities of robotic control enhanced by large-scale pretraining on language and vision-language data from the web.
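
The description mentions expressing robotic actions as text tokens so they can be generated the same way as natural-language output. As a rough illustration of what such an action tokenization could look like, the Python sketch below quantizes a continuous action vector into integer bins and serializes the bins as a string; the bin count, 7-D action layout, and value ranges are assumptions made for this example, not the exact RT-2 setup.

```python
import numpy as np

# Illustrative sketch of "actions as text tokens": each continuous action
# dimension is quantized into an integer bin, and the bin indices are written
# out as a string that a language model could emit alongside ordinary text.
# The bin count, action layout, and value ranges are assumptions, not the
# exact RT-2 configuration.

NUM_BINS = 256  # assumed number of discretization bins per action dimension


def action_to_text(action, low, high, num_bins=NUM_BINS):
    """Quantize a continuous action vector and render it as space-separated bin indices."""
    norm = (np.asarray(action, dtype=float) - low) / (high - low)
    bins = np.clip(np.round(norm * (num_bins - 1)).astype(int), 0, num_bins - 1)
    return " ".join(str(b) for b in bins)


def text_to_action(text, low, high, num_bins=NUM_BINS):
    """Parse bin indices back into (approximate) continuous action values."""
    bins = np.array([int(tok) for tok in text.split()], dtype=float)
    return low + (bins / (num_bins - 1)) * (high - low)


# Hypothetical 7-D action: 3 position deltas, 3 rotation deltas, 1 gripper command.
low = np.array([-0.1] * 6 + [0.0])
high = np.array([0.1] * 6 + [1.0])
action = np.array([0.02, -0.05, 0.0, 0.01, 0.0, -0.03, 1.0])

tokens = action_to_text(action, low, high)
print(tokens)                              # space-separated bin indices
print(text_to_action(tokens, low, high))   # approximately recovers the action
```

Round-tripping through `text_to_action` shows how a controller could decode a generated token string back into a continuous command for the robot.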

Syllabus

Yevgen Chebotar: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control


Taught by

Montreal Robotics

Related Courses

Launching into Machine Learning 日本語版
Google Cloud via Coursera
Launching into Machine Learning auf Deutsch
Google Cloud via Coursera
Launching into Machine Learning en Français
Google Cloud via Coursera
Launching into Machine Learning en Español
Google Cloud via Coursera
Основы машинного обучения
Higher School of Economics via Coursera