YoVDO

RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Offered By: Montreal Robotics via YouTube

Tags

Robotics Courses, Machine Learning Courses, Computer Vision Courses, Transfer Learning Courses, Generalization Courses, Vision-Language Models Courses

Course Description

Overview

Explore research on incorporating vision-language models trained on Internet-scale data directly into end-to-end robotic control, and on how this integration improves generalization and enables emergent semantic reasoning in robotics. Learn how state-of-the-art vision-language models are co-fine-tuned on both robotic trajectory data and Internet-scale vision-language tasks. See how robotic actions are expressed as text tokens, so that they can be produced in the same format as natural language responses. Examine the concept of vision-language-action (VLA) models and the specific implementation known as RT-2. Review the evaluation results, which show improved generalization to novel objects, interpretation of complex commands, and rudimentary reasoning abilities. Explore how chain-of-thought reasoning can enable multi-stage semantic reasoning for robotic tasks. Gain insight into what large-scale pretraining on language and vision-language data from the web may offer robotic control.
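The idea of expressing robotic actions as text tokens can be illustrated with a minimal sketch. It assumes each continuous action dimension is clipped to a known range and discretized into 256 uniform bins, with the bin indices written out as a space-separated string. The function names, the bin count, and the example ranges are illustrative choices, not details taken from this listing.

```python
# Hypothetical helpers: names, bin count, and ranges are assumptions
# made for illustration only.

def action_to_text(action, low, high, num_bins=256):
    """Encode a continuous action vector as a string of integer tokens."""
    tokens = []
    for value, lo, hi in zip(action, low, high):
        clipped = min(max(value, lo), hi)        # keep value inside [lo, hi]
        scaled = (clipped - lo) / (hi - lo)      # map to [0, 1]
        # Uniform binning; the top edge falls into the last bin.
        bin_index = min(int(scaled * num_bins), num_bins - 1)
        tokens.append(str(bin_index))
    return " ".join(tokens)


def text_to_action(text, low, high, num_bins=256):
    """Decode a token string back to approximate continuous values."""
    action = []
    for token, lo, hi in zip(text.split(), low, high):
        center = (int(token) + 0.5) / num_bins   # use the bin center
        action.append(lo + center * (hi - lo))
    return action


if __name__ == "__main__":
    low, high = [-1.0] * 3, [1.0] * 3
    text = action_to_text([0.0, -1.0, 1.0], low, high)
    print(text)                                  # 128 0 255
    print(text_to_action(text, low, high))
```

Decoding returns bin centers, so the round trip is lossy: each dimension is recovered to within half a bin width of the clipped input.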

Syllabus

Yevgen Chebotar: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control


Taught by

Montreal Robotics

Related Courses

Introduction to Artificial Intelligence
Stanford University via Udacity
Artificial Intelligence for Robotics
Stanford University via Udacity
Computer Vision: The Fundamentals
University of California, Berkeley via Coursera
Control of Mobile Robots
Georgia Institute of Technology via Coursera
Artificial Intelligence Planning
University of Edinburgh via Coursera