RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control

Offered By: Montreal Robotics via YouTube

Tags

Robotics Courses, Machine Learning Courses, Computer Vision Courses, Transfer Learning Courses, Generalization Courses, Vision-Language Models Courses

Course Description

Overview

Explore the groundbreaking research on incorporating vision-language models trained on Internet-scale data into end-to-end robotic control. Delve into the study of how this integration enhances generalization and enables emergent semantic reasoning in robotics. Learn about the novel approach of co-fine-tuning state-of-the-art vision-language models on both robotic trajectory data and Internet-scale vision-language tasks. Discover the innovative technique of expressing robotic actions as text tokens, allowing for seamless integration with natural language responses. Examine the concept of vision-language-action (VLA) models and the specific implementation known as RT-2. Analyze the extensive evaluation results, showcasing improved generalization to novel objects, interpretation of complex commands, and rudimentary reasoning abilities. Explore the potential of chain-of-thought reasoning in enabling multi-stage semantic reasoning for robotic tasks. Gain insights into the future possibilities of robotic control enhanced by large-scale pretraining on language and vision-language data from the web.
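The description above mentions expressing robotic actions as text tokens so they can sit alongside natural language outputs. A minimal sketch of that idea is given below, assuming actions are normalized to [-1, 1] and each dimension is discretized into 256 integer bins; the names (NUM_BINS, action_to_tokens, the 7-dimensional example action) are illustrative assumptions, not code from the talk or paper.

```python
import numpy as np

# Sketch of the "actions as text tokens" idea: each continuous action
# dimension is discretized into a fixed number of bins, and the resulting
# integers are written out as a short text string that a language model
# can emit like any other tokens.

NUM_BINS = 256                        # assumed bins per action dimension
ACTION_LOW, ACTION_HIGH = -1.0, 1.0   # assumed normalized action range

def action_to_tokens(action: np.ndarray) -> str:
    """Map a continuous action vector to a space-separated token string."""
    clipped = np.clip(action, ACTION_LOW, ACTION_HIGH)
    # Scale each dimension to [0, NUM_BINS - 1] and round to an integer bin.
    bins = np.round(
        (clipped - ACTION_LOW) / (ACTION_HIGH - ACTION_LOW) * (NUM_BINS - 1)
    ).astype(int)
    return " ".join(str(b) for b in bins)

def tokens_to_action(token_str: str) -> np.ndarray:
    """Invert the mapping: recover an approximate continuous action."""
    bins = np.array([int(t) for t in token_str.split()], dtype=float)
    return bins / (NUM_BINS - 1) * (ACTION_HIGH - ACTION_LOW) + ACTION_LOW

# Example: a hypothetical 7-DoF action (end-effector deltas plus gripper).
action = np.array([0.1, -0.25, 0.0, 0.5, -0.5, 0.9, 1.0])
tokens = action_to_tokens(action)
print(tokens)                   # a short string of integer bin indices
print(tokens_to_action(tokens)) # approximate reconstruction of the action
```

The point of the sketch is only the round trip: once actions are strings of integer tokens, the same model that answers vision-language questions can also decode robot commands, which is the integration the talk describes.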

Syllabus

Yevgen Chebotar: RT-2: Vision-Language-Action Models Transfer Web Knowledge to Robotic Control


Taught by

Montreal Robotics

Related Courses

Mastering Google's PaliGemma VLM: Tips and Tricks for Success and Fine-Tuning
Sam Witteveen via YouTube
Fine-tuning PaliGemma for Custom Object Detection
Roboflow via YouTube
Florence-2: The Best Small Vision Language Model - Capabilities and Demo
Sam Witteveen via YouTube
Fine-tuning Florence-2: Microsoft's Multimodal Model for Custom Object Detection
Roboflow via YouTube
OpenVLA: An Open-Source Vision-Language-Action Model - Research Presentation
HuggingFace via YouTube