LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

Offered By: Launchpad via YouTube

Tags

Robotics Courses, Machine Learning Courses, Computer Vision Courses, Reinforcement Learning Courses, Data Augmentation Courses, Vision-Language Models Courses

Course Description

Overview

Discover the LLaRA framework in this 16-minute video presentation by the Fellowship.ai team. Delve into its approach of enhancing robot action policies through Large Language Models (LLMs) and Vision-Language Models (VLMs). Learn how LLaRA formulates robot actions as conversation-style instruction-response pairs and improves decision-making by incorporating auxiliary data. Explore the process of training VLMs with visual-textual prompts and the automated pipeline for generating high-quality robotics instruction data from existing behavior cloning datasets. Gain insight into how this framework produces strong policy decisions for robotic tasks, with state-of-the-art performance reported in both simulated and real-world environments. The code, datasets, and pretrained models are available on GitHub for further exploration of this approach to robot learning.
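To make the core idea concrete, here is a minimal sketch of how a single behavior-cloning step might be reformatted as a conversation-style instruction-response pair for VLM training. The field names, prompt wording, and coordinate format are illustrative assumptions, not LLaRA's actual schema.

```python
# Hypothetical sketch: turning one behavior-cloning sample (task text plus
# a pick-and-place action) into an instruction-response text pair.
# Schema and wording are assumptions for illustration only.

def to_instruction_pair(task: str, action: dict) -> dict:
    """Format a robot action as a conversation-style instruction-response pair."""
    instruction = (
        f"<image> The task is: {task}. "
        "What action should the robot take next?"
    )
    # Express the action as normalized 2D coordinates rendered into text,
    # so a vision-language model can predict it as a language output.
    px, py = action["pick"]
    qx, qy = action["place"]
    response = f"Pick at ({px:.2f}, {py:.2f}) and place at ({qx:.2f}, {qy:.2f})."
    return {"instruction": instruction, "response": response}

pair = to_instruction_pair(
    "move the red block onto the blue plate",
    {"pick": (0.42, 0.77), "place": (0.18, 0.33)},
)
print(pair["response"])
# Pick at (0.42, 0.77) and place at (0.18, 0.33).
```

Running a converter like this over an existing behavior cloning dataset would yield text-based supervision in the format the video describes, which is then used to instruction-tune a VLM.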

Syllabus

Fellowship: LLaRA, Supercharging Robot Learning Data for Vision-Language Policy


Taught by

Launchpad

Related Courses

Mastering Google's PaliGemma VLM: Tips and Tricks for Success and Fine-Tuning
Sam Witteveen via YouTube
Fine-tuning PaliGemma for Custom Object Detection
Roboflow via YouTube
Florence-2: The Best Small Vision Language Model - Capabilities and Demo
Sam Witteveen via YouTube
Fine-tuning Florence-2: Microsoft's Multimodal Model for Custom Object Detection
Roboflow via YouTube
OpenVLA: An Open-Source Vision-Language-Action Model - Research Presentation
HuggingFace via YouTube