Towards High-Fidelity Open-Vocabulary 3D Scene Understanding
Offered By: Montreal Robotics via YouTube
Course Description
Overview
This conference talk surveys recent research on open-vocabulary 3D scene understanding. It covers Transformer-based networks and their application to 3D scene understanding tasks, including object segmentation, human body part segmentation, and vectorized floorplan reconstruction. It then examines the limitations of fully supervised models in real-world scenarios and introduces open-vocabulary approaches that leverage foundation models such as CLIP and SAM, before closing with the current challenges and future directions of the field. The talk is presented by Francis Engelmann, a postdoctoral researcher at ETH Zurich and visiting researcher at Google.
Syllabus
Francis Engelmann: Towards High-Fidelity Open-Vocabulary 3D Scene Understanding
Taught by
Montreal Robotics
Related Courses
Soil Structure Interaction (Indian Institute of Technology, Kharagpur via Swayam)
Fundamentals of Machine Learning for Healthcare (Stanford University via Coursera)
Artificial Intelligence Foundations: Thinking Machines (LinkedIn Learning)
Could a Purely Self-Supervised Foundation Model Achieve Grounded Language Understanding? (Santa Fe Institute via YouTube)
Foundation Models - FSDL 2022 (The Full Stack via YouTube)