Unmasking the Inductive Biases of Unsupervised Object Representations for Video Sequences

Offered By: Andreas Geiger via YouTube

Course Description

Overview

Explore a keynote presentation on unsupervised learning of object-centric representations for video sequences. The talk compares state-of-the-art approaches, namely VIMON (a video extension of MONet), TBA, and OP3, and examines their perceptual abilities in detection, figure-ground segmentation, and object tracking. Discover the proposed benchmark, built from procedurally generated video sequences, and the evaluation protocol based on the CLEAR MOT metrics for multi-object tracking. Gain insights into the principles of object-centric learning, including attention networks, next-frame prediction, spatial transformations, and dynamics networks. Analyze how performance depends on the number of objects, which cases remain challenging, how the models behave on out-of-distribution test sets, and how long they take to run on a single GPU. Understand why perceiving the world in terms of objects matters for reasoning and scene understanding in computer vision.

Syllabus

Intro
Collaborators
Why Object-Centric Learning? Explicit Object Representation
Tracking by Detection
Unsupervised Object-Centric Learning
Common Principle
Categorization of Approaches
VIMON: Attention Network
VIMON: Next-Frame Prediction
TBA Tracker Array
TBA Mid-Level Representation
TBA Spatial Transformation
Spatial Mixture Models: OP3
OP3 Dynamics Network
CLEAR MOT Metrics
Datasets
Results on SpMOT
How Well Do Models Accumulate Evidence Over Time?
Dependency of Performance on Number of Objects
Challenging Cases
VMDS Challenge Sets
Out-of-Distribution Test Sets
Runtime Analysis: Runtime on a Single RTX 2080 Ti GPU
Conclusions
2D Annotations

Taught by

Andreas Geiger
