Self-supervised Learning Of Visual Pose Estimation Without Pose Labels By Classifying LED States

📅 2025-09-12
📈 Citations: 0
✨ Influential: 0
📄 PDF
🤖 AI Summary
This work addresses self-supervised relative pose estimation for ground robots using only monocular RGB images, without pose labels, CAD models, or appearance priors. We propose a proxy-task framework that uses LED state classification as a self-supervised signal. Specifically, we design an LED state recognition network and integrate it with monocular depth-ambiguity correction and joint relative orientation regression modules, trained end-to-end on video sequences captured during random multi-robot motion. Our key contribution is the first use of visible-light LED states as a weakly supervised proxy for pose learning, eliminating reliance on ground-truth poses, external localization systems, or robot geometric priors. Experiments demonstrate that our method achieves accuracy comparable to state-of-the-art (SOTA) approaches that require pose labels or CAD models, while exhibiting strong cross-domain generalization and native support for collaborative multi-robot pose estimation.

๐Ÿ“ Abstract
We introduce a model for monocular RGB relative pose estimation of a ground robot that trains from scratch without pose labels or prior knowledge about the robot's shape or appearance. At training time, we assume: (i) a robot fitted with multiple LEDs, whose states are independent and known at each frame; (ii) knowledge of the approximate viewing direction of each LED; and (iii) availability of a calibration image with a known target distance, to address the ambiguity of monocular depth estimation. Training data is collected by a pair of robots moving randomly, without external infrastructure or human supervision. Our model trains on the task of predicting from an image the state of each LED on the robot. In doing so, it learns to predict the position of the robot in the image, its distance, and its relative bearing. At inference time, the state of the LEDs is unknown, can be arbitrary, and does not affect pose estimation performance. Quantitative experiments indicate that our approach: is competitive with SoA approaches that require supervision from pose labels or a CAD model of the robot; generalizes to different domains; and handles multi-robot pose estimation.
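To make the proxy task concrete: the network predicts the on/off state of each LED, and the approximate viewing direction of each LED (assumption (ii) in the abstract) determines which LEDs can plausibly be seen from the camera. A minimal sketch of such a training signal is a visibility-masked binary cross-entropy over the per-LED predictions. The function name, the explicit visibility mask, and the pure-Python form are illustrative assumptions, not the paper's actual loss or architecture.

```python
import math

def led_state_loss(pred_probs, true_states, visibility):
    """Visibility-masked binary cross-entropy over per-LED on/off predictions.

    pred_probs:  predicted probability that each LED is ON, in (0, 1)
    true_states: known LED states from the training logs (1 = ON, 0 = OFF)
    visibility:  1 if the LED roughly faces the camera, else 0 (hypothetical mask)
    """
    eps = 1e-7
    total, n = 0.0, 0
    for p, y, v in zip(pred_probs, true_states, visibility):
        if not v:
            continue  # LEDs facing away from the camera carry no signal
        p = min(max(p, eps), 1.0 - eps)  # clamp for numerical stability
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
        n += 1
    return total / max(n, 1)
```

The intuition is that the network can only classify LED states correctly if it has implicitly localized the robot and inferred its orientation, which is what lets pose emerge without pose labels.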
Problem

Research questions and friction points this paper is trying to address.

Self-supervised pose estimation without labeled data
Monocular RGB relative pose prediction for robots
Learning from LED states instead of pose labels
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-supervised learning without pose labels
LED state classification for visual training
Monocular RGB relative pose estimation
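One detail worth spelling out: monocular depth is recoverable only up to a global scale, which is why the method assumes one calibration image with a known target distance. A single known distance fixes the scale factor for all later predictions. The sketch below illustrates this arithmetic; the function names are assumptions for illustration, not the paper's API.

```python
def depth_scale(calib_true_distance, calib_predicted_depth):
    """Global scale factor recovered from one calibration image
    where the true target distance is known."""
    return calib_true_distance / calib_predicted_depth

def corrected_distance(predicted_depth, scale):
    """Apply the calibration scale to any subsequent depth prediction."""
    return predicted_depth * scale

# e.g. the network outputs 2.0 (arbitrary units) for a robot
# that is actually 1.0 m away, so the scale factor is 0.5
scale = depth_scale(1.0, 2.0)
```

A later prediction of 4.0 in the same arbitrary units would then map to 2.0 m.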