DICArt: Advancing Category-level Articulated Object Pose Estimation in Discrete State-Spaces

📅 2026-02-23
🤖 AI Summary
This work addresses the challenge of regressing category-level 6D poses of articulated objects in continuous space, where existing methods struggle to effectively incorporate kinematic constraints and navigate complex search spaces. To this end, we propose the first conditional discrete diffusion framework for this task, which recovers object poses through a learned reverse process that iteratively denoises discrete pose representations. Our approach explicitly integrates generative priors with physical constraints via a hierarchical kinematic coupling strategy and a dynamic flow-based decision mechanism. Extensive experiments demonstrate that the proposed method achieves state-of-the-art performance on both synthetic and real-world datasets, significantly improving the accuracy and robustness of articulated object pose estimation.
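The summary does not say how a pose becomes a discrete representation. As a minimal sketch, assuming each pose parameter is binned uniformly over a known range (the helper names `quantize` and `dequantize` and the bin count are hypothetical, not taken from the paper), the mapping between continuous values and tokens could look like this:

```python
import numpy as np

def quantize(values, low, high, num_bins):
    """Map continuous values in [low, high] to integer bin indices (tokens)."""
    scaled = (np.asarray(values, dtype=float) - low) / (high - low)
    # Clip so that values equal to `high` fall into the last bin.
    return np.clip(np.floor(scaled * num_bins), 0, num_bins - 1).astype(int)

def dequantize(tokens, low, high, num_bins):
    """Map tokens back to the centre of their bins."""
    return low + (np.asarray(tokens) + 0.5) * (high - low) / num_bins

# Illustrative use: a normalized translation in [-1, 1] with 64 bins (assumed).
translation = np.array([-0.9, 0.0, 0.37])
tokens = quantize(translation, -1.0, 1.0, 64)
recovered = dequantize(tokens, -1.0, 1.0, 64)
```

Under this assumption the round-trip error is bounded by half a bin width, so the bin count trades resolution against the size of the discrete state space.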

📝 Abstract
Articulated object pose estimation is a core task in embodied AI. Existing methods typically regress poses in a continuous space, but often struggle to 1) navigate a large, complex search space and 2) incorporate intrinsic kinematic constraints. In this work, we introduce DICArt (DIsCrete Diffusion for Articulation Pose Estimation), a novel framework that formulates pose estimation as a conditional discrete diffusion process. Instead of operating in a continuous domain, DICArt progressively denoises a noisy pose representation through a learned reverse diffusion procedure to recover the ground-truth (GT) pose. To improve modeling fidelity, we propose a flexible flow decider that dynamically determines whether each token should be denoised or reset, effectively balancing the real and noise distributions during diffusion. Additionally, we incorporate a hierarchical kinematic coupling strategy, estimating the pose of each rigid part hierarchically to respect the object's kinematic structure. We validate DICArt on both synthetic and real-world datasets. Experimental results demonstrate its superior performance and robustness. By integrating discrete generative modeling with structural priors, DICArt offers a new paradigm for reliable category-level 6D pose estimation in complex environments.
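The abstract describes a reverse process in which each token is either denoised or reset. The following is a toy sketch of one such step, not the authors' implementation: the confidence-threshold rule in `toy_decider` is a hypothetical stand-in for the learned flow decider, `probs` stands in for the output of a learned, observation-conditioned network, and resetting is assumed to mean resampling from a uniform noise distribution.

```python
import numpy as np

def toy_decider(probs, threshold):
    """Hypothetical decider: flag tokens whose top probability is below threshold."""
    return probs.max(axis=1) < threshold

def reverse_step(probs, reset_mask, num_bins, rng):
    """One toy reverse step over T tokens.

    probs:      (T, num_bins) categorical distribution predicted per token
    reset_mask: (T,) bool array, True where the token is reset to noise
    """
    denoised = probs.argmax(axis=1)                          # greedy denoising
    noise = rng.integers(0, num_bins, size=probs.shape[0])   # uniform noise
    return np.where(reset_mask, noise, denoised)

# Illustrative use with a confident stand-in prediction for three tokens.
rng = np.random.default_rng(0)
num_bins = 8
target = np.array([1, 5, 7])
probs = np.full((3, num_bins), 0.02)
probs[np.arange(3), target] = 0.86        # each row sums to 1
mask = toy_decider(probs, threshold=0.5)  # no token is flagged here
tokens = reverse_step(probs, mask, num_bins, rng)
```

In a full sampler this step would be repeated over a schedule of timesteps, with reset tokens getting another chance to be denoised later.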
Problem

Research questions and friction points this paper is trying to address.

articulated object pose estimation
category-level
kinematic constraints
6D pose estimation
discrete state-space
Innovation

Methods, ideas, or system contributions that make the work stand out.

discrete diffusion
articulated object pose estimation
kinematic constraints
conditional generative modeling
hierarchical pose estimation
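As an illustration of why estimating parts hierarchically can respect kinematic structure, here is a minimal sketch that assumes a single revolute joint and 4x4 homogeneous transforms. The function names and the parameterization (joint axis, pivot point, angle) are illustrative assumptions; the page does not describe the paper's actual coupling strategy.

```python
import numpy as np

def rotation_about_axis(axis, angle):
    """Rodrigues' formula: 3x3 rotation by `angle` radians about `axis`."""
    axis = np.asarray(axis, dtype=float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0.0, -axis[2], axis[1]],
                  [axis[2], 0.0, -axis[0]],
                  [-axis[1], axis[0], 0.0]])
    return np.eye(3) + np.sin(angle) * K + (1.0 - np.cos(angle)) * (K @ K)

def revolute_joint(axis, pivot, angle):
    """4x4 transform rotating about the line through `pivot` along `axis`."""
    R = rotation_about_axis(axis, angle)
    pivot = np.asarray(pivot, dtype=float)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = pivot - R @ pivot  # points on the joint axis stay fixed
    return T

def child_pose(parent_pose, axis, pivot, angle):
    """Child part pose derived from the parent pose and one joint angle."""
    return parent_pose @ revolute_joint(axis, pivot, angle)

# Illustrative use: a lid hinged at (1, 0, 0) on a translated base part.
base = np.eye(4)
base[:3, 3] = [0.2, 0.0, 0.5]
lid = child_pose(base, axis=[0, 0, 1], pivot=[1, 0, 0], angle=np.pi / 2)
```

Because the child pose is composed from the parent pose, the hinge point coincides for both parts by construction, and only the joint angle remains to be estimated for the child.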
👥 Authors
Li Zhang
University of Science and Technology of China
Mingyu Mei
Zhejiang University
Ailing Wang
East China Normal University
Xianhui Meng
University of Science and Technology of China
Yan Zhong
Peking University
Machine Learning, Deep Learning, Computer Vision, Data Mining, Large Language Models
Xinyuan Song
Emory University
Statistics, machine learning
Liu Liu
Hefei University of Technology
deep learning, computer vision, robotics
Rujing Wang
University of Science and Technology of China
Zaixing He
Zhejiang University
Cewu Lu
Shanghai Jiao Tong University