ConsistencyDet: A Few-step Denoising Framework for Object Detection Using the Consistency Model

📅 2024-04-11
🤖 AI Summary
To address the difficulty of modeling noisy bounding boxes and the high computational cost of conventional diffusion models, which typically require dozens of denoising steps, this paper integrates Consistency Models (CMs) into object detection and proposes a “few-step denoising” paradigm. Detection is formulated as a rapid denoising process applied to noisy bounding boxes, where conditional consistency learning enables high-accuracy box recovery from random initializations in only 2–4 steps. The method comprises: (i) diffusion modeling in bounding-box space, (ii) controllable noise injection, (iii) conditional denoising training, and (iv) a self-consistent iterative refinement mechanism. Evaluated on MS-COCO and LVIS, the approach is reported to outperform state-of-the-art detectors in both accuracy and efficiency, with 3–5× faster inference while preserving or improving detection performance. The code is publicly available.
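The noise-injection step in (ii) can be illustrated with a minimal sketch. It assumes boxes in normalized (cx, cy, w, h) form and the additive perturbation x_t = x_0 + t·z (with z standard Gaussian) used in the original Consistency Models formulation. The function name `perturb_boxes`, the box values, and the noise level are hypothetical and are not taken from the paper's code.

```python
import random

def perturb_boxes(gt_boxes, t, rng):
    """Add Gaussian noise with standard deviation t to every box coordinate.

    gt_boxes: list of [cx, cy, w, h] lists (assumed normalized to [0, 1]).
    t: noise level; t = 0 leaves the boxes unchanged.
    rng: a random.Random instance, passed in for reproducibility.
    """
    return [[coord + t * rng.gauss(0.0, 1.0) for coord in box] for box in gt_boxes]

# Illustrative ground-truth boxes (made-up values).
gt_boxes = [[0.50, 0.50, 0.20, 0.30], [0.25, 0.70, 0.10, 0.15]]
noisy_boxes = perturb_boxes(gt_boxes, t=0.5, rng=random.Random(0))
```

During training, a denoiser conditioned on image features would be asked to recover `gt_boxes` from `noisy_boxes` at the given noise level; that network is omitted here.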

📝 Abstract
Object detection, a quintessential task in the realm of perceptual computing, can be tackled using a generative methodology. In the present study, we introduce a novel framework designed to articulate object detection as a denoising diffusion process, which operates on the perturbed bounding boxes of annotated entities. This framework, termed ConsistencyDet, leverages an innovative denoising concept known as the Consistency Model. The hallmark of this model is its self-consistency feature, which empowers the model to map distorted information from any time step back to its pristine state, thereby realizing a "few-step denoising" mechanism. Such an attribute markedly elevates the operational efficiency of the model, setting it apart from the conventional Diffusion Model. Throughout the training phase, ConsistencyDet initiates the diffusion sequence with noise-infused boxes derived from the ground-truth annotations and conditions the model to perform the denoising task. Subsequently, in the inference stage, the model employs a denoising sampling strategy that commences with bounding boxes randomly sampled from a normal distribution. Through iterative refinement, the model transforms an assortment of arbitrarily generated boxes into definitive detections. Comprehensive evaluations employing standard benchmarks, such as MS-COCO and LVIS, corroborate that ConsistencyDet surpasses other leading-edge detectors in performance metrics. Our code is available at https://anonymous.4open.science/r/ConsistencyDet-37D5.
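The inference procedure described above (start from Gaussian boxes, then refine in a few steps) can be sketched with the multistep sampling loop from the original Consistency Models formulation. This is a sketch under stated assumptions, not the authors' implementation: the constants `EPS`, `T_MAX`, and `SIGMA_DATA`, the noise schedule, and the toy denoiser (which simply blends toward a known target box in place of a trained, image-conditioned network) are all hypothetical.

```python
import math
import random

EPS = 0.002       # smallest noise level (assumed value)
T_MAX = 80.0      # largest noise level (assumed value)
SIGMA_DATA = 0.5  # assumed data scale

def c_skip(t):
    # Weight on the input; equals 1 at t = EPS, which enforces f(x, EPS) = x.
    return SIGMA_DATA ** 2 / ((t - EPS) ** 2 + SIGMA_DATA ** 2)

def make_toy_denoiser(target):
    # Hypothetical stand-in for a trained detector: maps a noisy box at any
    # noise level toward a fixed, known target box.
    def denoise(box, t):
        w = c_skip(t)
        return [w * b + (1.0 - w) * g for b, g in zip(box, target)]
    return denoise

def few_step_sample(denoise, noise_levels, rng, dim=4):
    """Multistep consistency sampling over one box.

    noise_levels: decreasing noise levels, the first one being the largest.
    """
    t0 = noise_levels[0]
    # Start from pure Gaussian noise at the largest noise level.
    box = [t0 * rng.gauss(0.0, 1.0) for _ in range(dim)]
    box = denoise(box, t0)
    for t in noise_levels[1:]:
        # Re-noise the current estimate to level t, then denoise again.
        scale = math.sqrt(t * t - EPS * EPS)
        noisy = [b + scale * rng.gauss(0.0, 1.0) for b in box]
        box = denoise(noisy, t)
    return box

target = [0.5, 0.5, 0.2, 0.3]  # made-up (cx, cy, w, h) box
denoise = make_toy_denoiser(target)
result = few_step_sample(denoise, [T_MAX, 20.0, 5.0], random.Random(0))
```

Because the toy denoiser already knows the target, the extra steps here only illustrate the control flow. With a learned model, each additional step trades inference time for a further refinement of the boxes.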
Problem

Research questions and friction points this paper is trying to address.

Object detection as denoising diffusion process
Few-step denoising with self-consistency feature
Improving efficiency over conventional Diffusion Model
Innovation

Methods, ideas, or system contributions that make the work stand out.

Denoising diffusion process for object detection
Consistency Model enables few-step denoising
Transforms noisy boxes into precise detections
Lifan Jiang
Zhejiang University
AI generation
Zhihui Wang
College of Computer Science and Engineering, Shandong University of Science and Technology, Qingdao 266510, China
Changmiao Wang
Shenzhen Research Institute of Big Data, Shenzhen, 518172, China
Ming Li
Zhejiang Key Laboratory of Intelligent Education Technology and Application, Zhejiang Normal University, Jinhua 321004, China
Jiaxu Leng
Chongqing University of Posts and Telecommunications
Computer Vision
Xindong Wu
Key Laboratory of Knowledge Engineering with Big Data (the Ministry of Education of China), and the School of Computer Science and Information Technology, Hefei University of Technology, Hefei 230009, China