ASIA: Adaptive 3D Segmentation using Few Image Annotations

📅 2025-09-29
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenging problem of segmenting non-semantic, textually ambiguous parts in 3D models. We propose an adaptive 3D segmentation framework requiring only a few single-view, in-the-wild image annotations—without multi-view inputs, 3D ground truth, or precise textual descriptions. Methodologically, we leverage visual priors from text-to-image diffusion models (e.g., Stable Diffusion) to optimize learnable text tokens; jointly enforce cross-view part correspondence and noise-robust optimization; and achieve accurate label transfer from 2D image space to 3D mesh via UV-parameterized voting fusion. To our knowledge, this is the first method enabling high-fidelity 3D part segmentation from minimal single-image supervision. Our approach achieves state-of-the-art performance on both semantic and non-semantic part segmentation tasks, as demonstrated by comprehensive quantitative and qualitative evaluations.

📝 Abstract
We introduce ASIA (Adaptive 3D Segmentation using few Image Annotations), a novel framework that enables segmentation of possibly non-semantic and non-text-describable "parts" in 3D. Our segmentation is controllable through a few user-annotated in-the-wild images, which are easier to collect than multi-view images, less demanding to annotate than 3D models, and more precise than potentially ambiguous text descriptions. Our method leverages the rich priors of text-to-image diffusion models, such as Stable Diffusion (SD), to transfer segmentations from image space to 3D, even when the annotated and target objects differ significantly in geometry or structure. During training, we optimize a text token for each segment and fine-tune our model with a novel cross-view part correspondence loss. At inference, we segment multi-view renderings of the 3D mesh, fuse the labels in UV-space via voting, refine them with our novel Noise Optimization technique, and finally map the UV-labels back onto the mesh. ASIA provides a practical and generalizable solution for both semantic and non-semantic 3D segmentation tasks, outperforming existing methods by a noticeable margin in both quantitative and qualitative evaluations.
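The abstract's final inference step, mapping fused UV-space labels back onto the mesh, can be illustrated with a small sketch. This is not the authors' implementation; it is a minimal illustration assuming each face carries per-corner UV coordinates and that sampling the UV label map at the face's UV centroid is sufficient. The function name and array layouts are hypothetical.

```python
import numpy as np

def transfer_uv_labels_to_faces(face_uvs, uv_label_map):
    """Assign each mesh face the label stored at its UV centroid.

    face_uvs:     (F, 3, 2) per-corner UV coordinates in [0, 1)
    uv_label_map: (R, R) integer label map fused in UV space
    Returns a (F,) array of per-face labels.
    """
    res = uv_label_map.shape[0]
    # Hypothetical simplification: one label per face, looked up at
    # the face's UV centroid rather than rasterizing the full triangle.
    centroids = face_uvs.mean(axis=1)                       # (F, 2)
    ij = np.clip((centroids * res).astype(int), 0, res - 1)
    return uv_label_map[ij[:, 0], ij[:, 1]]
```

A full implementation would rasterize each triangle in UV space rather than sample a single point, but the centroid lookup conveys the idea of the UV-to-mesh label transfer.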
Problem

Research questions and friction points this paper is trying to address.

Segmenting non-semantic parts in 3D with few image annotations
Transferring segmentations from images to 3D despite geometric differences
Providing a generalizable solution for semantic and non-semantic 3D segmentation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages text-to-image diffusion models for 3D segmentation
Optimizes text tokens and fine-tunes with cross-view loss
Refines segmentation via Noise Optimization and UV-space fusion
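The UV-space fusion step above can be sketched as a simple majority vote: each rendered view contributes per-pixel labels at known UV coordinates, and each UV texel takes the most-voted label. This is a toy sketch under assumed data layouts (per-pixel UV coordinates and label arrays per view), not the paper's code; the function name and resolution are hypothetical.

```python
import numpy as np

def fuse_labels_uv(uv_coords, labels, num_labels, uv_res=64):
    """Fuse per-view pixel labels into a UV-space label map by voting.

    uv_coords: list of (N_i, 2) arrays of per-pixel UV coords in [0, 1)
    labels:    list of (N_i,) integer label arrays, one per rendered view
    Returns a (uv_res, uv_res) label map, with -1 at unobserved texels.
    """
    votes = np.zeros((uv_res, uv_res, num_labels), dtype=np.int64)
    for uv, lab in zip(uv_coords, labels):
        # Quantize continuous UV coordinates to texel indices and
        # accumulate one vote per pixel for its predicted label.
        ij = np.clip((uv * uv_res).astype(int), 0, uv_res - 1)
        np.add.at(votes, (ij[:, 0], ij[:, 1], lab), 1)
    label_map = votes.argmax(axis=-1)
    label_map[votes.sum(axis=-1) == 0] = -1  # no view saw this texel
    return label_map
```

Voting in UV space rather than per view makes the fused labels consistent across renderings: disagreements between views are resolved once, at the texel level, before the labels are mapped back onto the mesh.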
Sai Raj Kishore Perla
Simon Fraser University, Canada

Aditya Vora
Simon Fraser University, Canada

Sauradip Nag
CVSSP, University of Surrey

Ali Mahdavi-Amiri
Simon Fraser University, Canada

Hao Zhang
Simon Fraser University, Canada

Topics: Computer Vision · Computer Graphics · Deep Learning