LangHOPS: Language Grounded Hierarchical Open-Vocabulary Part Segmentation

📅 2025-10-29

🤖 AI Summary

This work addresses open-vocabulary object and part instance segmentation: jointly detecting and segmenting objects and their hierarchically related parts from an open vocabulary. Methodologically, it presents the first framework built on multimodal large language models (MLLMs) for this task. The framework grounds object-part hierarchies in language space to link concepts across granularities, and it supports zero-shot generalization through vision-language alignment, hierarchical query generation, and autoregressive decoding. Key innovations are the language-grounded object-part hierarchy and an MLLM-driven part query refinement strategy. Experiments show gains of +5.5% and +4.8% AP on PartImageNet for in-domain and cross-dataset evaluation, respectively, and +2.5% mIoU on unseen object parts in ADE20K (zero-shot), outperforming prior state-of-the-art methods.

📝 Abstract

We propose LangHOPS, the first Multimodal Large Language Model (MLLM)-based framework for open-vocabulary object-part instance segmentation. Given an image, LangHOPS can jointly detect and segment hierarchical object and part instances from open-vocabulary candidate categories. Unlike prior approaches that rely on heuristic or learnable visual grouping, our approach grounds object-part hierarchies in language space. It integrates the MLLM into the object-part parsing pipeline to leverage its rich knowledge and reasoning capabilities and to link multi-granularity concepts within the hierarchies. We evaluate LangHOPS across multiple challenging scenarios, including in-domain and cross-dataset object-part instance segmentation, and zero-shot semantic segmentation. LangHOPS achieves state-of-the-art results, surpassing previous methods by 5.5% Average Precision (AP) (in-domain) and 4.8% (cross-dataset) on the PartImageNet dataset and by 2.5% mIoU on unseen object parts in ADE20K (zero-shot). Ablation studies further validate the effectiveness of the language-grounded hierarchy and the MLLM-driven part query refinement strategy. The code will be released here.
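The zero-shot result is reported as mIoU on unseen object parts. As a reminder of what that metric measures, here is a minimal pure-Python sketch of the standard per-class intersection-over-union and its mean over a chosen subset of classes. It is the generic definition, not the paper's evaluation code; the function names, the flat label sequences, the `ignore_index=255` convention, and which class ids count as "unseen" are all assumptions for illustration.

```python
import math


def per_class_iou(pred, gt, num_classes, ignore_index=255):
    """Per-class IoU from flat label sequences; NaN for classes that never occur."""
    inter = [0] * num_classes       # pixels where prediction and ground truth agree
    pred_count = [0] * num_classes  # pixels predicted as each class
    gt_count = [0] * num_classes    # pixels labeled as each class
    for p, g in zip(pred, gt):
        if g == ignore_index:
            continue  # unlabeled pixels are excluded from evaluation
        pred_count[p] += 1
        gt_count[g] += 1
        if p == g:
            inter[g] += 1
    ious = []
    for c in range(num_classes):
        union = pred_count[c] + gt_count[c] - inter[c]
        ious.append(inter[c] / union if union else math.nan)
    return ious


def mean_iou(ious, class_ids=None):
    """Average IoU over the selected classes (e.g. a hypothetical unseen-part subset)."""
    ids = range(len(ious)) if class_ids is None else class_ids
    vals = [ious[c] for c in ids if not math.isnan(ious[c])]
    return sum(vals) / len(vals) if vals else math.nan


if __name__ == "__main__":
    ious = per_class_iou([0, 0, 1, 1, 2, 2], [0, 1, 1, 1, 2, 255], num_classes=3)
    print(mean_iou(ious))          # mIoU over all classes
    print(mean_iou(ious, [1, 2]))  # mIoU over an assumed "unseen" subset
```

Benchmarks usually accumulate the intersection and union counts over the whole dataset before dividing; with this sketch that corresponds to concatenating every image's pixels into one `pred`/`gt` pair rather than averaging per-image scores.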

Problem

Research questions and friction points this paper is trying to address.

Enables open-vocabulary hierarchical object and part segmentation

Grounds object-part hierarchies in language space using MLLMs

Improves segmentation accuracy across diverse and unseen categories

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an MLLM for open-vocabulary part segmentation

Grounds object-part hierarchies in language space

Leverages MLLM knowledge for part query refinement
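To make "grounding object-part hierarchies in language space" concrete, the toy sketch below treats open-vocabulary candidates as plain strings, recovers an object-to-parts mapping from the names alone, and composes part prompts for one detected object. It involves no MLLM and is not LangHOPS's method; the `"<object>'s <part>"` naming convention and the names `CANDIDATES`, `build_hierarchy`, and `part_prompts` are hypothetical, chosen only to contrast relating parts to objects through text with relating them through visual grouping.

```python
# Toy illustration only: not LangHOPS's method, and no MLLM is involved.
# Assumed (hypothetical) naming convention: parts are written "<object>'s <part>".
CANDIDATES = [
    "dog", "dog's head", "dog's leg", "dog's tail",
    "car", "car's wheel", "car's door",
]


def build_hierarchy(candidates: list[str]) -> dict[str, list[str]]:
    """Recover an object -> parts mapping purely from the category names."""
    hierarchy: dict[str, list[str]] = {}
    for name in candidates:
        if "'s " in name:
            obj, part = name.split("'s ", 1)
            hierarchy.setdefault(obj, []).append(part)
        else:
            hierarchy.setdefault(name, [])  # object with no listed parts yet
    return hierarchy


def part_prompts(detected_object: str, hierarchy: dict[str, list[str]]) -> list[str]:
    """Compose one text prompt per candidate part of a detected object instance."""
    return [f"{detected_object}'s {part}" for part in hierarchy.get(detected_object, [])]


if __name__ == "__main__":
    tree = build_hierarchy(CANDIDATES)
    print(tree)
    print(part_prompts("car", tree))
```

String parsing like this cannot handle synonyms, parts shared across object types, or categories that never appear in the candidate list; that gap is presumably where the abstract's appeal to the MLLM's knowledge and reasoning comes in.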