Logo-VGR: Visual Grounded Reasoning for Open-world Logo Recognition

📅 2025-09-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing approaches to open-world logo recognition rely heavily on memorizing brand distributions, which limits generalization to unseen brands. Method: Logo-VGR is a vision-grounded reasoning framework that avoids storing large-scale brand representations. It reformulates logo recognition as an image-logo contrastive matching task and introduces two components: Logo Perception Grounding, which injects domain knowledge, and Logo-Guided Visual Grounded Reasoning, which strengthens multimodal reasoning. Contribution/Results: Experiments show that Logo-VGR recognizes thousands of unseen logos with supervision from only a small subset of brands, surpassing strong baselines by roughly 10 points in out-of-distribution (OOD) evaluation while mitigating memorization-driven overfitting.

📝 Abstract
Recent advances in multimodal large language models (MLLMs) have been primarily evaluated on general-purpose benchmarks, while their applications in domain-specific scenarios, such as intelligent product moderation, remain underexplored. To address this gap, we introduce an open-world logo recognition benchmark, a core challenge in product moderation. Unlike traditional logo recognition methods that rely on memorizing representations of tens of thousands of brands, an impractical approach in real-world settings, our proposed method, Logo-VGR, enables generalization to large-scale brand recognition with supervision from only a small subset of brands. Specifically, we reformulate logo recognition as a comparison-based task, requiring the model to match product images with candidate logos rather than directly generating brand labels. We further observe that existing models tend to overfit by memorizing brand distributions instead of learning robust multimodal reasoning, which results in poor performance on unseen brands. To overcome this limitation, Logo-VGR introduces a new paradigm of domain-specific multimodal reasoning: Logo Perception Grounding injects domain knowledge, and Logo-Guided Visual Grounded Reasoning enhances the model's reasoning capability. Experimental results show that Logo-VGR outperforms strong baselines by nearly 10 points in OOD settings, demonstrating superior generalization.
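The comparison-based formulation can be illustrated with a minimal sketch: given an embedding of a product image and embeddings of a set of candidate logos, recognition reduces to selecting the candidate with the highest cosine similarity, rather than generating a brand label. The function name and the toy embeddings below are illustrative assumptions, not the paper's actual implementation or encoder.

```python
import numpy as np

def match_logo(image_emb, candidate_embs):
    """Pick the best-matching candidate logo by cosine similarity.

    Returns the index of the top candidate and the similarity scores.
    """
    img = image_emb / np.linalg.norm(image_emb)
    cands = candidate_embs / np.linalg.norm(candidate_embs, axis=1, keepdims=True)
    sims = cands @ img  # cosine similarity of each candidate to the image
    return int(np.argmax(sims)), sims

# Toy 3-d embeddings: the second candidate is closest to the image.
image = np.array([0.1, 0.9, 0.2])
candidates = np.array([
    [0.9, 0.1, 0.0],  # unrelated brand
    [0.1, 0.8, 0.3],  # matching logo
    [0.0, 0.2, 0.9],  # another brand
])
best, scores = match_logo(image, candidates)  # best == 1
```

Because candidates are supplied at inference time, the same matcher applies to brands never seen during training, which is the property the comparison-based reformulation is after.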
Problem

Research questions and friction points this paper is trying to address.

Generalizes logo recognition to unseen brands
Reformulates logo matching as comparison task
Overcomes overfitting through visual reasoning
Innovation

Methods, ideas, or system contributions that make the work stand out.

Reformulates logo recognition as comparison-based task
Introduces domain-specific multimodal reasoning paradigm
Enables generalization with small subset supervision