ReCCur: A Recursive Corner-Case Curation Framework for Robust Vision-Language Understanding in Open and Edge Scenarios

📅 2026-01-06

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses failures of vision-language understanding in open and edge scenarios that are caused by rare or extreme corner cases. The authors propose a low-compute, recursive multi-agent framework that combines a vision-language model (VLM) with complementary image encoders (e.g., CLIP, DINOv2, and BEiT). Through tri-modal consistency filtering, region-to-semantic chain verification, mixture-of-experts knowledge distillation, and region-wise adversarial evidence labeling, the framework automatically purifies noisy web data and produces fine-grained, auditable labels. With minimal human supervision, it steadily improves data purity and class separability while running on consumer-grade GPUs, yielding a high-precision, interpretable substrate for downstream training and evaluation.
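Tri-modal consistency filtering can be illustrated with a minimal sketch. It assumes each crawled candidate already carries three embeddings in a shared space (for example, a CLIP-style image embedding for the picture and text embeddings for its description and keyword). The function names, the "minimum over all three pairs" rule, and the `threshold`/`margin` values are hypothetical choices, not the paper's. Borderline scores are routed to a spot-check list, loosely mirroring the light human spot checks described in the abstract.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def min_pairwise_similarity(image: Vector, description: Vector, keyword: Vector) -> float:
    """Weakest link among the image-description, image-keyword, and description-keyword pairs."""
    return min(
        cosine(image, description),
        cosine(image, keyword),
        cosine(description, keyword),
    )


def triage(
    candidates: List[dict],
    threshold: float = 0.25,  # hypothetical consistency threshold
    margin: float = 0.05,     # hypothetical band routed to human spot checks
) -> Tuple[list, list, list]:
    """Split candidates into (accepted, spot_check, rejected).

    Hypothetical rule: a candidate is consistent only if all three pairwise
    similarities clear `threshold`; scores within `margin` of it go to a human.
    """
    accepted, spot_check, rejected = [], [], []
    for cand in candidates:
        score = min_pairwise_similarity(cand["image"], cand["description"], cand["keyword"])
        if score >= threshold + margin:
            accepted.append(cand)
        elif score >= threshold - margin:
            spot_check.append(cand)
        else:
            rejected.append(cand)
    return accepted, spot_check, rejected
```

Taking the minimum over the three pairs means a candidate fails if any single modality disagrees, e.g. a flooded-car photo whose keyword matches the image but whose description does not.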

📝 Abstract

Corner cases are rare or extreme scenarios that drive real-world failures, but they are difficult to curate at scale: web data are noisy, labels are brittle, and edge deployments preclude large retraining. We present ReCCur (Recursive Corner-Case Curation), a low-compute framework that converts noisy web imagery into auditable fine-grained labels via a multi-agent recursive pipeline. First, large-scale data acquisition and filtering expands a domain vocabulary with a vision-language model (VLM), crawls the web, and enforces tri-modal (image, description, keyword) consistency with light human spot checks to yield refined candidates. Next, mixture-of-experts knowledge distillation uses complementary encoders (e.g., CLIP, DINOv2, BEiT) for kNN voting with dual-confidence activation and uncertainty sampling, converging to a high-precision set. Finally, region-evidence VLM adversarial labeling pairs a proposer (multi-granularity regions and semantic cues) with a validator (global and local chained consistency) to produce explainable labels and close the loop. On realistic corner-case scenarios (e.g., flooded-car inspection), ReCCur runs on consumer-grade GPUs, steadily improves purity and separability, and requires minimal human supervision, providing a practical substrate for downstream training and evaluation under resource constraints. Code and dataset will be released.
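The mixture-of-experts kNN voting stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Each expert (standing in for an encoder such as CLIP, DINOv2, or BEiT) is represented only by precomputed embeddings and a labeled seed bank. The abstract does not define "dual-confidence activation", so it is interpreted here, hypothetically, as two gates: cross-expert agreement (`tau_agree`) and the agreeing experts' mean kNN vote share (`tau_vote`). Uncertainty sampling is approximated by sending the least confident remaining candidates, up to a hypothetical `budget`, to human review. The names `knn_vote`, `moe_vote`, and `curate` and all default values are illustrative.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Bank = List[Tuple[Vector, str]]  # (embedding, label) pairs for one expert


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def knn_vote(query: Vector, bank: Bank, k: int) -> Tuple[str, float]:
    """Majority label among the k most similar labeled examples, and its vote share."""
    nearest = sorted(bank, key=lambda item: cosine(query, item[0]), reverse=True)[:k]
    label, votes = Counter(lbl for _, lbl in nearest).most_common(1)[0]
    return label, votes / len(nearest)


def moe_vote(candidate: Dict[str, Vector], banks: Dict[str, Bank], k: int) -> Tuple[str, float, float]:
    """Combine per-expert kNN votes into (label, expert agreement, mean vote share of agreeing experts)."""
    per_expert = {name: knn_vote(candidate[name], bank, k) for name, bank in banks.items()}
    label, n_agree = Counter(lbl for lbl, _ in per_expert.values()).most_common(1)[0]
    agreement = n_agree / len(per_expert)
    vote_share = sum(s for lbl, s in per_expert.values() if lbl == label) / n_agree
    return label, agreement, vote_share


def curate(
    candidates: Dict[str, Dict[str, Vector]],
    banks: Dict[str, Bank],
    k: int = 3,
    tau_agree: float = 1.0,  # hypothetical gate 1: fraction of experts that must agree
    tau_vote: float = 0.6,   # hypothetical gate 2: minimum mean kNN vote share
    budget: int = 2,         # hypothetical human-review budget per round
) -> Tuple[List[Tuple[str, str]], List[Tuple[str, str]]]:
    """Accept candidates passing both gates; queue the least confident of the rest for review."""
    accepted, uncertain = [], []
    for cid, cand in candidates.items():
        label, agreement, vote_share = moe_vote(cand, banks, k)
        if agreement >= tau_agree and vote_share >= tau_vote:
            accepted.append((cid, label))
        else:
            uncertain.append((agreement * vote_share, cid, label))
    uncertain.sort()  # lowest combined confidence first (uncertainty sampling)
    review = [(cid, label) for _, cid, label in uncertain[:budget]]
    return accepted, review
```

Requiring unanimous expert agreement (`tau_agree=1.0`) trades recall for precision, which fits the stated goal of converging to a high-precision set. In a recursive loop, one could append accepted and human-reviewed items to each expert's bank and rerun the pass; that, too, is an assumption about how the loop closes.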

Problem

Research questions and friction points this paper is trying to address.

corner cases

vision-language understanding

data curation

edge scenarios

open-world

Innovation

Methods, ideas, or system contributions that make the work stand out.

corner-case curation

vision-language model

multi-agent recursive pipeline

mixture-of-experts distillation

adversarial labeling