REC-RL: Referring expression counting via Gaussian and range-based reward optimization

📅 2026-05-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a limitation of existing Referring Expression Counting (REC) methods: their rule-based reinforcement learning rewards focus primarily on final accuracy and neglect the quality of intermediate reasoning. The authors propose REC-RL, a reinforcement learning framework that explicitly optimizes the visual reasoning process in REC. Under a think-range-answer paradigm, REC-RL treats intermediate focus prediction as internal decision-making, which aligns with human perception and requires no additional annotations. The framework combines Group Relative Policy Optimization with two lightweight rewards: an accuracy reward that pairs range-based interval supervision with Gaussian-based precision guidance, and a format reward that enforces structured outputs. Experiments show consistent improvements over strong baselines and robust generalization across benchmarks.

📝 Abstract

Referring expression counting (REC) is an intention-driven task that requires context-aware visual reasoning. While recent vision-language models incorporate language for visual understanding, most existing REC methods rely on rule-based reinforcement learning with rewards focused primarily on final accuracy, overlooking the quality of intermediate reasoning. We propose REC-RL, a reinforcement learning framework that introduces a think-range-answer paradigm to explicitly optimize the visual reasoning process. REC-RL employs Group Relative Policy Optimization and two lightweight rewards: an accuracy reward that combines range-based interval supervision with Gaussian-based precision guidance, and a format reward that enforces structured outputs. By modeling intermediate focus prediction as internal decision-making, REC-RL avoids additional annotations and better aligns with human perception. Extensive experiments demonstrate consistent improvements over strong baselines and robust generalization across benchmarks.
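
The abstract names the reward components but gives no formulas, so the following is only a minimal sketch of how such rewards might look, not the authors' implementation. Everything specific here is an assumption made for illustration: the `<think>`/`<range>`/`<answer>` tag layout, the function names, the equal 0.5/0.5 weighting of the two accuracy terms, and the `sigma` width. The last function shows the group-wise reward standardization commonly used in Group Relative Policy Optimization.

```python
import math
import re
from statistics import mean, pstdev

# Hypothetical output template for the think-range-answer paradigm.
# The source names the paradigm but does not specify the exact tags.
PATTERN = re.compile(
    r"^\s*<think>(.*?)</think>\s*"
    r"<range>\s*(\d+)\s*-\s*(\d+)\s*</range>\s*"
    r"<answer>\s*(\d+)\s*</answer>\s*$",
    re.DOTALL,
)


def parse(output):
    """Return (low, high, answer) if the output follows the template, else None."""
    match = PATTERN.match(output)
    if match is None:
        return None
    return int(match.group(2)), int(match.group(3)), int(match.group(4))


def format_reward(output):
    """1.0 when the output is well structured, 0.0 otherwise."""
    return 1.0 if parse(output) is not None else 0.0


def accuracy_reward(output, target, sigma=2.0):
    """Assumed combination of an interval term and a Gaussian precision term."""
    parsed = parse(output)
    if parsed is None:
        return 0.0
    low, high, answer = parsed
    # Range-based term: does the predicted interval contain the true count?
    # (A real reward would likely also discourage overly wide intervals.)
    range_term = 1.0 if low <= target <= high else 0.0
    # Gaussian term: decays smoothly as the answer moves away from the target.
    gauss_term = math.exp(-((answer - target) ** 2) / (2 * sigma ** 2))
    return 0.5 * range_term + 0.5 * gauss_term  # weights are an assumption


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize rewards within one group of sampled responses."""
    mu = mean(rewards)
    sd = pstdev(rewards)
    return [(r - mu) / (sd + eps) for r in rewards]


sample = "<think>count the red cars</think><range>3-6</range><answer>5</answer>"
total = format_reward(sample) + accuracy_reward(sample, target=5)
```

In this sketch an exact answer inside a correct interval earns the full accuracy reward, a near miss still earns partial credit through the Gaussian term, and a malformed response earns nothing from either reward.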

Problem

Research questions and friction points this paper is trying to address.

referring expression counting

visual reasoning

reinforcement learning

intermediate reasoning

reward optimization

Innovation

Methods, ideas, or system contributions that make the work stand out.

referring expression counting

reinforcement learning

visual reasoning