StreamReady: Learning What to Answer and When in Long Streaming Videos

📅 2026-03-09
🤖 AI Summary
This work addresses long-form streaming video understanding, where timely responses must be balanced against accuracy: answering too early risks errors, while answering too late undermines real-time usefulness. The authors propose a readiness-aware video understanding framework that decides when to respond through a lightweight evidence accumulation mechanism, jointly optimizing answer correctness and response latency. Key contributions include the Answer Readiness Score (ARS), a timing-aware metric with asymmetric early and late penalties that, combined with correctness, forms a new evaluation protocol; ProReady-QA, the first multi-turn question-answering benchmark annotated with evidence windows; and the integration of readiness-aware modeling with multi-granularity temporal reasoning. Experiments report state-of-the-art performance on ProReady-QA and on eight additional streaming and offline long-video benchmarks, with high accuracy, strong timeliness, and robust generalization.

📝 Abstract
Streaming video understanding often involves time-sensitive scenarios where models need to answer exactly when the supporting visual evidence appears: answering before the evidence appears reflects speculation, while answering after it has passed reduces real-time utility. To capture this behavior, we introduce a readiness-aware formulation of streaming video understanding with the Answer Readiness Score (ARS), a timing-aware objective with asymmetric early and late penalties. When combined with correctness, ARS defines an effective accuracy that measures not just whether a model is right, but whether it answers at the appropriate moment. Building on this formulation, we introduce StreamReady, a framework that unifies temporal reasoning with on-time answering through a lightweight readiness mechanism that decides whether sufficient evidence has been observed before responding. To evaluate this capability, we further introduce ProReady-QA, a benchmark with annotated answer evidence windows and proactive multi-turn questions across local and global contexts. StreamReady achieves superior performance on ProReady-QA and consistently outperforms prior methods across eight additional streaming and offline long-video benchmarks, demonstrating robust and broadly generalizable video understanding capability.
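
To make the idea of a timing-aware score concrete, here is a minimal Python sketch of how a readiness score and an effective accuracy could be computed. It is an illustration only: the abstract does not give the ARS formula, so the linear decay outside the evidence window, the penalty rates, the choice to penalize early answers more steeply than late ones, and the names `Response`, `readiness_score`, and `effective_accuracy` are all assumptions made here, not the authors' definitions.

```python
from dataclasses import dataclass


@dataclass
class Response:
    answer_time: float   # seconds into the stream when the model answered
    window_start: float  # start of the annotated evidence window
    window_end: float    # end of the annotated evidence window
    correct: bool        # whether the answer matches the ground truth


def readiness_score(r: Response, early_rate: float = 0.2, late_rate: float = 0.05) -> float:
    """Hypothetical readiness score in [0, 1].

    Assumption: full credit inside the evidence window, linear decay outside it,
    with separate (asymmetric) rates for early and late answers.
    """
    if r.answer_time < r.window_start:
        gap = r.window_start - r.answer_time
        return max(0.0, 1.0 - early_rate * gap)
    if r.answer_time > r.window_end:
        gap = r.answer_time - r.window_end
        return max(0.0, 1.0 - late_rate * gap)
    return 1.0


def effective_accuracy(responses: list[Response]) -> float:
    """Assumed combination: a correct answer earns its readiness score,
    an incorrect answer earns zero; the result is the mean over all responses."""
    if not responses:
        return 0.0
    total = sum(readiness_score(r) if r.correct else 0.0 for r in responses)
    return total / len(responses)


if __name__ == "__main__":
    demo = [
        Response(answer_time=12.0, window_start=10.0, window_end=15.0, correct=True),   # on time
        Response(answer_time=8.0, window_start=10.0, window_end=15.0, correct=True),    # early
        Response(answer_time=17.0, window_start=10.0, window_end=15.0, correct=False),  # late and wrong
    ]
    print(effective_accuracy(demo))
```

Under these assumptions, a correct answer given before the evidence appears is discounted rather than fully rewarded, which is the behavior plain accuracy cannot express.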
Problem

Research questions and friction points this paper is trying to address.

streaming video understanding
answer timing
real-time response
temporal reasoning
video QA
Innovation

Methods, ideas, or system contributions that make the work stand out.

Answer Readiness Score
streaming video understanding
temporal reasoning
on-time answering
readiness-aware formulation