IMPACT-CYCLE: A Contract-Based Multi-Agent System for Claim-Level Supervisory Correction of Long-Video Semantic Memory

📅 2026-04-21
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
Current long-video understanding systems expose no interpretable intermediate state, so correcting an error means re-examining the original video in full. This work proposes a contract-based multi-agent supervision framework that treats comprehension as iterative maintenance of a versioned semantic memory: role-specific agents, governed by authority contracts, verify local object-relation correctness, cross-temporal consistency, and global semantic coherence. Corrections are structured by a claim dependency graph, so a revision touches only the claims that depend on the corrected one and correction cost stays proportional to error scope. When automated evidence is insufficient, the system escalates to human arbitration, which holds final override rights. On the VidOR dataset, VQA performance improves from 0.71 to 0.79 and human arbitration cost drops by 4.8×, with workload significantly lower than manual annotation.

πŸ“ Abstract
Correcting errors in long-video understanding is disproportionately costly: existing multimodal pipelines produce opaque, end-to-end outputs that expose no intermediate state for inspection, forcing annotators to revisit raw video and reconstruct temporal logic from scratch. The core bottleneck is not generation quality alone, but the absence of a supervisory interface through which human effort can be proportional to the scope of each error. We present IMPACT-CYCLE, a supervisory multi-agent system that reformulates long-video understanding as iterative claim-level maintenance of a shared semantic memory -- a structured, versioned state encoding typed claims, a claim dependency graph, and a provenance log. Role-specialized agents operating under explicit authority contracts decompose verification into local object-relation correctness, cross-temporal consistency, and global semantic coherence, with corrections confined to structurally dependent claims. When automated evidence is insufficient, the system escalates to human arbitration as the supervisory authority with final override rights; dependency-closure re-verification then ensures correction cost remains proportional to error scope. Experiments on VidOR show substantially improved downstream reasoning (VQA: 0.71 to 0.79) and a 4.8x reduction in human arbitration cost, with workload significantly lower than manual annotation. Code will be released at https://github.com/MKong17/IMPACT_CYCLE.
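The idea of dependency-closure re-verification can be illustrated with a minimal sketch. This is not the authors' implementation: the class names (`Claim`, `SemanticMemory`), the claim types, the field layout, and the example claims are all hypothetical, chosen only to show how a correction to one claim can be confined to the claims that structurally depend on it.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Claim:
    claim_id: str
    kind: str      # hypothetical claim type, e.g. "relation", "event", "summary"
    text: str
    version: int = 1


@dataclass
class SemanticMemory:
    """Illustrative versioned memory: claims, dependency edges, provenance log."""
    claims: dict = field(default_factory=dict)
    dependents: dict = field(default_factory=dict)  # claim_id -> ids that depend on it
    provenance: list = field(default_factory=list)

    def add_claim(self, claim, depends_on=()):
        self.claims[claim.claim_id] = claim
        self.dependents.setdefault(claim.claim_id, set())
        for parent in depends_on:
            self.dependents.setdefault(parent, set()).add(claim.claim_id)
        self.provenance.append(("add", claim.claim_id, claim.version))

    def dependency_closure(self, claim_id):
        """Return claim_id plus every claim reachable through dependency edges."""
        seen = {claim_id}
        queue = deque([claim_id])
        while queue:
            current = queue.popleft()
            for child in self.dependents.get(current, ()):
                if child not in seen:
                    seen.add(child)
                    queue.append(child)
        return seen

    def correct(self, claim_id, new_text, author):
        """Apply a correction and return the claims that need re-verification."""
        claim = self.claims[claim_id]
        claim.text = new_text
        claim.version += 1
        self.provenance.append(("correct", claim_id, claim.version, author))
        return sorted(self.dependency_closure(claim_id) - {claim_id})


memory = SemanticMemory()
memory.add_claim(Claim("c1", "relation", "person holds cup"))
memory.add_claim(Claim("c2", "event", "person drinks"), depends_on=["c1"])
memory.add_claim(Claim("c3", "summary", "person has breakfast"), depends_on=["c2"])
memory.add_claim(Claim("c4", "relation", "dog sits on sofa"))

# A human override on c1 triggers re-verification of c2 and c3 only;
# the unrelated claim c4 is left untouched.
to_reverify = memory.correct("c1", "person holds phone", author="human")
print(to_reverify)
```

In this sketch the re-verification set grows with the number of dependent claims rather than with the size of the memory, which is the sense in which correction cost tracks error scope.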
Problem

Research questions and friction points this paper is trying to address.

long-video understanding
error correction
supervisory interface
semantic memory
human arbitration
Innovation

Methods, ideas, or system contributions that make the work stand out.

claim-level supervision
multi-agent system
semantic memory
dependency graph
human-in-the-loop arbitration