🤖 AI Summary
This work addresses the insufficient joint modeling of action steps and scene state changes in procedural activity understanding. We propose a process-aware video representation learning framework that, for the first time, leverages explicit state-change descriptions generated by large language models (LLMs) as supervisory signals—and constructs their counterfactual variants—to jointly model bidirectional causal relationships between actions and states, thereby enhancing “if–then” reasoning. Our key contributions are: (1) the first use of LLM-generated state descriptions and their counterfactual counterparts for video representation learning; and (2) unified causal modeling of both normative procedures and anomalous/erroneous steps. Extensive experiments demonstrate significant improvements over state-of-the-art methods on temporal action segmentation and procedural error detection, validating the effectiveness of explicit state supervision and counterfactual reasoning for procedural understanding.
📝 Abstract
Understanding a procedural activity requires modeling both how action steps transform the scene and how the evolving scene transformations can in turn influence the sequence of action steps, including steps that are accidental or erroneous. Yet existing work on procedure-aware video representations fails to explicitly learn these state changes (scene transformations). In this work, we study procedure-aware video representation learning by incorporating state-change descriptions generated by large language models (LLMs) as supervision signals for video encoders. Moreover, we generate state-change counterfactuals that simulate hypothesized failure outcomes, allowing the model to learn by imagining unseen "what if" scenarios. This counterfactual reasoning strengthens the model's ability to understand the cause and effect of each step in an activity. To verify the procedure awareness of our model, we conduct extensive experiments on procedure-aware tasks, including temporal action segmentation and error detection. Our results demonstrate the effectiveness of the proposed state-change descriptions and their counterfactuals, yielding significant improvements on multiple tasks.
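To make the supervision idea concrete, here is a minimal sketch of how LLM-generated state-change descriptions and their counterfactuals could serve as contrastive targets for a video encoder. This is an illustrative assumption, not the paper's exact objective: it implements a standard InfoNCE-style loss in NumPy, where the clip embedding is pulled toward the embedding of its true state-change description and pushed away from embeddings of counterfactual ("what if the step failed") descriptions. All function and variable names here are hypothetical.

```python
import numpy as np

def state_change_contrastive_loss(video_emb, pos_text_emb, cf_text_embs, tau=0.07):
    """InfoNCE-style loss (illustrative sketch, not the paper's exact objective).

    video_emb:    (D,)  video clip embedding from the video encoder
    pos_text_emb: (D,)  embedding of the LLM-generated state-change description
    cf_text_embs: (K, D) embeddings of K counterfactual (failure) descriptions
    """
    def l2norm(x):
        return x / np.linalg.norm(x, axis=-1, keepdims=True)

    v = l2norm(video_emb)
    p = l2norm(pos_text_emb)
    n = l2norm(cf_text_embs)

    # Similarity logits: the true description first, then the K counterfactuals.
    logits = np.concatenate([[v @ p], n @ v]) / tau
    logits -= logits.max()  # numerical stability before softmax
    probs = np.exp(logits) / np.exp(logits).sum()

    # Cross-entropy with the true state-change description as the target class.
    return -np.log(probs[0])
```

Minimizing this loss over clips encourages the video representation to encode *which* state change actually occurred, while the counterfactual negatives force it to discriminate the observed outcome from plausible failure outcomes of the same step.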