🤖 AI Summary
This work addresses the data bottleneck in user perspective modeling, which stems from privacy sensitivity and scarce annotations, making it difficult to infer internal states, such as goals or emotions, from digital footprints. The problem is formalized as a structured inverse reasoning task, and a novel paradigm termed Situation Graph Prediction (SGP) is proposed to reconstruct users' latent perspectives by integrating multimodal observable traces with ontology-aligned structured representations. To enable training without real-world labels, a structure-first synthetic data generation strategy is devised, complemented by retrieval-augmented in-context learning. Experiments with GPT-4o demonstrate that inferring latent states is significantly more challenging than extracting surface-level information, validating both the difficulty of the SGP task and the efficacy of the proposed approach.
📝 Abstract
Perspective-Aware AI requires modeling evolving internal states (goals, emotions, contexts), not merely preferences. Progress is limited by a data bottleneck: digital footprints are privacy-sensitive, and perspective states are rarely labeled. We propose Situation Graph Prediction (SGP), a task that frames perspective modeling as an inverse inference problem: reconstructing structured, ontology-aligned representations of perspective from observable multimodal artifacts. To enable grounding without real labels, we use a structure-first synthetic generation strategy that aligns latent labels and observable traces by design. As a pilot, we construct a dataset and run a diagnostic study using retrieval-augmented in-context learning as a proxy for supervision. In this study with GPT-4o, we observe a clear gap between surface-level extraction and latent perspective inference, indicating that inferring latent states is harder under our controlled setting. These results suggest SGP is non-trivial and provide evidence for the structure-first data synthesis strategy.