🤖 AI Summary
This work addresses embodied agents' inability to model low-level causal relationships between their actions and the environment. We propose the first framework for unsupervised low-level causal structure learning in simulated robotic-arm environments. The framework integrates physics-aware visual encoding, reinforcement-learning-driven active intervention policies, contrastive causal representation learning, and a causal graph neural network to autonomously discover “action → effect” causal relationships (e.g., pushing an object causes its displacement) directly from pixel observations, without requiring causal labels. Our core contributions are extending causal discovery to the foundational interaction level and constructing transferable abstract causal representations. In experiments, the framework identifies low-level causal pairs in the simulation environment with 87.3% accuracy and retains 76.5% causal reasoning accuracy after cross-object and cross-scene transfer, significantly outperforming existing baselines.
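To make the idea of discovering “action → effect” pairs by intervention concrete, here is a minimal toy sketch. It is not the framework summarized above: the learned intervention policy, the pixel encoder, the contrastive objective, and the causal graph neural network are all replaced by a hand-coded one-step environment and a simple mean-difference test. The names `toy_step` and `discover_causal_pairs`, the action and object labels, and the threshold are all hypothetical and chosen only for illustration.

```python
import random
from statistics import mean


def toy_step(action: str, rng: random.Random) -> dict:
    """Hypothetical one-step environment (an assumption, not the paper's simulator).

    Returns the displacement of each object after the given action.
    Pushing an object moves it by about 0.5; everything else only jitters.
    """
    def noise() -> float:
        return rng.gauss(0.0, 0.01)

    return {
        "cube": (0.5 if action == "push_cube" else 0.0) + noise(),
        "ball": (0.5 if action == "push_ball" else 0.0) + noise(),
    }


def discover_causal_pairs(actions, objects, trials=200, threshold=0.1, seed=0):
    """Return (action, object) pairs whose displacement rises under intervention.

    The average absolute displacement under each action is compared with a
    no-op baseline; a pair is kept when the difference exceeds `threshold`.
    """
    rng = random.Random(seed)

    # Baseline: how much each object moves when the agent does nothing.
    baseline = {
        obj: mean(abs(toy_step("noop", rng)[obj]) for _ in range(trials))
        for obj in objects
    }

    edges = []
    for action in actions:
        # Intervene: force the action repeatedly and record the outcomes.
        samples = [toy_step(action, rng) for _ in range(trials)]
        for obj in objects:
            effect = mean(abs(s[obj]) for s in samples) - baseline[obj]
            if effect > threshold:
                edges.append((action, obj))
    return edges


edges = discover_causal_pairs(["push_cube", "push_ball"], ["cube", "ball"])
print(edges)
```

In this toy setting no causal labels are supplied; the edges come only from comparing outcomes with and without the intervention. A real system working from pixels would first need a learned representation in which such displacements are measurable.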