π€ AI Summary
To address the challenge of policy modeling in multi-agent systems due to the absence of global trajectory information, this paper proposes a novel method that infers other agentsβ policies solely from the controlled agentβs local trajectories. Our core contribution is the first Transformer-based encoding framework tailored for local observations, which leverages self-supervised sequential modeling to robustly learn implicit policy embeddings of other agents from limited local trajectory data. The approach operates without access to global state or inter-agent communication, and is universally applicable to cooperative, competitive, and mixed-interaction settings. Experiments demonstrate that the learned policy representations significantly improve modeling accuracy (average +12.7%) and enhance long-horizon cumulative returns across diverse tasks (up to +18.3%). This work establishes a scalable new paradigm for multi-agent coordination and reasoning under resource constraints.
π Abstract
Agent modeling is a critical component in developing effective policies within multi-agent systems, as it enables agents to form beliefs about the behaviors, intentions, and competencies of others. Many existing approaches assume access to other agents' episodic trajectories, a condition often unrealistic in real-world applications. Consequently, a practical agent modeling approach must learn a robust representation of the policies of the other agents based only on the local trajectory of the controlled agent. In this paper, we propose exttt{TransAM}, a novel transformer-based agent modeling approach to encode local trajectories into an embedding space that effectively captures the policies of other agents. We evaluate the performance of the proposed method in cooperative, competitive, and mixed multi-agent environments. Extensive experimental results demonstrate that our approach generates strong policy representations, improves agent modeling, and leads to higher episodic returns.