🤖 AI Summary
Modeling real-world interactive scenarios via multi-agent simulation traditionally relies heavily on domain experts, resulting in high costs and poor scalability. This paper introduces the first end-to-end, game-theoretic framework for autoformalization, which compiles natural-language descriptions of interaction scenarios directly into executable and verifiable logic programs (ASP/Prolog). Methodologically, it employs a large language model (LLM)-driven, multi-agent collaborative pipeline—evaluated with Claude 3.5 Sonnet and GPT-4o—that performs rule generation, solver-based syntactic validation, tournament-based simulation, and exact semantic verification against ground-truth payoff matrices. Its key contributions are a game-theory-guided agent architecture and a dual-layer verification mechanism (syntactic + semantic). Evaluated on 110 natural-language descriptions spanning five two-player, two-action (2×2) simultaneous-move games, the framework achieves syntactic correctness rates of 99.82%–100% and semantic correctness of 76.5%–77%, substantially improving both formalization efficiency and fidelity.
📝 Abstract
Multi-agent simulations facilitate the exploration of interactions among both natural and artificial agents. However, modelling real-world scenarios and developing simulations often requires substantial expertise and effort. To streamline this process, we present a framework that enables the autoformalization of interaction scenarios using agents augmented by large language models (LLMs) utilising game-theoretic formalisms. The agents translate natural language descriptions of interactions into executable logic programs that define the rules of each game, ensuring syntactic correctness through validation by a solver. A tournament simulation then tests the functionality of the generated game rules and strategies. After the tournament, if a ground truth payoff matrix is available, an exact semantic validation is performed. We evaluate our approach on a diverse set of 110 natural language descriptions exemplifying five $2\times2$ simultaneous-move games, achieving 100% syntactic and 76.5% semantic correctness in the generated game rules for Claude 3.5 Sonnet, and 99.82% syntactic and 77% semantic correctness for GPT-4o. Additionally, we demonstrate high semantic correctness in autoformalizing gameplay strategies. Overall, the results highlight the potential of autoformalization to leverage LLMs in generating formal reasoning modules for decision-making agents.
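The semantic layer of the verification amounts to an exact check: the payoff matrix induced by the generated rules must match the ground truth on every joint action profile. A minimal Python sketch of that check (all names here are illustrative, not the paper's; the actual system derives payoffs by querying the generated ASP/Prolog program rather than a Python function):

```python
# Sketch of exact semantic validation for an autoformalized 2x2 game.
# A candidate rule set is modelled as a payoff function mapping a joint
# action profile to a (row, column) payoff pair -- an assumption standing
# in for the payoffs a logic-program solver would return.
from itertools import product

def payoff_matrix(payoff_fn, actions=("C", "D")):
    """Enumerate the payoffs of all joint action profiles of a 2x2 game."""
    return {profile: payoff_fn(*profile) for profile in product(actions, repeat=2)}

def semantically_correct(payoff_fn, ground_truth):
    """Exact match: the generated rules must reproduce every payoff."""
    return payoff_matrix(payoff_fn) == ground_truth

# Ground-truth Prisoner's Dilemma payoffs, written as (row, column).
pd_truth = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}

# A correct candidate produced by the formalization step.
def generated_rules(a1, a2):
    return pd_truth[(a1, a2)]

# A candidate that swapped the off-diagonal payoffs.
def buggy_rules(a1, a2):
    return pd_truth[(a2, a1)]

print(semantically_correct(generated_rules, pd_truth))  # → True
print(semantically_correct(buggy_rules, pd_truth))      # → False
```

This mirrors why the paper's semantic correctness figures (76.5%–77%) sit well below its syntactic ones: a program can be accepted by the solver yet still encode the wrong payoffs.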