Reasoning Like an Economist: Post-Training on Economic Problems Induces Strategic Generalization in LLMs

📅 2025-05-31
🤖 AI Summary
Directly adapting large language models (LLMs) to multi-agent systems (MAS) faces challenges including complex reward modeling, highly dynamic agent interactions, and stringent generalization requirements. Method: This paper proposes a domain-aligned post-training paradigm, using economics as a structured testbed; it combines supervised fine-tuning (SFT) with reinforcement learning from verifiable rewards (RLVR) on a high-quality, self-constructed economic reasoning dataset (2,100 problems) to train a 7B open-weight LLM. Contribution/Results: This work introduces RLVR to economic reasoning for the first time, significantly improving the model's equilibrium prediction accuracy, strategic consistency, and economic rationality in unseen multi-agent games. It empirically demonstrates that post-training can effectively induce cross-task strategic generalization and shape agent alignment with rational behavioral patterns.

📝 Abstract
Directly training Large Language Models (LLMs) for Multi-Agent Systems (MAS) remains challenging due to intricate reward modeling, dynamic agent interactions, and demanding generalization requirements. This paper explores whether post-training techniques, specifically Supervised Fine-Tuning (SFT) and Reinforcement Learning with Verifiable Rewards (RLVR), can effectively $\textit{generalize}$ to multi-agent scenarios. We use economic reasoning as a testbed, leveraging its strong foundations in mathematics and game theory, its demand for structured analytical reasoning, and its relevance to real-world applications such as market design, resource allocation, and policy analysis. We introduce $\textbf{Recon}$ ($\textbf{R}$easoning like an $\textbf{ECON}$omist), a 7B-parameter open-source LLM post-trained on a hand-curated dataset of 2,100 high-quality economic reasoning problems. Comprehensive evaluation on economic reasoning benchmarks and multi-agent games reveals clear improvements in structured reasoning and economic rationality. These results underscore the promise of domain-aligned post-training for enhancing reasoning and agent alignment, shedding light on the roles of SFT and RL in shaping model behavior. Code is available at https://github.com/MasterZhou1/Recon.
Problem

Research questions and friction points this paper is trying to address.

Enhancing LLMs' generalization in multi-agent systems via post-training
Applying economic reasoning to improve structured analytical capabilities
Evaluating SFT and RLVR for strategic reasoning in game theory
Innovation

Methods, ideas, or system contributions that make the work stand out.

Post-training with SFT and RLVR techniques
Uses economic reasoning for multi-agent generalization
Introduces Recon model for structured economic analysis
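
To make the "verifiable rewards" idea in RLVR concrete, here is a minimal sketch of a rule-based reward that checks a model's final numeric answer against a reference. It is not taken from the Recon codebase: the "Final answer:" convention, the function names extract_answer and verifiable_reward, and the tolerance are all hypothetical choices made for illustration.

```python
import re
from fractions import Fraction
from typing import Optional

# Hypothetical convention (an assumption, not the paper's format):
# the model ends its response with "Final answer: <number>".
ANSWER_PATTERN = re.compile(
    r"final answer:\s*(-?\d+(?:\.\d+)?(?:/\d+)?)", re.IGNORECASE
)


def extract_answer(response: str) -> Optional[Fraction]:
    """Return the last stated final answer as an exact fraction, or None."""
    matches = ANSWER_PATTERN.findall(response)
    if not matches:
        return None
    try:
        return Fraction(matches[-1])
    except (ValueError, ZeroDivisionError):
        # Malformed numbers such as "1.5/2" or "3/0" count as no answer.
        return None


def verifiable_reward(
    response: str, reference: str, tol: Fraction = Fraction(1, 1000)
) -> float:
    """Binary reward: 1.0 if the extracted answer matches the reference."""
    predicted = extract_answer(response)
    if predicted is None:
        return 0.0
    return 1.0 if abs(predicted - Fraction(reference)) <= tol else 0.0


if __name__ == "__main__":
    response = "Setting marginal revenue equal to marginal cost gives Q = 20. Final answer: 20"
    print(verifiable_reward(response, "20"))
```

Because the reward is computed by a deterministic check rather than a learned reward model, it can be audited directly; a real pipeline would likely also need checks for answer formats that are not single numbers, such as strategy profiles.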