🤖 AI Summary
This study addresses the challenge of limited observability in multi-agent supply chains, where firms are reluctant to share information due to data privacy concerns, thereby hindering efficient coordination. To overcome this, the authors propose a flexible information-sharing mechanism that enables factory agents to dynamically choose among truthful, deceptive, or mixed strategies when disclosing downstream demand information. Within a multi-agent reinforcement learning framework, they model the environment as a partially observable Markov decision process and incorporate cooperative reward shaping to optimize system-wide performance. Experimental results demonstrate that under low-demand conditions, truthful sharing significantly improves payoffs for all participants, whereas in high-demand scenarios, deceptive strategies yield only marginal gains in total system profit but offer greater individual benefits to factories. This approach transcends the conventional dichotomy of full observability versus complete independence, offering a novel paradigm for partially observable multi-agent systems.
📝 Abstract
Modern supply networks are complex interconnected systems. Multi-agent models are increasingly explored to optimise their performance. Most research assumes agents will have full observability of the system by having a single policy represent the agents, which seems unrealistic as this requires companies to share their data. The alternative is to develop a Hidden-Markov Process with separate policies, making the problem challenging to solve. In this paper, we propose a multi-agent system where the factory agent can share information downstream, increasing the observability of the environment. It can choose to share no information, lie, tell the truth or combine these in a mixed strategy. The results show that data sharing can boost the performance, especially when combined with a cooperative reward shaping. In the high demand scenario there is limited ability to change the strategy and therefore no data sharing approach benefits both agents. However, lying benefits the factory enough for an overall system improvement, although only by a relatively small amount compared to the overall reward. In the low demand scenario, the most successful data sharing is telling the truth which benefits all actors significantly.