TeamLLM: A Human-Like Team-Oriented Collaboration Framework for Multi-Step Contextualized Tasks

πŸ“… 2026-04-08
πŸ“ˆ Citations: 0
✨ Influential: 0
πŸ€– AI Summary
This work addresses a limitation of existing multi-LLM frameworks: they lack explicit modeling of human-like team role allocation, which narrows their perspective and weakens performance on multi-step contextualized tasks. To overcome this, the authors propose TeamLLM, a collaborative multi-LLM framework inspired by human teams that introduces, for the first time, role-based division of labor into multi-model collaboration, featuring four distinct roles and a three-phase coordination pipeline. They also introduce CGPST, a novel evaluation benchmark with contextual grounding and procedural structure that enables process-oriented, multi-dimensional assessment. Experiments show that TeamLLM substantially outperforms baseline methods on CGPST in overall performance, step-level accuracy, and multiple evaluation dimensions. The authors further release a dataset comprising diverse scenarios, full model responses, and human annotations.
πŸ“ Abstract
Recently, multi-Large Language Model (LLM) frameworks have been proposed to solve contextualized tasks. However, these frameworks do not explicitly emulate human team role division, which can confine them to a single perspective and thereby weaken performance on multi-step contextualized tasks. To address this issue, we propose TeamLLM, a human-like Team-Oriented Multi-LLM Collaboration Framework. TeamLLM adopts four team roles with distinct divisions of labor and employs a three-phase multi-LLM collaboration process for multi-step contextualized tasks. To evaluate the effectiveness of TeamLLM on such tasks, we propose Contextually-Grounded and Procedurally-Structured Tasks (CGPST) and construct the CGPST benchmark. The benchmark has four core features: contextual grounding, procedural structure, process-oriented evaluation, and multi-dimensional assessment. We evaluate ten popular LLMs on CGPST at the overall, step, and dimension levels. Results show that TeamLLM substantially improves performance on CGPST. We release the benchmark with scenarios, full-process responses, and human scores from ten LLMs. The code and data are available at https://anonymous.4open.science/r/TeamLLM-anonymous-C50E/.
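The four-role, three-phase collaboration described above can be sketched minimally as follows. Note that the summary and abstract do not name the roles or phases, so the role names (planner, executor, reviewer, integrator), the phase boundaries, and the `stub_llm` function are all illustrative assumptions, not the paper's actual design.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical role names: the paper specifies four roles with distinct
# divisions of labor but does not name them in the abstract.
ROLES = ["planner", "executor", "reviewer", "integrator"]

@dataclass
class Agent:
    role: str
    model: Callable[[str], str]  # stand-in for a real LLM API call

    def respond(self, prompt: str) -> str:
        # Prefix the prompt with the agent's role before calling the model.
        return self.model(f"[{self.role}] {prompt}")

def stub_llm(prompt: str) -> str:
    # Placeholder for an actual LLM backend; echoes its input.
    return f"response to ({prompt})"

def collaborate(task: str, agents: List[Agent]) -> str:
    # Phase 1 (assumed): the planner decomposes the task into steps.
    plan = agents[0].respond(f"decompose: {task}")
    # Phase 2 (assumed): the executor carries out the plan step by step,
    # and the reviewer critiques the draft.
    draft = agents[1].respond(f"execute: {plan}")
    critique = agents[2].respond(f"review: {draft}")
    # Phase 3 (assumed): the integrator merges draft and critique into
    # the final multi-step answer.
    return agents[3].respond(f"integrate: {draft} | {critique}")

agents = [Agent(role, stub_llm) for role in ROLES]
result = collaborate("plan a multi-leg trip within a fixed budget", agents)
```

The key point the sketch illustrates is the division of labor: each phase routes the evolving context through a different role-specialized model rather than a single LLM answering end to end.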
Problem

Research questions and friction points this paper is trying to address.

multi-LLM collaboration
team role division
multi-step contextualized tasks
contextual grounding
procedural structure
Innovation

Methods, ideas, or system contributions that make the work stand out.

TeamLLM
multi-LLM collaboration
role division
contextualized tasks
CGPST benchmark