🤖 AI Summary
Current service composition faces three core challenges: (1) semantic reasoning and dynamic execution are decoupled; (2) large reasoning models (LRMs) lack embodied action capabilities; and (3) large action models (LAMs) exhibit insufficient deep reasoning. Method: This paper proposes the first systematic collaborative architecture integrating LRMs and LAMs. It establishes a four-stage closed loop—semantic understanding, hierarchical planning, action generation, and cross-system execution—to bridge the semantic gap between intent comprehension and automated workflow orchestration, as well as interoperability fragmentation. Innovatively, it introduces a verifiable reasoning–action alignment mechanism and lightweight adaptation interfaces to enable dynamic composition across heterogeneous service ecosystems. Contribution/Results: Evaluation on multi-domain natural-language-driven service orchestration tasks shows that the prototype system achieves a 37.2% improvement in task completion rate and a 29.5% reduction in average execution latency, significantly enhancing both automation capability and practical deployability.
📝 Abstract
Service composition remains a central challenge in building adaptive and intelligent software systems, often constrained by limited reasoning capabilities or brittle execution mechanisms. This paper explores the integration of two emerging paradigms enabled by large language models: Large Reasoning Models (LRMs) and Large Action Models (LAMs). We argue that LRMs address the challenges of semantic reasoning and ecosystem complexity while LAMs excel in dynamic action execution and system interoperability. However, each paradigm has complementary limitations - LRMs lack grounded action capabilities, and LAMs often struggle with deep reasoning. We propose an integrated LRM-LAM architectural framework as a promising direction for advancing automated service composition. Such a system can reason about service requirements and constraints while dynamically executing workflows, thus bridging the gap between intention and execution. This integration has the potential to transform service composition into a fully automated, user-friendly process driven by high-level natural language intent.