Can Language Models Compose Skills In-Context?

📅 2025-10-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work investigates whether language models can compose basic skills, demonstrated in context, to solve composite tasks, focusing on the generalization bottleneck that arises for unseen skill combinations.

Method: We construct a controlled benchmark of linguistic and logical tasks and systematically evaluate multiple open-source models under in-context learning and chain-of-thought prompting. We find that superficially simple examples often interfere with composite-task performance, formalize a "step-alignment" principle requiring semantic alignment between examples and the corresponding compositional reasoning steps, and design a probing methodology grounded in this insight.

Contribution/Results: Experiments show that current models largely fail to identify and assemble the relevant skills on their own, whereas the alignment-based approach substantially improves accuracy on composite tasks. The framework offers an interpretable, reproducible foundation and a practical methodology for studying in-context skill composition.

📝 Abstract
Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the in-context composition ability of language models to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned during training. We conduct systematic experiments on various representative open-source language models, using linguistic and logical tasks designed to probe composition abilities. The results reveal that simple-task examples can have a surprisingly negative impact on performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps in the composition. This inspires a method for the probing tasks, whose improved performance provides positive support for our insights.
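To make the alignment idea concrete, here is a minimal, hypothetical sketch of what "aligning examples with composition steps" could look like when building a prompt. The specific skills (last-letter extraction, uppercasing), labels, and prompt format are illustrative assumptions, not taken from the paper: the point is only the contrast between dumping simple-task examples as one undifferentiated block and attaching each example to the reasoning step it demonstrates.

```python
# Hypothetical prompt-construction sketch. The skills and format below are
# illustrative only; the paper's actual tasks and method may differ.

def naive_prompt(skill_a_examples, skill_b_examples, query):
    """Concatenate all simple-task examples, then ask the composite question."""
    lines = [f"{x} -> {y}" for x, y in skill_a_examples + skill_b_examples]
    lines.append(f"{query} ->")
    return "\n".join(lines)

def step_aligned_prompt(skill_a_examples, skill_b_examples, query):
    """Label each example with the composition step it supports."""
    lines = ["Step 1 (take the last letter):"]
    lines += [f"  {x} -> {y}" for x, y in skill_a_examples]
    lines.append("Step 2 (uppercase the letter):")
    lines += [f"  {x} -> {y}" for x, y in skill_b_examples]
    lines.append(f"Apply Step 1 then Step 2 to: {query} ->")
    return "\n".join(lines)

skill_a = [("apple", "e"), ("sky", "y")]   # last-letter skill examples
skill_b = [("e", "E"), ("y", "Y")]         # uppercasing skill examples
print(step_aligned_prompt(skill_a, skill_b, "moon"))
```

In the naive prompt, the model must infer on its own which examples belong to which sub-skill and in what order to apply them; the step-aligned variant makes that mapping explicit, which is the kind of semantic alignment the abstract argues is crucial.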
Problem

Research questions and friction points this paper is trying to address.

Investigating language models' ability to compose skills from demonstrations
Testing models on composite tasks combining linguistic and logical skills
Analyzing why models struggle with skill recognition and assembly
Innovation

Methods, ideas, or system contributions that make the work stand out.

In-context skill composition for composite tasks
Systematic experiments on open-source language models
Aligning examples with composition steps for improvement