🤖 AI Summary
This work investigates the capacity of language models to compose basic skills in context to solve composite tasks, focusing on the generalization bottlenecks that arise when models encounter unseen skill combinations. Method: We construct a controlled benchmark of linguistic and logical tasks and systematically evaluate multiple open-source models under in-context learning and chain-of-thought prompting. We find that superficially simple examples often induce interference, and we emphasize semantic alignment between examples and compositional reasoning steps—formalizing this as the "step-alignment" principle—and design a probing methodology grounded in this insight. Contribution/Results: Experiments reveal that current models fail to autonomously identify and compose latent skills, whereas our alignment-based approach substantially improves accuracy on composite tasks. The framework provides an interpretable, reproducible theoretical foundation and a practical methodology for modeling in-context skill composition.
📝 Abstract
Composing basic skills from simple tasks to accomplish composite tasks is crucial for modern intelligent systems. We investigate the in-context composition ability of language models, i.e., their ability to perform composite tasks that combine basic skills demonstrated in in-context examples. This is more challenging than the standard setting, where skills and their composition can be learned during training. We conduct systematic experiments on representative open-source language models, using linguistic and logical tasks designed to probe composition abilities. The results reveal that simple task examples can have a surprisingly negative impact on performance, because the models generally struggle to recognize and assemble the skills correctly, even with Chain-of-Thought examples. Theoretical analysis further shows that it is crucial to align examples with the corresponding steps of the composition. This insight motivates a method for the probing tasks, whose improved performance provides positive support for our analysis.
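To make the alignment idea concrete, here is a minimal, hypothetical sketch of what "step-aligned" prompting could look like: each in-context example is attached to the composition step it demonstrates, instead of being presented as an unrelated simple task. The task (reverse a word, then uppercase it), the helper name `aligned_prompt`, and the prompt wording are all illustrative assumptions, not the paper's actual benchmark or method.

```python
# Hypothetical sketch of step-aligned prompt construction.
# Each basic-skill example is labeled with the composition step it serves,
# so the model can map demonstrations to reasoning steps.

def aligned_prompt(skills, query):
    """Build a prompt whose examples are ordered and labeled to match
    the steps of the composite task.

    skills: list of (step_description, example_input, example_output)
    query:  the composite-task input to solve
    """
    lines = ["Solve the task step by step."]
    for i, (step, ex_in, ex_out) in enumerate(skills, start=1):
        lines.append(f"Step {i} ({step}): e.g. {ex_in!r} -> {ex_out!r}")
    lines.append(f"Input: {query!r}")
    lines.append("Apply the steps in order and give the final output.")
    return "\n".join(lines)

# Illustrative composite task: reverse the string, then uppercase it.
skills = [
    ("reverse the string", "cat", "tac"),
    ("uppercase the string", "tac", "TAC"),
]
prompt = aligned_prompt(skills, "dog")
print(prompt)
```

The contrast with the unaligned setting would be presenting the "reverse" and "uppercase" examples as separate, unlabeled simple tasks and hoping the model infers both the decomposition and the ordering on its own.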