🤖 AI Summary
This work addresses the difficulty large language models have in extracting and applying novel knowledge from complex task contexts during in-context learning. To overcome this limitation, the authors propose a high-fidelity chain-of-thought synthesis mechanism that generates high-quality synthetic reasoning paths to improve the model’s understanding and use of task-specific context. The proposed approach substantially improves performance on context-dependent tasks, raising the average solve rate on the CL-Bench benchmark from 17.2% to a markedly higher level. This narrows the performance gap between state-of-the-art models on such challenging tasks.
📝 Abstract
While LLMs excel at reasoning over prompts using static pretrained knowledge, they struggle significantly with context learning: the ability to dynamically extract, internalize, and apply new knowledge from complex, task-specific contexts. Recent evaluations on CL-Bench reveal a critical capability gap: frontier models solve only 17.2% of context-dependent tasks on average.