STaD: Scaffolded Task Design for Identifying Compositional Skill Gaps in LLMs

2026-04-20

AI Summary
Current benchmarks struggle to diagnose precisely which skills large language models lack in compositional reasoning. This work proposes the Scaffolded Task Design (STaD) framework, which, for the first time, integrates educational scaffolding theory into model evaluation. STaD generates controlled task variants that add structured, incremental support, so that compositional reasoning can be decomposed and probed systematically. Because it treats the model as a black box, the approach supports scalable, fine-grained diagnosis of skill deficiencies. Experiments across six models and three reasoning benchmarks reveal distinct, concrete failure patterns unique to each model, demonstrating STaD's effectiveness in pinpointing weaknesses in compositional reasoning.

Abstract
Benchmarks are often used as a standard to understand LLM capabilities in different domains. However, aggregate benchmark scores provide limited insight into the compositional skill gaps of LLMs and how to address them. To make these weaknesses visible, we propose the Scaffolded Task Design (STaD) framework. STaD generates controlled variations of benchmark tasks based on the concept of scaffolding, which introduces structured, incremental support in a step-by-step manner. Rather than inspecting failures individually, this approach enables systematic and scalable probing of model behavior by identifying the specific reasoning skill compositions a model lacks. Treating the LLM as a black box, our experiments on six models of varying sizes reveal multiple failure points in three reasoning benchmarks and highlight each model's unique and distinct skill gaps.
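
The idea of probing with incremental support can be illustrated with a small sketch. This is not the paper's implementation: the `Task` structure, the prompt wording, the exact-match scoring, and the `toy_model` stub are all assumptions made for illustration. The sketch reveals one more intermediate step at each scaffold level and records the lowest level at which a black-box model answers correctly.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Task:
    """Hypothetical task format: a question, its ordered reasoning steps, and the answer."""
    question: str
    steps: List[str]
    answer: str


def scaffolded_variants(task: Task) -> List[str]:
    """Build one prompt per scaffold level k, revealing the first k steps as hints."""
    prompts = []
    for k in range(len(task.steps) + 1):
        hints = "".join(
            f"Given step {i + 1}: {step}\n" for i, step in enumerate(task.steps[:k])
        )
        prompts.append(f"{hints}Question: {task.question}")
    return prompts


def first_passing_level(task: Task, model: Callable[[str], str]) -> Optional[int]:
    """Return the smallest scaffold level at which the model's answer matches exactly.

    The model is treated as a black box: only prompt in, text out.
    Returns None if the model fails even with every step revealed.
    """
    for k, prompt in enumerate(scaffolded_variants(task)):
        if model(prompt).strip() == task.answer:
            return k
    return None


def toy_model(prompt: str) -> str:
    """Stand-in for an LLM (assumption): it can add, but only gets the
    product right when the multiplication step is handed to it."""
    return "40" if "12 * 3 = 36" in prompt else "38"


task = Task(
    question="What is (12 * 3) + 4?",
    steps=["12 * 3 = 36", "36 + 4 = 40"],
    answer="40",
)

level = first_passing_level(task, toy_model)
print(level)  # 1
```

In this toy run the stub fails with no support and succeeds once the first step is supplied, which would point to the multiplication step as the missing skill rather than the final addition. A real study would replace the stub with calls to an actual model and aggregate levels over many tasks.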
Problem

Research questions and friction points this paper is trying to address.

compositional skill gaps
large language models
benchmarking
reasoning skills
scaffolded task design
Innovation

Methods, ideas, or system contributions that make the work stand out.

Scaffolded Task Design
compositional reasoning
skill gap analysis
controlled task variation
black-box probing