🤖 AI Summary
This work addresses the inefficiency of large language models on codebase tasks, where retrieving irrelevant code wastes tokens and existing single-objective pruners collapse diverse relevance signals into one score. The authors propose LaMR (Latent Multi-Rubric), a framework that decomposes code relevance into two interpretable dimensions, semantic evidence and dependency support, each modeled by a dedicated conditional random field (CRF). A mixture-of-experts gating network weights the per-rubric emissions conditioned on the query, and a final CRF over the fused emissions produces the keep-or-prune decision. Multi-rubric labels are derived automatically from the existing training corpus via abstract syntax tree (AST) analysis, which also denoises the teacher's binary labels and requires no additional annotation. On four benchmarks (SWE-Bench Verified, SWE-QA, LCC, LongCodeQA), LaMR wins 12 of 16 head-to-head multi-turn comparisons, saves up to 31% more tokens on multi-turn agent tasks, and improves Exact Match by up to 3.5 points on single-turn tasks.
📝 Abstract
LLM-powered coding agents spend the majority of their token budget reading repository files, yet much of the retrieved code is irrelevant to the task at hand. Existing learned pruners compress this context with a single-objective sequence labeler, collapsing all facets of code relevance into one score and one transition matrix. We show that this formulation creates a modeling bottleneck: a single CRF transition prior must serve heterogeneous retention patterns, including contiguous semantic spans and sparse structural support lines. We propose LaMR (Latent Multi-Rubric), a structured pruning framework that decomposes code relevance into two interpretable quality dimensions, semantic evidence and dependency support, each modeled by a dedicated CRF with dimension-specific transition dynamics. A mixture-of-experts gating network dynamically weights the per-rubric emissions conditioned on the query, and a final CRF layer on the fused emissions produces the aggregate keep-or-prune decision. To supervise each dimension without additional annotation cost, we derive multi-rubric labels from the existing training corpus via AST-based program analysis, simultaneously denoising the teacher's binary labels. By filtering distracting noise, LaMR frequently matches or even outperforms unpruned full-context baselines. Experiments on four benchmarks (SWE-Bench Verified, SWE-QA, LCC, LongCodeQA) show that LaMR wins 12 of 16 head-to-head multi-turn comparisons. It saves up to 31% more tokens on multi-turn agent tasks and improves Exact Match by up to 3.5 points on single-turn tasks; where performance does drop, the drop is marginal.
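To make the fusion step concrete, here is a minimal sketch in plain Python of gated emission fusion followed by linear-chain decoding. It is an illustration under stated assumptions, not the authors' implementation: the function names (`fuse_emissions`, `viterbi`), the toy scores, the gate logits, and the transition matrix are all hypothetical. In the described framework the gate logits would come from a learned network conditioned on the query, and each rubric would have its own CRF; both are omitted here.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_emissions(rubric_emissions, gate_logits):
    """Gate-weighted sum of per-rubric emission scores.

    rubric_emissions: K tables, each [num_lines][2] (index 0 = prune, 1 = keep).
    gate_logits: K floats (assumed to be produced by a query-conditioned gate).
    """
    weights = softmax(gate_logits)
    num_lines = len(rubric_emissions[0])
    return [
        [sum(w * em[t][y] for w, em in zip(weights, rubric_emissions))
         for y in range(2)]
        for t in range(num_lines)
    ]

def viterbi(emissions, transitions):
    """Highest-scoring label sequence: sum of emission and transition scores."""
    num_labels = len(transitions)
    score = list(emissions[0])
    backptrs = []
    for t in range(1, len(emissions)):
        new_score, ptrs = [], []
        for y in range(num_labels):
            best_prev = max(range(num_labels),
                            key=lambda p: score[p] + transitions[p][y])
            ptrs.append(best_prev)
            new_score.append(score[best_prev] + transitions[best_prev][y]
                             + emissions[t][y])
        score = new_score
        backptrs.append(ptrs)
    # Trace the best path backwards from the best final label.
    path = [max(range(num_labels), key=lambda y: score[y])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]

# Toy scores for four code lines (all values are made up for illustration).
semantic = [[0, 2], [0, 2], [1, 0], [2, 0]]      # favors a contiguous span: lines 0-1
dependency = [[1, 0], [1, 0], [1, 0], [0, 3]]    # favors one sparse support line: line 3
transitions = [[0.2, 0.0], [0.0, 0.2]]           # mild preference for staying in a label

fused = fuse_emissions([semantic, dependency], gate_logits=[0.0, 0.0])
decision = viterbi(fused, transitions)
print(decision)  # [1, 1, 0, 1] -> keep lines 0, 1, 3; prune line 2
```

With equal gate weights, the decoded mask keeps both the contiguous semantic span and the isolated dependency line. Shifting the gate strongly toward the semantic rubric (for example `gate_logits=[10.0, -10.0]`) drops the dependency-only line, which is the kind of query-dependent trade-off the gating network is meant to learn.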