🤖 AI Summary
This paper introduces Large-Scale Constraint Generation (LSCG), a new problem that evaluates whether large language models (LLMs) can parse and generate text under hundreds of fine-grained, general-purpose constraints. To this end, we construct the Words Checker benchmark and systematically study how LLM performance degrades as the number of constraints grows, comparing model characteristics (e.g., size, family) and steering techniques (e.g., Simple Prompt, Chain of Thought, Best of N). To counter constraint redundancy and interference, we propose FoCusNet, a small dedicated model that parses the full constraint list into a smaller relevant subset, helping the LLM focus on the constraints that matter. Experiments show that existing methods suffer substantial accuracy drops as the constraint count increases, while FoCusNet delivers an 8-13% accuracy boost, supporting the effectiveness of the constraint-focusing mechanism.
📝 Abstract
Recent research has explored the constrained generation capabilities of Large Language Models (LLMs) when explicitly prompted with a few task-specific requirements. In contrast, we introduce Large-Scale Constraint Generation (LSCG), a new problem that evaluates whether LLMs can parse a large, fine-grained, generic list of constraints. To examine LLMs' ability to handle an increasing number of constraints, we create a practical instance of LSCG, called Words Checker. In Words Checker, we evaluate the impact of model characteristics (e.g., size, family) and steering techniques (e.g., Simple Prompt, Chain of Thought, Best of N) on performance. We also propose FoCusNet, a small, dedicated model that parses the original list of constraints into a smaller subset, helping the LLM focus on relevant constraints. Experiments reveal that existing solutions suffer a significant performance drop as the number of constraints increases, while FoCusNet achieves an 8-13% accuracy boost.
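To make the pipeline concrete, here is a minimal sketch of the two-stage idea the abstract describes: a filtering step that shrinks a large constraint list to a relevant subset before the downstream model checks the text. All function names and the relevance heuristic are illustrative assumptions, not the paper's actual FoCusNet implementation (which is a learned model, not a heuristic).

```python
# Hypothetical sketch of a Words-Checker-style task with constraint focusing.
# Names and the filtering heuristic are assumptions for illustration only.

def check_text(text: str, forbidden_words: list[str]) -> list[str]:
    """Return the forbidden words that appear in the text (the violations)."""
    tokens = set(text.lower().split())
    return [w for w in forbidden_words if w.lower() in tokens]

def focus_subset(text: str, forbidden_words: list[str]) -> list[str]:
    """Stand-in for a FoCusNet-style filter: keep only constraints that
    could plausibly apply, so the downstream LLM attends to a short list
    instead of hundreds of entries. Here a crude substring heuristic
    replaces the learned model."""
    lowered = text.lower()
    return [w for w in forbidden_words if w.lower()[:2] in lowered]

constraints = ["alpha", "beta", "gamma", "delta"]
text = "the alpha release shipped with delta updates"

subset = focus_subset(text, constraints)      # much shorter list
violations = check_text(text, subset)         # ["alpha", "delta"]
```

The point of the sketch is the division of labor: the cheap filter bears the cost of scanning the full constraint list, while the expensive model only ever sees the focused subset.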