🤖 AI Summary
Language models frequently generate syntactically valid but semantically incorrect code; existing approaches offer only shallow syntactic constraints or brittle semantic encodings. Method: We propose the first systematic semantic-constrained decoding framework, modeling deep program properties (such as type safety and functional equivalence) as coinductive realizability problems solved uniformly over regular codata. By combining abstract program structure analysis with token-level constrained decoding, the framework tightly integrates large language model outputs with formal verification. Contribution/Results: This work establishes semantic-constrained decoding as a principled, programmable extension of language models. Evaluated across diverse code generation tasks, it achieves substantial improvements in functional correctness while preserving practical decoding efficiency.
📝 Abstract
Language models (LMs) can generate code, but cannot guarantee its correctness: their outputs often violate type safety, program invariants, or semantic equivalence. Constrained decoding offers a solution by restricting generation to programs that satisfy desired properties. Yet existing methods are limited to shallow syntactic constraints or rely on brittle, ad hoc encodings of semantics over token sequences.
We present ChopChop, the first programmable framework for semantic-constrained decoding, enabling LMs to generate code that provably satisfies rich semantic properties. ChopChop connects token-level generation with reasoning over abstract program structures via a coinduction-based formalism, reducing constraint enforcement to a realizability problem over regular codata. We demonstrate ChopChop's generality through generation constrained by type safety and program equivalence, showing how formal methods can be seamlessly integrated into LM-driven code generation. ChopChop turns semantic-constrained decoding from a niche technique into a systematic, principled extension of LMs, improving success rates across models and tasks while maintaining practical decoding latency.
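To make the core mechanism concrete, the sketch below shows what token-level constrained decoding looks like in the simplest case. This is not ChopChop's actual API or formalism: the "semantic" property is stubbed by a balanced-parentheses check standing in for a real prefix-viability oracle, the vocabulary is a toy, and `scores` is an assumed placeholder for LM logits. The idea it illustrates is the one from the abstract: at each step, tokens whose extension cannot be completed to a valid program are masked out before sampling.

```python
# Illustrative sketch of token-level constrained decoding (NOT ChopChop's API).
# A constraint is modeled as a viability predicate over token prefixes; the
# decoder masks any token whose extension could never reach a valid program.
# Balanced parentheses stand in for a richer semantic property here.

VOCAB = ["(", ")", "x", "<eos>"]

def viable_prefix(tokens):
    """True if the prefix can still be extended to a balanced string."""
    depth = 0
    for t in tokens:
        if t == "(":
            depth += 1
        elif t == ")":
            depth -= 1
            if depth < 0:  # closed more than opened: no completion exists
                return False
    return True

def complete(tokens):
    """True if the sequence is already a fully balanced string."""
    depth = sum(1 if t == "(" else -1 if t == ")" else 0 for t in tokens)
    return depth == 0

def constrained_greedy_decode(scores, max_len=8):
    """Greedy decoding under the constraint.

    `scores(prefix, token) -> float` is a hypothetical stand-in for LM
    logits. Each step keeps only tokens that preserve prefix viability
    (and allows <eos> only once the output is complete), then picks the
    highest-scoring survivor.
    """
    out = []
    for _ in range(max_len):
        allowed = []
        for t in VOCAB:
            if t == "<eos>":
                if complete(out):
                    allowed.append(t)
            elif viable_prefix(out + [t]):
                allowed.append(t)
        best = max(allowed, key=lambda t: scores(out, t))
        if best == "<eos>":
            break
        out.append(best)
    return out
```

Even with a scorer that strongly prefers an ill-formed continuation (e.g. a stray `)` at the start), the mask guarantees every emitted prefix remains viable and the final output satisfies the property, which is exactly the guarantee that shallow post-hoc filtering cannot give.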