🤖 AI Summary
Existing large language models show weak type reasoning during code generation, which compromises the type correctness of the programs they synthesize. To address this, we propose TyFlow, a type-guided program synthesis framework that establishes, for the first time, a structural isomorphism between type derivation trees and synthesis derivation trees, thereby internalizing the complexity of the type system into the code representation itself. TyFlow replaces conventional token-level autoregressive generation with sequences of synthesis decisions, freeing the model to focus on high-level semantics and type constraints. The approach eliminates type errors entirely and significantly improves functional correctness across multiple benchmarks, demonstrating the value of a deep synergy between formal type systems and neural language models. The core innovations are (1) modeling the type-synthesis structural isomorphism and (2) a flow-based, type-aware generation paradigm.
📝 Abstract
Language models have shown remarkable proficiency in code generation; nevertheless, ensuring type correctness remains a challenge. Although traditional methods such as constrained decoding alleviate this problem by externally rejecting untypable code, the model itself never learns type reasoning, which ultimately limits its performance. This paper introduces TyFlow, a novel system that internalizes type reasoning within code generation, guiding the model to learn the type system. The core of our approach is a novel type-guided program synthesis system that maintains an isomorphism between type derivation trees and synthesis derivation trees, enabling a new code representation based on sequences of synthesis decisions rather than traditional text-based token sequences. By offloading the complexity of learning the type system onto the representation itself, models can redirect their capacity toward higher-level program semantics. Our evaluation shows that TyFlow not only eliminates type errors but also significantly improves functional correctness, highlighting the importance of internally aligning LMs with type systems.
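To make the core idea concrete, the sketch below illustrates (in Python, with a hypothetical typing context and a toy expression language that are not from the paper) what "synthesis decision sequences" mean: instead of emitting free-form tokens, the generator picks typing rules top-down, so every choice point mirrors a node in a type derivation tree and every completed program is well-typed by construction. TyFlow's actual system and language are more elaborate; this only shows the enumeration skeleton.

```python
# Hypothetical sketch: type-guided enumeration of expressions.
# Each yield corresponds to a "synthesis decision" (which typing
# rule to apply), mirroring the shape of a type derivation tree.

CONTEXT = {                     # hypothetical typing context: name -> type
    "x": "int",
    "s": "str",
    "len": ("str", "int"),      # function type str -> int
    "inc": ("int", "int"),      # function type int -> int
}

def synthesize(goal, depth=2):
    """Yield expressions (as strings) whose type is `goal`.

    Decision 1 (Var rule): pick a variable whose type matches the goal.
    Decision 2 (App rule): pick a function whose result type matches,
    then recursively synthesize an argument of its parameter type.
    Programs with the wrong type are never even representable here.
    """
    for name, ty in CONTEXT.items():
        if ty == goal:                                  # Var rule
            yield name
    if depth == 0:
        return
    for name, ty in CONTEXT.items():
        if isinstance(ty, tuple) and ty[1] == goal:     # App rule
            for arg in synthesize(ty[0], depth - 1):
                yield f"{name}({arg})"

print(sorted(synthesize("int")))
# Every result is an int-typed expression; the str-typed "s" alone
# can only appear as the argument of "len".
```

In a neural setting, the model would score the candidate rules at each choice point rather than enumerate them exhaustively, but the guarantee is the same: the representation makes ill-typed output unreachable.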