AGDC: Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional autoregressive models, which rely on discrete tokenization and struggle to accurately model continuous valuesβ€”often leading to functional failures in precision-sensitive tasks such as semiconductor circuit design. To overcome this, the authors propose AGDC, a unified framework that enables end-to-end autoregressive generation of hybrid discrete-continuous sequences for the first time. Built upon a Transformer architecture, AGDC integrates classification-based prediction with diffusion modeling and introduces a dynamic EOS logit adjustment mechanism alongside a length regularization loss. Evaluated on a newly curated high-precision semiconductor layout dataset, ContLayNet (334K samples), and SVG-based graphics tasks, AGDC significantly outperforms both discretization-based and fixed-structure baselines, breaking through the precision bottleneck and enabling high-fidelity, variable-length vector data generation.
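The hybrid prediction described above (categorical prediction for discrete tokens, diffusion-based modeling for continuous values) can be sketched as a two-headed output module on top of a Transformer hidden state. This is a minimal illustrative sketch, not the paper's implementation; all class names, layer sizes, and tensor shapes here are assumptions.

```python
import torch
import torch.nn as nn


class HybridHead(nn.Module):
    """Sketch of a joint discrete/continuous prediction head.

    The discrete branch is a standard classification head over the token
    vocabulary. The continuous branch is a small denoising network that,
    conditioned on the Transformer hidden state, predicts the noise added
    to a continuous value (one reverse-diffusion training step).
    All names and sizes are illustrative, not taken from the paper.
    """

    def __init__(self, d_model=256, vocab_size=128, d_cont=2):
        super().__init__()
        self.token_head = nn.Linear(d_model, vocab_size)
        # Denoiser input: noisy continuous value, scalar timestep, hidden state.
        self.denoiser = nn.Sequential(
            nn.Linear(d_cont + 1 + d_model, 256),
            nn.SiLU(),
            nn.Linear(256, d_cont),
        )

    def forward(self, h, x_noisy, t):
        logits = self.token_head(h)  # logits over the discrete vocabulary
        eps_hat = self.denoiser(torch.cat([x_noisy, t, h], dim=-1))
        return logits, eps_hat


# Toy usage: a batch of 4 sequence positions.
h = torch.randn(4, 256)       # Transformer hidden states
x_noisy = torch.randn(4, 2)   # noised continuous coordinates
t = torch.rand(4, 1)          # diffusion timesteps in [0, 1]
head = HybridHead()
logits, eps_hat = head(h, x_noisy, t)
```

At inference, the discrete head would be sampled as usual, while the continuous head would be iterated over diffusion timesteps to denoise a value conditioned on the same hidden state.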

πŸ“ Abstract
Transformer-based autoregressive models excel in data generation but are inherently constrained by their reliance on discretized tokens, which limits their ability to represent continuous values with high precision. We analyze the scalability limitations of existing discretization-based approaches for generating hybrid discrete-continuous sequences, particularly in high-precision domains such as semiconductor circuit designs, where precision loss can lead to functional failure. To address the challenge, we propose AGDC, a novel unified framework that jointly models discrete and continuous values for variable-length sequences. AGDC employs a hybrid approach that combines categorical prediction for discrete values with diffusion-based modeling for continuous values, incorporating two key technical components: an end-of-sequence (EOS) logit adjustment mechanism that uses an MLP to dynamically adjust EOS token logits based on sequence context, and a length regularization term integrated into the loss function. Additionally, we present ContLayNet, a large-scale benchmark comprising 334K high-precision semiconductor layout samples with specialized evaluation metrics that capture functional correctness where precision errors significantly impact performance. Experiments on semiconductor layouts (ContLayNet), graphic layouts, and SVGs demonstrate AGDC's superior performance in generating high-fidelity hybrid vector representations compared to discretization-based and fixed-schema baselines, achieving scalable high-precision generation across diverse domains.
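The abstract names two components for variable-length generation: an MLP that adjusts the EOS token's logit from sequence context, and a length regularization term in the loss. A minimal sketch of plausible forms for both follows; the exact architectures and loss are not given on this page, so every name, shape, and formula below is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EOSAdjuster(nn.Module):
    """Sketch of context-dependent EOS logit adjustment (names illustrative)."""

    def __init__(self, d_model=256):
        super().__init__()
        # Small MLP mapping the current hidden state to a scalar offset
        # added to the EOS token's logit before the softmax.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, logits, h, eos_id):
        logits = logits.clone()
        logits[..., eos_id] = logits[..., eos_id] + self.mlp(h).squeeze(-1)
        return logits


def length_regularizer(eos_probs, target_len):
    """Penalize mismatch between expected and target sequence length.

    Expected length is approximated from per-step EOS (stopping)
    probabilities; this is one plausible form of a length regularization
    term, not the paper's exact loss.
    """
    steps = torch.arange(1, eos_probs.shape[-1] + 1, dtype=eos_probs.dtype)
    expected_len = (steps * eos_probs).sum(dim=-1)
    return F.mse_loss(expected_len, target_len)


# Toy usage: batch of 4 sequences, 10 steps, vocab of 128 with EOS id 0.
adj = EOSAdjuster()
out = adj(torch.zeros(4, 10, 128), torch.randn(4, 10, 256), eos_id=0)
probs = torch.softmax(torch.randn(4, 8), dim=-1)  # stopping distribution
loss = length_regularizer(probs, torch.full((4,), 5.0))
```

The offset lets the model raise or lower the stopping probability as a function of what has been generated so far, while the regularizer discourages systematically over- or under-long outputs.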
Problem

Research questions and friction points this paper addresses.

autoregressive generation
discrete-continuous sequences
high-precision generation
variable-length sequences
semiconductor layout

Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive generation
hybrid discrete-continuous modeling
diffusion-based continuous modeling
variable-length sequence generation
length-aware EOS adjustment
Authors

Yeonsang Shin (Seoul National University)
Insoo Kim (Seoul National University)
Bongkeun Kim (Samsung Electronics)
Keonwoo Bae (Samsung Electronics)
Bohyung Han (Professor, Electrical and Computer Engineering, Seoul National University; computer vision, machine learning, deep learning)