AGDC: Autoregressive Generation of Variable-Length Sequences with Joint Discrete and Continuous Spaces

📅 2026-01-09
🏛️ arXiv.org
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work addresses the limitations of traditional autoregressive models, which rely on discrete tokenization and struggle to accurately model continuous valuesβ€”often leading to functional failures in precision-sensitive tasks such as semiconductor circuit design. To overcome this, the authors propose AGDC, a unified framework that enables end-to-end autoregressive generation of hybrid discrete-continuous sequences for the first time. Built upon a Transformer architecture, AGDC integrates classification-based prediction with diffusion modeling and introduces a dynamic EOS logit adjustment mechanism alongside a length regularization loss. Evaluated on a newly curated high-precision semiconductor layout dataset, ContLayNet (334K samples), and SVG-based graphics tasks, AGDC significantly outperforms both discretization-based and fixed-structure baselines, breaking through the precision bottleneck and enabling high-fidelity, variable-length vector data generation.
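The hybrid prediction described above (categorical prediction for discrete tokens, diffusion-based modeling for continuous values) can be sketched as a two-headed output module on top of a Transformer hidden state. This is a minimal illustrative sketch, not the paper's implementation; all class names, layer sizes, and tensor shapes here are assumptions.

```python
import torch
import torch.nn as nn


class HybridHead(nn.Module):
    """Sketch of a joint discrete/continuous prediction head.

    The discrete branch is a standard classification head over the token
    vocabulary. The continuous branch is a small denoising network that,
    conditioned on the Transformer hidden state, predicts the noise added
    to a continuous value (one reverse-diffusion training step).
    All names and sizes are illustrative, not taken from the paper.
    """

    def __init__(self, d_model=256, vocab_size=128, d_cont=2):
        super().__init__()
        self.token_head = nn.Linear(d_model, vocab_size)
        # Denoiser input: noisy continuous value, scalar timestep, hidden state.
        self.denoiser = nn.Sequential(
            nn.Linear(d_cont + 1 + d_model, 256),
            nn.SiLU(),
            nn.Linear(256, d_cont),
        )

    def forward(self, h, x_noisy, t):
        logits = self.token_head(h)  # logits over the discrete vocabulary
        eps_hat = self.denoiser(torch.cat([x_noisy, t, h], dim=-1))
        return logits, eps_hat


# Toy usage: a batch of 4 sequence positions.
h = torch.randn(4, 256)       # Transformer hidden states
x_noisy = torch.randn(4, 2)   # noised continuous coordinates
t = torch.rand(4, 1)          # diffusion timesteps in [0, 1]
head = HybridHead()
logits, eps_hat = head(h, x_noisy, t)
```

At inference, the discrete head would be sampled as usual, while the continuous head would be iterated over diffusion timesteps to denoise a value conditioned on the same hidden state.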

πŸ“ Abstract
Transformer-based autoregressive models excel in data generation but are inherently constrained by their reliance on discretized tokens, which limits their ability to represent continuous values with high precision. We analyze the scalability limitations of existing discretization-based approaches for generating hybrid discrete-continuous sequences, particularly in high-precision domains such as semiconductor circuit designs, where precision loss can lead to functional failure. To address the challenge, we propose AGDC, a novel unified framework that jointly models discrete and continuous values for variable-length sequences. AGDC employs a hybrid approach that combines categorical prediction for discrete values with diffusion-based modeling for continuous values, incorporating two key technical components: an end-of-sequence (EOS) logit adjustment mechanism that uses an MLP to dynamically adjust EOS token logits based on sequence context, and a length regularization term integrated into the loss function. Additionally, we present ContLayNet, a large-scale benchmark comprising 334K high-precision semiconductor layout samples with specialized evaluation metrics that capture functional correctness where precision errors significantly impact performance. Experiments on semiconductor layouts (ContLayNet), graphic layouts, and SVGs demonstrate AGDC's superior performance in generating high-fidelity hybrid vector representations compared to discretization-based and fixed-schema baselines, achieving scalable high-precision generation across diverse domains.
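The abstract names two components for variable-length generation: an MLP that adjusts the EOS token's logit from sequence context, and a length regularization term in the loss. A minimal sketch of plausible forms for both follows; the exact architectures and loss are not given on this page, so every name, shape, and formula below is an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class EOSAdjuster(nn.Module):
    """Sketch of context-dependent EOS logit adjustment (names illustrative)."""

    def __init__(self, d_model=256):
        super().__init__()
        # Small MLP mapping the current hidden state to a scalar offset
        # added to the EOS token's logit before the softmax.
        self.mlp = nn.Sequential(
            nn.Linear(d_model, 64), nn.ReLU(), nn.Linear(64, 1)
        )

    def forward(self, logits, h, eos_id):
        logits = logits.clone()
        logits[..., eos_id] = logits[..., eos_id] + self.mlp(h).squeeze(-1)
        return logits


def length_regularizer(eos_probs, target_len):
    """Penalize mismatch between expected and target sequence length.

    Expected length is approximated from per-step EOS (stopping)
    probabilities; this is one plausible form of a length regularization
    term, not the paper's exact loss.
    """
    steps = torch.arange(1, eos_probs.shape[-1] + 1, dtype=eos_probs.dtype)
    expected_len = (steps * eos_probs).sum(dim=-1)
    return F.mse_loss(expected_len, target_len)


# Toy usage: batch of 4 sequences, 10 steps, vocab of 128 with EOS id 0.
adj = EOSAdjuster()
out = adj(torch.zeros(4, 10, 128), torch.randn(4, 10, 256), eos_id=0)
probs = torch.softmax(torch.randn(4, 8), dim=-1)  # stopping distribution
loss = length_regularizer(probs, torch.full((4,), 5.0))
```

The offset lets the model raise or lower the stopping probability as a function of what has been generated so far, while the regularizer discourages systematically over- or under-long outputs.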
Problem

Research questions and friction points this paper addresses.

autoregressive generation
discrete-continuous sequences
high-precision generation
variable-length sequences
semiconductor layout

Innovation

Methods, ideas, or system contributions that make the work stand out.

autoregressive generation
hybrid discrete-continuous modeling
diffusion-based continuous modeling
variable-length sequence generation
length-aware EOS adjustment
Authors

Yeonsang Shin (Seoul National University)
Insoo Kim (Seoul National University)
Bongkeun Kim (Samsung Electronics)
Keonwoo Bae (Samsung Electronics)
Bohyung Han (Professor, Electrical and Computer Engineering, Seoul National University; computer vision, machine learning, deep learning)