🤖 AI Summary
Non-autoregressive models struggle with variable-length sequence generation due to their reliance on fixed token structures and rigid length constraints. To address this, we propose a discrete flow matching framework grounded in elementary edit operations—insertion, deletion, and substitution—formulating a continuous-time Markov chain over the sequence space. This is the first work to model editing dynamics as a continuous-time flow of sequence states. By augmenting the state space with auxiliary variables and incorporating structural awareness and relative positional encoding, our approach enables flexible, structure-aware generation. The method overcomes fundamental bottlenecks of non-autoregressive models in length variability and structural alignment. Empirically, it outperforms both autoregressive and mask-based models on image captioning, and achieves significant gains over conventional masking strategies in text and code generation tasks.
📝 Abstract
Autoregressive generative models naturally generate variable-length sequences, while non-autoregressive models struggle, often imposing rigid, token-wise structures. We propose Edit Flows, a non-autoregressive model that overcomes these limitations by defining a discrete flow over sequences through edit operations: insertions, deletions, and substitutions. By modeling these operations within a Continuous-time Markov Chain over the sequence space, Edit Flows enable flexible, position-relative generation that aligns more closely with the structure of sequence data. Our training method leverages an expanded state space with auxiliary variables, making the learning process efficient and tractable. Empirical results show that Edit Flows outperforms both autoregressive and mask models on image captioning and significantly outperforms the mask construction in text and code generation.
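To make the core idea concrete, here is a minimal toy sketch of a continuous-time Markov chain over the sequence space whose jumps are elementary edit operations. The uniform operation/position choices, the toy vocabulary, and the constant jump rate are all illustrative assumptions; in Edit Flows the rates are state-dependent and learned by the model, not fixed as here.

```python
import random

VOCAB = ["a", "b", "c", "d"]  # toy vocabulary (assumption, not from the paper)

def apply_random_edit(seq, rng):
    """Apply one elementary edit (insert, delete, or substitute) to a token list.

    Uniform choice of operation and position is an illustrative assumption;
    Edit Flows instead predicts these edit rates with a neural network.
    """
    # Deletion and substitution require a non-empty sequence.
    ops = ["insert"] + (["delete", "substitute"] if seq else [])
    op = rng.choice(ops)
    if op == "insert":
        pos = rng.randrange(len(seq) + 1)  # insertion slots include both ends
        return seq[:pos] + [rng.choice(VOCAB)] + seq[pos:]
    pos = rng.randrange(len(seq))
    if op == "delete":
        return seq[:pos] + seq[pos + 1:]
    return seq[:pos] + [rng.choice(VOCAB)] + seq[pos + 1:]  # substitute

def simulate_ctmc(seq, total_rate=1.0, t_end=5.0, seed=0):
    """Gillespie-style simulation of a CTMC whose states are token sequences.

    Waiting times between jumps are exponential with rate `total_rate`
    (a constant here purely for illustration); each jump applies one edit.
    """
    rng = random.Random(seed)
    t = 0.0
    seq = list(seq)
    while True:
        t += rng.expovariate(total_rate)  # time until the next jump
        if t > t_end:
            return seq
        seq = apply_random_edit(seq, rng)
```

Because insertions and deletions change the sequence length at every jump, the chain moves freely between sequences of different lengths, which is exactly the flexibility that fixed-length, token-wise non-autoregressive constructions lack.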