Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction

📅 2026-02-09

🤖 AI Summary

This work addresses the challenge of reconstructing structured vector representations from rasterized floorplans, a crucial step toward CAD automation and semantic understanding, particularly for layouts containing complex polygons and numerous rooms. The task is formulated as a sequence-to-sequence generation problem in which architectural elements such as rooms, doors, and windows are represented as sequences of polygon vertices annotated with semantic labels. To improve geometric fidelity and contextual awareness, the authors introduce an attention mechanism guided by learnable spatial anchors, which directs the autoregressive decoder toward relevant image regions as it predicts each corner. The method achieves state-of-the-art performance on the established benchmarks Structured3D, CubiCasa5K, and Raster2Graph, and demonstrates strong generalization to the more challenging WAFFLE dataset.
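The sequence formulation can be made concrete with a small serializer. This is a minimal sketch under assumptions the summary does not state: corners are quantized to a 256-bin grid, each polygon's corners are followed by one label token and a separator, and an end token closes the floorplan. The token names (`<sep>`, `<eos>`) and the helpers `quantize`, `serialize`, and `deserialize` are hypothetical, not Raster2Seq's actual vocabulary or code.

```python
from dataclasses import dataclass

# Hypothetical special tokens (not the paper's actual vocabulary).
SEP = "<sep>"  # closes the current polygon
EOS = "<eos>"  # ends the whole floorplan


@dataclass
class Polygon:
    label: str                      # semantic class, e.g. "room", "door", "window"
    corners: list[tuple[int, int]]  # quantized (x, y) vertices in drawing order


def quantize(x: float, y: float, num_bins: int = 256) -> tuple[int, int]:
    """Map normalized coordinates in [0, 1] to integer grid bins."""
    def to_bin(v: float) -> int:
        return min(num_bins - 1, max(0, int(v * num_bins)))
    return to_bin(x), to_bin(y)


def serialize(polygons: list[Polygon]) -> list:
    """Flatten polygons into one sequence: corners, then a label token, then SEP."""
    seq: list = []
    for poly in polygons:
        seq.extend(poly.corners)
        seq.append(("label", poly.label))
        seq.append(SEP)
    seq.append(EOS)
    return seq


def deserialize(seq: list) -> list[Polygon]:
    """Rebuild polygons from a (possibly model-generated) token sequence."""
    polygons: list[Polygon] = []
    corners: list[tuple[int, int]] = []
    label = ""
    for tok in seq:
        if tok == EOS:
            break
        if tok == SEP:
            polygons.append(Polygon(label, corners))
            corners, label = [], ""
        elif isinstance(tok, tuple) and tok[0] == "label":
            label = tok[1]
        else:
            corners.append(tok)
    return polygons
```

Because every polygon is just a run of corners closed by a separator, rooms with four corners and rooms with twenty share one output format; only the sequence length changes.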

📝 Abstract

Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and varying numbers of polygon corners. To this end, we propose Raster2Seq, which frames floorplan reconstruction as a sequence-to-sequence task in which floorplan elements (such as rooms, windows, and doors) are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, using guidance from learnable anchors. These anchors represent spatial coordinates in image space, which allows them to effectively direct the attention mechanism toward informative image regions. Because the decoder is autoregressive, the output format is flexible, enabling our method to efficiently handle complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contain diverse room structures and complex geometric variations.
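The anchor idea can be illustrated with a single-query cross-attention step in which a 2D anchor adds a spatial prior to the usual content scores. This is a hypothetical formulation, not the one in the paper: here the prior is a Gaussian penalty on the squared distance between the anchor and each feature-map cell, with an assumed bandwidth `sigma`, similar in spirit to anchor-based DETR variants. The function and parameter names (`anchor_guided_attention`, `feature_grid`, `sigma`) are illustrative only.

```python
import math


def softmax(logits: list[float]) -> list[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def feature_grid(h: int, w: int) -> list[tuple[float, float]]:
    """Normalized (x, y) centers of an h x w feature map, in row-major order."""
    return [((j + 0.5) / w, (i + 0.5) / h) for i in range(h) for j in range(w)]


def anchor_guided_attention(query, anchor, feats, grid, sigma=0.1):
    """Single-query cross-attention with a Gaussian spatial prior around `anchor`.

    query:  length-D decoder query vector
    anchor: (x, y) in [0, 1]; in training this would be a learnable parameter
    feats:  N flattened image feature vectors, each of length D
    grid:   N normalized (x, y) positions matching `feats`
    """
    d = len(query)
    logits = []
    for f, (gx, gy) in zip(feats, grid):
        content = sum(q * v for q, v in zip(query, f)) / math.sqrt(d)  # scaled dot product
        dist2 = (anchor[0] - gx) ** 2 + (anchor[1] - gy) ** 2           # squared distance to anchor
        logits.append(content - dist2 / (2 * sigma ** 2))              # Gaussian spatial prior
    weights = softmax(logits)
    # Attention output: weighted sum of image features.
    attended = [sum(wt * f[k] for wt, f in zip(weights, feats)) for k in range(d)]
    return attended, weights
```

With a zero query the content term vanishes, so the weights peak at the feature cell nearest the anchor; moving the anchor moves the region the decoder reads from, which is the intuition behind "directing the attention mechanism" with spatial anchors.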

Problem

Research questions and friction points this paper is trying to address.

floorplan reconstruction

vector graphics

polygon sequence

raster-to-vector

structured representation

Innovation

Methods, ideas, or system contributions that make the work stand out.

sequence-to-sequence

autoregressive decoding

learnable anchors
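Autoregressive decoding can be illustrated with a greedy loop that appends one token at a time until an end token appears, which is what lets the output length adapt to however many rooms and corners a floorplan contains. This is a sketch assuming greedy decoding and a hypothetical `step_fn` that scores candidate next tokens given the prefix; the actual decoder's sampling strategy and interface are not described here, and `make_scripted_step` is only a toy stand-in for a trained model.

```python
from typing import Callable

EOS = "<eos>"  # hypothetical end-of-floorplan token


def greedy_decode(step_fn: Callable[[list], dict], max_len: int = 512) -> list:
    """Generate tokens one at a time, each conditioned on the prefix so far.

    step_fn(prefix) returns {candidate_token: score}; image conditioning is
    assumed to be captured inside step_fn. Decoding stops at EOS or max_len,
    so the output length adapts to the floorplan instead of being fixed.
    """
    prefix: list = []
    while len(prefix) < max_len:
        scores = step_fn(prefix)
        token = max(scores, key=scores.get)  # greedy: highest-scoring candidate
        prefix.append(token)
        if token == EOS:
            break
    return prefix


def make_scripted_step(script: list) -> Callable[[list], dict]:
    """Toy stand-in for a trained decoder that always prefers the next scripted token."""
    def step(prefix: list) -> dict:
        nxt = script[len(prefix)] if len(prefix) < len(script) else EOS
        return {EOS: 1.0} if nxt == EOS else {nxt: 1.0, EOS: 0.5}
    return step
```

The `max_len` cap is a practical safeguard: a model that never emits the end token still terminates.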