Tokenizing Semantic Segmentation with RLE

📅 2026-02-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work proposes a unified autoregressive segmentation framework based on run-length encoding (RLE) to address the lack of a common architecture for image and video semantic segmentation and the inefficiency of long-sequence generation. By discretizing segmentation masks into RLE token sequences and leveraging an enhanced Pix2Seq architecture for autoregressive generation, the method introduces a novel token compression strategy that substantially reduces sequence length. The approach naturally supports temporal modeling in video segmentation and can be extended to panoptic segmentation by incorporating instance-level information. Experimental results on two benchmark datasets demonstrate performance comparable to state-of-the-art methods, confirming the effectiveness and versatility of the proposed framework.

📝 Abstract
This paper presents a new unified approach to semantic segmentation in both images and videos by using language modeling to output the masks as sequences of discrete tokens. We use run-length encoding (RLE) to discretize the segmentation masks and then train a modified version of Pix2Seq \cite{p2s} to output these RLE tokens through autoregression. We propose novel tokenization strategies that compress the token sequence, making it practical to extend this approach to videos. We also show how instance information can be incorporated into the tokenization process to perform panoptic segmentation. We evaluate our proposed models on two datasets and show that they are competitive with the state of the art despite being bottlenecked by our limited computational resources.
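To make the core idea concrete, the sketch below shows one plausible way to turn a 2D class-ID mask into an interleaved (class, run-length) token sequence and back. This is an illustrative assumption on my part, not the paper's exact scheme: the actual method additionally compresses the sequence and maps tokens into a model vocabulary.

```python
import numpy as np

def rle_tokenize(mask: np.ndarray) -> list[int]:
    """Encode a 2D class-ID mask as interleaved (class, run-length)
    tokens, scanning the mask in row-major order."""
    flat = mask.ravel()
    tokens, start = [], 0
    for i in range(1, len(flat) + 1):
        # Close the current run when the class changes or the scan ends.
        if i == len(flat) or flat[i] != flat[start]:
            tokens.extend([int(flat[start]), i - start])
            start = i
    return tokens

def rle_detokenize(tokens: list[int], shape: tuple[int, int]) -> np.ndarray:
    """Invert rle_tokenize: expand each (class, run) pair and reshape."""
    flat = np.concatenate([np.full(run, cls, dtype=np.int64)
                           for cls, run in zip(tokens[::2], tokens[1::2])])
    return flat.reshape(shape)

mask = np.array([[0, 0, 1, 1],
                 [0, 2, 2, 2]])
toks = rle_tokenize(mask)
# toks: [0, 2, 1, 2, 0, 1, 2, 3]
assert np.array_equal(rle_detokenize(toks, mask.shape), mask)
```

An autoregressive model such as Pix2Seq can then be trained to emit such token sequences directly, which is far shorter than predicting one token per pixel for large, homogeneous regions.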
Problem

Research questions and friction points this paper is trying to address.

semantic segmentation
video segmentation
panoptic segmentation
tokenization
run length encoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

Run Length Encoding
Language Modeling
Tokenized Segmentation
Autoregressive Generation
Panoptic Segmentation
Abhineet Singh
Department of Computing Science, University of Alberta
Justin Rozeboom
Department of Computing Science, University of Alberta
Nilanjan Ray
Professor, Department of Computing Science, University of Alberta
deep learning, computer vision, image and video analysis, medical imaging