LZ78 Substring Compression in Compressed Space

📅 2025-12-18
🤖 AI Summary
This paper addresses the problem of computing the LZ78 factorization of an arbitrary substring in compressed space, with the substring specified at query time. Conventional approaches require full decompression and incur prohibitive time and space overheads. To overcome this, we propose the first compressed indexing framework supporting substring LZ78 factorization, integrating a lightweight dynamic dictionary synchronization mechanism, a tailored variant of the suffix array, and compact dictionary encoding. Our algorithm factorizes a substring of length $n$ containing $z$ LZ78 factors in $O(z \log n)$ time using only $O(z \log n)$ bits of space, matching the theoretical lower bound up to a single logarithmic factor. This work achieves, for the first time, efficient and indexable substring LZ78 factorization directly in compressed space, significantly outperforming naive decompress-then-factorize methods. It establishes a foundational advance at the intersection of text indexing and compressed computation.

📝 Abstract
The Lempel–Ziv 78 (LZ78) factorization is a well-studied technique for data compression. It and its derivatives are used in compression formats such as "compress" and "gif". While most research focuses on factorizing plain data, little work has addressed indexing the data for fast LZ78 factorization. Here, we study the LZ78 factorization and its derivatives in the substring compression model, where we are allowed to index the data and must return the factorization of a substring specified at query time. In that model, we propose an algorithm that works in compressed space and computes the factorization with a logarithmic slowdown compared to the optimal time complexity.
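For orientation, the following is a minimal sketch of the textbook LZ78 factorization, applied to a substring by first extracting it. This is the naive baseline that the substring compression model tries to beat, not the algorithm proposed in the paper. The function names (`lz78_factorize`, `lz78_decode`, `lz78_substring_naive`) and the half-open range convention `text[i:j]` are assumptions made for illustration.

```python
def lz78_factorize(text: str) -> list[tuple[int, str]]:
    """Return the LZ78 factors of text as (reference, char) pairs.

    reference is the 1-based index of an earlier factor (0 means the empty
    string); the factor is that earlier factor followed by char. The last
    factor may have char == "" if the text ends inside a known phrase.
    """
    trie = [{}]          # trie[v] maps a character to a child node; node 0 is the root
    factors = []
    node = 0             # current position in the trie (0 = nothing matched yet)
    for ch in text:
        child = trie[node].get(ch)
        if child is not None:
            node = child                     # keep extending the current match
        else:
            trie.append({})                  # new node gets id len(factors) + 1
            trie[node][ch] = len(trie) - 1
            factors.append((node, ch))       # emit factor: earlier phrase + ch
            node = 0                         # restart matching at the root
    if node != 0:
        factors.append((node, ""))           # incomplete final factor
    return factors


def lz78_decode(factors: list[tuple[int, str]]) -> str:
    """Rebuild the text from (reference, char) pairs."""
    phrases = [""]                           # phrases[0] is the empty string
    for ref, ch in factors:
        phrases.append(phrases[ref] + ch)
    return "".join(phrases[1:])


def lz78_substring_naive(text: str, i: int, j: int) -> list[tuple[int, str]]:
    """Hypothetical baseline: factorize text[i:j] from scratch, no index."""
    return lz78_factorize(text[i:j])
```

For example, `lz78_factorize("abababab")` yields the factors a, b, ab, aba, b, encoded as `[(0, "a"), (0, "b"), (1, "b"), (3, "a"), (2, "")]`. This baseline takes time proportional to the substring length, whereas the abstract targets time governed by the number of factors, up to a logarithmic slowdown.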
Problem

Research questions and friction points this paper is trying to address.

Develops substring compression for LZ78 in compressed space
Enables indexing data for fast LZ78 factorization queries
Incurs only a logarithmic slowdown compared to the optimal time complexity
Innovation

Methods, ideas, or system contributions that make the work stand out.

Compressed space algorithm for LZ78 factorization
Substring compression model with indexing capability
Logarithmic slowdown compared to optimal time complexity
Hiroki Shibata
Joint Graduate School of Mathematics for Innovation, Kyushu University
Dominik Köppl
Faculty for Engineering, University of Yamanashi
stringology, algorithms and data structures, combinatorics on words