INT-DTT+: Low-Complexity Data-Dependent Transforms for Video Coding

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
In video coding, traditional data-dependent transforms (e.g., the KLT) offer superior energy compaction but incur high computational complexity, whereas fixed transforms (e.g., the DCT-2) are efficient but lack adaptivity. This paper proposes INT-DTT+, a low-complexity, data-driven separable transform framework. Building on DTT+, a family of graph-based separable transforms obtained from rank-one updates of the DTT graphs, it jointly learns row- and column-wise graphs, decomposes the resulting kernel into a base DTT and a structured Cauchy matrix, and sparsifies the Cauchy factor to obtain hardware-friendly integer approximations. The core contribution is the combination of separable graph-based modeling with a lightweight integer implementation, achieving significantly improved energy compaction while keeping computational and memory overhead comparable to the integer DCT-2. Integrated into the Versatile Video Coding (VVC) standard, INT-DTT+ achieves over 3% BD-rate savings versus the VVC MTS baseline, outperforms the separable KLT in rate-distortion performance, and has complexity far below the KLT and close to the integer DCT-2.
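The separable, graph-based construction described above can be illustrated with a small numerical sketch (not the paper's implementation): the DCT-2 arises as the graph Fourier transform of a path graph, and a rank-one update of that graph's Laplacian yields an adapted kernel applied separably along rows and columns. The update vectors and weights below are arbitrary placeholders standing in for the quantities the paper learns from residual statistics.

```python
import numpy as np

def path_laplacian(n):
    # Combinatorial Laplacian of an n-node path graph; its
    # eigenvectors coincide (up to sign) with the DCT-2 basis.
    L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    L[0, 0] = L[-1, -1] = 1.0
    return L

def gft(L):
    # Graph Fourier transform: orthonormal eigenvectors of L,
    # ordered by increasing eigenvalue (graph frequency).
    _, U = np.linalg.eigh(L)
    return U

n = 8
# Placeholder rank-one updates for the row and column graphs
# (in the paper these are learned jointly from the block statistics).
v_row = np.ones(n) / np.sqrt(n)
v_col = np.linspace(1.0, 2.0, n)
v_col /= np.linalg.norm(v_col)
U_row = gft(path_laplacian(n) + 0.5 * np.outer(v_row, v_row))
U_col = gft(path_laplacian(n) + 0.3 * np.outer(v_col, v_col))

# Separable forward/inverse transform of an n x n residual block.
X = np.random.default_rng(0).standard_normal((n, n))
Y = U_row.T @ X @ U_col      # forward: row kernel, then column kernel
X_rec = U_row @ Y @ U_col.T  # inverse: orthonormality gives perfect reconstruction
```

Because each kernel is an orthonormal eigenbasis, the separable transform is invertible exactly; adaptivity comes entirely from the rank-one perturbation of the path-graph Laplacian.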

📝 Abstract
Discrete trigonometric transforms (DTTs), such as the DCT-2 and the DST-7, are widely used in video codecs for their balance between coding performance and computational efficiency. In contrast, data-dependent transforms, such as the Karhunen-Loève transform (KLT) and graph-based separable transforms (GBSTs), offer better energy compaction but lack symmetries that can be exploited to reduce computational complexity. This paper bridges this gap by introducing a general framework to design low-complexity data-dependent transforms. Our approach builds on DTT+, a family of GBSTs derived from rank-one updates of the DTT graphs, which can adapt to signal statistics while retaining a structure amenable to fast computation. We first propose a graph learning algorithm for DTT+ that estimates the rank-one updates for the row and column graphs jointly, capturing the statistical properties of the overall block. Then, we exploit the progressive structure of DTT+ to decompose the kernel into a base DTT and a structured Cauchy matrix. By leveraging low-complexity integer DTTs and sparsifying the Cauchy matrix, we construct an integer approximation to DTT+, termed INT-DTT+. This approximation significantly reduces both computational and memory complexity with respect to the separable KLT, with minimal performance loss. We validate our approach in the context of mode-dependent transforms for the VVC standard, following a rate-distortion optimized transform (RDOT) design approach. Integrated into the explicit multiple transform selection (MTS) framework of VVC in a rate-distortion optimization setup, INT-DTT+ achieves more than 3% BD-rate savings over the VVC MTS baseline, with complexity comparable to the integer DCT-2 once the base DTT coefficients are available.
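The decomposition of the DTT+ kernel into a base DTT and a structured Cauchy matrix follows the classical rank-one eigenvalue-update result: if L = U diag(d) Uᵀ and L' = L + ρvvᵀ, then the eigenvectors of L', expressed in the base-DTT domain, have Cauchy structure z_j / (d_j − λ_i). A minimal numerical sketch of this factorization (with a placeholder update vector, not the paper's learned one):

```python
import numpy as np

n = 8
# Base path-graph Laplacian and its GFT U (a DCT-2-like kernel).
L = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
L[0, 0] = L[-1, -1] = 1.0
d, U = np.linalg.eigh(L)

# Placeholder rank-one graph update L + rho * v v^T.
rng = np.random.default_rng(1)
v = rng.standard_normal(n)
v /= np.linalg.norm(v)
rho = 0.7
z = U.T @ v  # update vector expressed in the base-DTT domain
lam = np.linalg.eigvalsh(np.diag(d) + rho * np.outer(z, z))

# Rank-one update theory: the updated eigenvectors, in the base-DTT
# domain, form a Cauchy-structured matrix C[j, i] = z_j / (d_j - lam_i).
C = z[:, None] / (d[:, None] - lam[None, :])
C /= np.linalg.norm(C, axis=0)  # normalize columns

# The DTT+ kernel factors as (base DTT) times (Cauchy matrix);
# sparsifying C is what enables the fast INT-DTT+ implementation.
U_plus = U @ C
L_plus = L + rho * np.outer(v, v)
```

The check below confirms that U_plus diagonalizes the updated Laplacian, i.e., that applying DTT+ is equivalent to applying the base DTT followed by the (sparsifiable) Cauchy factor.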
Problem

Research questions and friction points this paper is trying to address.

Designing low-complexity data-dependent transforms for video coding
Bridging performance gap between DTTs and data-dependent transforms
Reducing computational and memory complexity of separable KLT
Innovation

Methods, ideas, or system contributions that make the work stand out.

Low-complexity data-dependent transforms using rank-one graph updates
Integer approximation through DTT decomposition and Cauchy sparsification
Joint graph learning algorithm capturing block statistical properties
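The integer-approximation idea in the bullets above can be sketched with a simple scale-and-round scheme, in the spirit of the integer DCT-2 kernels used in HEVC/VVC; the bit depth and kernel here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def dct2_kernel(n):
    # Orthonormal n-point DCT-2 matrix (rows are basis vectors).
    k = np.arange(n)
    U = np.sqrt(2.0 / n) * np.cos(
        np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    U[0] /= np.sqrt(2.0)
    return U

def integer_kernel(U, bits=6):
    # Scale-and-round integer approximation: each coefficient is
    # stored as round(2^bits * u), enabling integer-only arithmetic
    # (a final right shift undoes the scaling in a real codec).
    return np.round(U * (1 << bits)).astype(np.int64)

U = dct2_kernel(8)
Ui = integer_kernel(U, bits=6)
# Rounding error per coefficient is at most 0.5 / 2^bits.
err = np.abs(Ui / 64.0 - U).max()
```

The same rounding applies to the sparsified Cauchy factor, so the adapted transform inherits both the integer arithmetic of the base DTT and a bounded approximation error.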