An Additive Approximation Scheme for Generating Dyadic Codings for the Outputs of an LLM

📅 2026-05-07

🤖 AI Summary

This work addresses the problem of approximating a discrete probability distribution, such as the next-token distribution of a large language model, by a dyadic distribution induced by a binary tree, minimizing total variation distance subject to a prescribed coding rate. The authors formulate this as a tree-based partitioning problem and propose a polynomial-time additive approximation scheme with provable near-optimality guarantees in the constant-rate regime. As an application, the result yields a principled framework for per-token LLM steganography, in which the rate corresponds to the number of hidden bits embedded per token and the total variation bound controls statistical detectability.

📝 Abstract

We study the problem of approximating a discrete probability distribution, such as the next-token distribution of a large language model, by a dyadic distribution induced by a binary tree under encoding rate constraints. The objective is to partition the support of the distribution and assign dyadic probabilities to minimize total variation distance while achieving a prescribed rate. We formulate this task as a tree-based partitioning problem and develop a polynomial-time additive approximation scheme for the rate-constrained setting in the constant-rate regime. Our results provide provable guarantees for near-optimal dyadic approximations and, as an application, yield a principled framework for LLM-based steganography, where the rate maps to bits of hidden information embedded per token and the total variation bound controls statistical detectability.
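
To make the two quantities in the abstract concrete, the following is a minimal sketch that evaluates one candidate partition. It is not the paper's approximation scheme. It assumes one plausible reading of the setup: each group of tokens sits at a leaf of a binary tree, a leaf at depth d receives dyadic mass 2^-d, tokens within a group keep their relative proportions, and the rate is the expected leaf depth under the dyadic distribution. The function name `dyadic_tv_and_rate`, the toy distribution, and the two partitions are hypothetical and chosen only for illustration.

```python
from fractions import Fraction


def dyadic_tv_and_rate(p, groups, depths):
    """Evaluate a candidate dyadic coding (illustrative; not the paper's algorithm).

    p      : dict mapping token -> probability (Fractions, summing to 1)
    groups : list of token lists forming a partition of p's support
    depths : leaf depth assigned to each group (same order as groups)

    Assumption: the leaf holding a group at depth d gets dyadic mass 2**-d,
    and tokens inside a group keep their proportions under p. Under that
    assumption the total variation distance reduces to a sum over groups.
    """
    masses = [Fraction(1, 2 ** d) for d in depths]

    # Leaves of a full binary tree satisfy the Kraft equality.
    if sum(masses) != 1:
        raise ValueError("leaf depths do not describe a full binary tree")

    # Every token must appear in exactly one group.
    if sorted(t for g in groups for t in g) != sorted(p):
        raise ValueError("groups are not a partition of the support")

    # TV distance: half the L1 gap between group masses and dyadic masses.
    tv = sum(abs(sum(p[t] for t in g) - m) for g, m in zip(groups, masses)) / 2

    # Assumed rate: expected leaf depth, i.e. bits consumed per token.
    rate = sum(m * d for m, d in zip(masses, depths))
    return tv, rate


# Toy next-token distribution (hypothetical values).
p = {
    "the": Fraction(4, 10),
    "a": Fraction(3, 10),
    "cat": Fraction(2, 10),
    "dog": Fraction(1, 10),
}

# Partition A: three leaves at depths 1, 2, 2.
tv_a, rate_a = dyadic_tv_and_rate(p, [["the"], ["a"], ["cat", "dog"]], [1, 2, 2])

# Partition B: two leaves at depth 1, each with group mass exactly 1/2.
tv_b, rate_b = dyadic_tv_and_rate(p, [["the", "dog"], ["a", "cat"]], [1, 1])

print(tv_a, rate_a)  # 1/10 3/2
print(tv_b, rate_b)  # 0 1
```

On this toy input, partition A embeds 1.5 bits per token at a total variation distance of 0.1, while partition B matches the distribution exactly but embeds only 1 bit. The rate-constrained problem described above asks for the partition that minimizes the distance among those meeting a prescribed rate.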

Problem

Research questions and friction points this paper is trying to address.

dyadic coding

large language models

rate-constrained approximation

total variation distance

discrete probability distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

dyadic coding

additive approximation

rate-constrained optimization