🤖 AI Summary
This work addresses the fundamental trade-off between information fidelity and compression ratio in text summarization by establishing the first information-theoretic rate-distortion framework for the task. We formally define the rate-distortion function of a summarizer, characterizing its intrinsic performance lower bound; propose a computable estimator of this lower bound together with a Blahut–Arimoto-type iterative optimization algorithm; and design a practical estimator tailored to low-resource settings. Empirical evaluation demonstrates that the derived theoretical lower bound correlates strongly with the actual performance of state-of-the-art summarization models, including BART, PEGASUS, and LLaMA-3, substantially exceeding the correlation achieved by conventional metrics such as ROUGE and BERTScore. This work provides an information-theoretic foundation for summary quality assessment and offers interpretable, principled optimization criteria for model design and compression.
📝 Abstract
This paper introduces an information-theoretic framework for text summarization. We define the summarizer rate-distortion function and show that it provides a fundamental lower bound on summarizer performance. We describe an iterative procedure, similar to the Blahut–Arimoto algorithm, for computing this function. To handle real-world text datasets, we also propose a practical method that estimates the summarizer rate-distortion function from limited data. Finally, we empirically confirm our theoretical results by comparing the summarizer rate-distortion function with the performance of summarizers used in practice.
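The paper's iterative procedure is not reproduced on this page, but the classical Blahut–Arimoto algorithm it resembles is standard. As a rough illustration of the idea, the sketch below computes one point on a discrete rate-distortion curve by alternating updates of the test channel and the reproduction marginal; the function name, the trade-off parameter `beta`, and the toy binary example are all illustrative choices, not the authors' setup.

```python
import numpy as np

def blahut_arimoto_rd(p_x, dist, beta, n_iter=500, tol=1e-10):
    """Compute one point (D, R) on a discrete rate-distortion curve.

    p_x:  source distribution over n symbols.
    dist: (n, m) distortion matrix d(x, y) between source and
          reproduction symbols.
    beta: trade-off parameter (magnitude of the slope of R(D)).
    Returns (distortion D, rate R in nats).
    """
    n, m = dist.shape
    q_y = np.full(m, 1.0 / m)  # reproduction marginal q(y), init uniform
    for _ in range(n_iter):
        # Optimal test channel q(y|x) for the current marginal q(y).
        log_w = np.log(q_y)[None, :] - beta * dist
        log_w -= log_w.max(axis=1, keepdims=True)  # numerical stability
        w = np.exp(log_w)
        q_y_given_x = w / w.sum(axis=1, keepdims=True)
        # Optimal marginal q(y) for the current channel q(y|x).
        q_y_new = p_x @ q_y_given_x
        if np.max(np.abs(q_y_new - q_y)) < tol:
            q_y = q_y_new
            break
        q_y = q_y_new
    joint = p_x[:, None] * q_y_given_x
    D = float(np.sum(joint * dist))
    ratio = np.where(joint > 0, q_y_given_x / q_y[None, :], 1.0)
    R = float(np.sum(joint * np.log(ratio)))
    return D, R

# Toy check: a uniform binary source under Hamming distortion, where
# the curve is known in closed form as R(D) = ln 2 - H_b(D) nats.
p = np.array([0.5, 0.5])
d = np.array([[0.0, 1.0], [1.0, 0.0]])
D, R = blahut_arimoto_rd(p, d, beta=2.0)
```

Each iteration is a coordinate-ascent step on the variational form of the rate-distortion function, so the procedure converges monotonically; sweeping `beta` traces out the full curve, with large `beta` favoring low distortion and small `beta` favoring low rate.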