🤖 AI Summary
In homomorphic encryption (HE)-based large language model (LLM) text generation, ciphertext errors accumulated during autoregressive decoding frequently cause "generation collapse" and severely degrade output coherence. Method: this paper is the first to model token sequence reordering in autoregressive decoding as a Traveling Salesman Problem (TSP), proposing a path-optimization-driven reordering strategy that minimizes error propagation in HE computations; it further introduces a lightweight error-correction post-processing mechanism in the encrypted domain to improve output stability. Contribution/Results: the approach requires no modification to the model architecture and preserves end-to-end privacy. Experiments demonstrate substantial improvements in the logical coherence and reasoning robustness of generated text. Both theoretical analysis and empirical evaluation confirm its effectiveness in preventing generation collapse, establishing a new paradigm for practical, privacy-preserving LLM inference.
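The summary does not disclose the paper's exact TSP formulation or solver. As a minimal sketch of the general idea only, the following casts a reordering problem as an open-path TSP over a hypothetical pairwise "error propagation" cost matrix and solves it with a greedy nearest-neighbor heuristic; every name and the matrix itself are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

def nearest_neighbor_order(cost: np.ndarray, start: int = 0) -> list[int]:
    """Greedy nearest-neighbor heuristic for an open-path TSP:
    starting from `start`, repeatedly hop to the cheapest unvisited node."""
    n = cost.shape[0]
    unvisited = set(range(n)) - {start}
    order = [start]
    while unvisited:
        # Pick the unvisited position with the smallest transition cost.
        nxt = min(unvisited, key=lambda j: cost[order[-1], j])
        unvisited.remove(nxt)
        order.append(nxt)
    return order

def path_cost(cost: np.ndarray, order: list[int]) -> float:
    """Total cost of traversing `order` as an open path (no return edge)."""
    return float(sum(cost[a, b] for a, b in zip(order, order[1:])))

# Hypothetical pairwise error-propagation costs between 4 positions
# (in the paper this would be derived from HE noise estimates).
cost = np.array([[0., 5., 1., 4.],
                 [5., 0., 2., 6.],
                 [1., 2., 0., 3.],
                 [4., 6., 3., 0.]])

order = nearest_neighbor_order(cost)
print(order, path_cost(cost, order))
```

On this toy matrix the greedy path visits positions in the order [0, 2, 1, 3] with total cost 9.0, cheaper than the identity order's cost of 10.0; a production system would presumably use a stronger TSP solver than nearest-neighbor.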
📝 Abstract
As users increasingly interact with large language models (LLMs) using private information, secure and encrypted communication becomes essential. Homomorphic encryption (HE) provides a principled solution by enabling computation directly on encrypted data. Although prior work has explored aspects of running LLMs under HE, the challenge of text generation, particularly next-token prediction, has received limited attention and remains a key obstacle to practical encrypted interaction. In this work, we propose a Traveling Salesman Problem (TSP)-based token reordering strategy to address the difficulties of encrypted text generation, together with a post-processing step that further reduces approximation error. Theoretical analysis and experimental results demonstrate that our method prevents generation collapse, improves coherence in generated text, and preserves data privacy throughout. Overall, our contributions advance the feasibility of practical and privacy-preserving LLM inference.