NLI: Non-uniform Linear Interpolation Approximation of Nonlinear Operations for Efficient LLMs Inference

📅 2026-02-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the computational inefficiency of nonlinear operations—such as SiLU, RMSNorm, and Softmax—in large language models, which rely heavily on high-precision floating-point arithmetic. To overcome this limitation, the authors propose a calibration-free, hardware-friendly Non-uniform Linear Interpolation (NLI) framework that models breakpoint selection as a dynamic programming problem. By leveraging Bellman’s principle of optimality, NLI achieves globally optimal approximations with significantly reduced approximation error. The framework is plug-and-play, enabling seamless integration into diverse nonlinear functions without architectural modifications. Experimental results demonstrate that the proposed NLI engine improves computational efficiency by over 4× compared to state-of-the-art approaches, while incurring negligible accuracy degradation.

📝 Abstract
Large Language Models (LLMs) have demonstrated remarkable performance across a wide range of tasks, but their deployment is often constrained by substantial memory footprints and computational costs. While prior work has achieved significant progress in compressing and accelerating linear layers, nonlinear layers, such as SiLU, RMSNorm, and Softmax, still depend heavily on high-precision floating-point operations. In this paper, we propose a calibration-free, dynamic-programming-optimal, and hardware-friendly framework called Non-uniform Linear Interpolation (NLI). NLI efficiently approximates a variety of nonlinear functions, enabling seamless integration into LLMs and other deep neural networks with almost no loss in accuracy. NLI recasts cutpoint selection as a dynamic-programming problem, achieving the globally minimal interpolation error in O(M×N²) time via Bellman's optimality principle. Based on the NLI algorithm, we also design and implement a plug-and-play universal nonlinear computation unit. Hardware experiments demonstrate that the NLI Engine achieves more than a 4× improvement in computational efficiency compared to state-of-the-art designs.
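The dynamic-programming view of cutpoint selection described above can be sketched as follows. This is an illustrative minimax-error variant over a fixed candidate grid, not the authors' implementation: the function names (`segment_error`, `optimal_breakpoints`), the SiLU test range, and the use of max-error as the DP cost are all assumptions made for the sketch.

```python
import numpy as np

def segment_error(x, y, i, j):
    """Max absolute error of the chord from (x[i], y[i]) to (x[j], y[j]),
    measured at the sample points between indices i and j."""
    slope = (y[j] - y[i]) / (x[j] - x[i])
    approx = y[i] + slope * (x[i:j + 1] - x[i])
    return np.max(np.abs(y[i:j + 1] - approx))

def optimal_breakpoints(x, y, n_segments):
    """Bellman-style DP: dp[k][j] = minimal worst-case interpolation error
    achievable with k linear segments whose last breakpoint is grid index j.
    Returns the chosen breakpoint indices and the final error."""
    m = len(x)
    INF = float("inf")
    dp = np.full((n_segments + 1, m), INF)
    parent = np.zeros((n_segments + 1, m), dtype=int)
    dp[0][0] = 0.0  # zero segments: we must still be at the left endpoint
    for k in range(1, n_segments + 1):
        for j in range(k, m):
            for i in range(k - 1, j):
                if dp[k - 1][i] == INF:
                    continue
                # worst segment so far if segment (i, j) is appended
                cost = max(dp[k - 1][i], segment_error(x, y, i, j))
                if cost < dp[k][j]:
                    dp[k][j] = cost
                    parent[k][j] = i
    # Backtrack from the right endpoint to recover the breakpoints.
    bps, k, j = [m - 1], n_segments, m - 1
    while k > 0:
        j = parent[k][j]
        bps.append(j)
        k -= 1
    return bps[::-1], dp[n_segments][m - 1]

# Example: non-uniform piecewise-linear fit of SiLU on an assumed grid.
x = np.linspace(-6.0, 6.0, 81)
y = x / (1.0 + np.exp(-x))  # SiLU(x) = x * sigmoid(x)
bps, err = optimal_breakpoints(x, y, 16)
```

Each inner `min` over `i` is one application of Bellman's optimality principle: the best k-segment fit ending at `j` extends some best (k−1)-segment fit. With `N` candidate grid points and `M` segments this triple loop runs in polynomial time, matching the spirit (if not necessarily the exact constants) of the O(M×N²) bound quoted in the abstract.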
Problem

Research questions and friction points this paper is trying to address.

nonlinear operations
efficient LLMs inference
floating-point operations
computational cost
memory footprint
Innovation

Methods, ideas, or system contributions that make the work stand out.

Non-uniform Linear Interpolation
Dynamic Programming
Nonlinear Approximation
Hardware-Accelerated Inference
Calibration-Free Quantization