🤖 AI Summary
This paper formalizes, for the first time, the three-way trade-off in large language model (LLM) serving among inference budget, factual accuracy, and reasoning capability, and rigorously proves its inevitability via the BAR Theorem, which establishes that no model can optimize all three objectives simultaneously. Methodologically, the authors cast the problem as a multi-objective constrained optimization, using feasibility analysis and formal proofs to characterize Pareto-optimal trade-off curves under varying budget constraints. The contributions are threefold: (1) a theoretical lower bound for joint optimization of the three attributes; (2) principled design guidelines that enable application-aware, rational trade-offs; and (3) empirical validation of consistent trade-off patterns across diverse model configurations, laying a theoretical foundation for trustworthy LLM deployment.
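The Pareto-optimal trade-off curves mentioned above can be illustrated with a toy sketch (this is not the paper's method, and the configuration triples below are synthetic): given hypothetical serving configurations scored on inference budget (a cost, lower is better), factual accuracy, and reasoning capability (both gains, higher is better), we keep only the non-dominated ones.

```python
def dominates(a, b):
    """Return True if config a dominates config b.

    Each config is a (budget, accuracy, reasoning) triple; budget is a cost,
    the other two are gains. a dominates b if it is no worse on every axis
    and strictly better on at least one.
    """
    no_worse = a[0] <= b[0] and a[1] >= b[1] and a[2] >= b[2]
    strictly_better = a[0] < b[0] or a[1] > b[1] or a[2] > b[2]
    return no_worse and strictly_better

def pareto_front(configs):
    """Keep only configs not dominated by any other config."""
    return [c for c in configs
            if not any(dominates(o, c) for o in configs if o is not c)]

# Synthetic (budget, accuracy, reasoning) triples, for illustration only.
configs = [
    (1.0, 0.70, 0.60),  # cheap, weaker on both quality axes
    (2.0, 0.85, 0.75),  # balanced
    (2.0, 0.80, 0.70),  # same budget as above but worse -> dominated
    (4.0, 0.92, 0.90),  # expensive, strong
]
front = pareto_front(configs)
```

Here `front` drops only the dominated third configuration; the remaining three form the trade-off curve, each improvable on one axis only by paying on another, which is the shape of frontier the paper's analysis characterizes.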
📝 Abstract
When designing LLM services, practitioners care about three key properties: inference-time budget, factual authenticity, and reasoning capacity. However, our analysis shows that no model can simultaneously optimize all three. We formally prove this trade-off and propose the BAR Theorem, a principled framework for LLM-application design.