Design and Evaluation of Cost-Aware PoQ for Decentralized LLM Inference

📅 2025-12-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing Proof-of-Quality (PoQ) mechanisms for decentralized large language model (LLM) inference overlook heterogeneous node computation costs, leading to inefficient verification and misaligned incentives. Method: We propose a cost-aware PoQ mechanism featuring a linearly normalized reward function that explicitly incorporates an efficiency metric (the quality-to-latency ratio) and a unified evaluation pipeline integrating token-level F1 ground truth, lightweight learned evaluators, and GPT-based judgment. Contribution/Results: We empirically demonstrate that a semantic-text-similarity bi-encoder achieves significantly higher correlation with ground-truth and GPT scores than cross-encoders, and that the largest LLMs in the pool are also the most cost-efficient at inference. Monte Carlo simulations of PoQ rounds validate the mechanism's effectiveness in rewarding high-quality, low-latency inference nodes and efficient evaluators while discouraging low-quality, high-latency behavior.

📝 Abstract
Decentralized large language model (LLM) inference promises transparent and censorship-resistant access to advanced AI, yet existing verification approaches struggle to scale to modern models. Proof of Quality (PoQ) replaces cryptographic verification of computation with consensus over output quality, but the original formulation ignores heterogeneous computational costs across inference and evaluator nodes. This paper introduces a cost-aware PoQ framework that integrates explicit efficiency measurements into the reward mechanism for both types of nodes. The design combines ground-truth token-level F1, lightweight learned evaluators, and GPT-based judgments within a unified evaluation pipeline, and adopts a linear reward function that balances normalized quality and cost. Experiments on extractive question answering and abstractive summarization use five instruction-tuned LLMs ranging from TinyLlama-1.1B to Llama-3.2-3B and three evaluation models spanning cross-encoder and bi-encoder architectures. Results show that a semantic textual similarity bi-encoder achieves much higher correlation with both ground-truth and GPT scores than cross-encoders, indicating that evaluator architecture is a critical design choice for PoQ. Quality-cost analysis further reveals that the largest models in the pool are also the most efficient in terms of quality per unit latency. Monte Carlo simulations over 5,000 PoQ rounds demonstrate that the cost-aware reward scheme consistently assigns higher average rewards to high-quality, low-cost inference models and to efficient evaluators, while penalizing slow, low-quality nodes. These findings suggest that cost-aware PoQ provides a practical foundation for economically sustainable decentralized LLM inference.
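The abstract's "linear reward function that balances normalized quality and cost" can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the function names, the min-max normalization ranges, and the weight `alpha` are all assumptions introduced here.

```python
# Hedged sketch of a linear cost-aware reward. The specific ranges and
# the alpha weight are illustrative assumptions, not the paper's values.

def normalize(x, lo, hi):
    """Min-max normalize x into [0, 1], clamping values outside the range."""
    if hi == lo:
        return 0.0
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)

def linear_reward(quality, latency, q_range=(0.0, 1.0),
                  lat_range=(0.1, 10.0), alpha=0.7):
    """Reward = alpha * normalized quality - (1 - alpha) * normalized cost."""
    q = normalize(quality, *q_range)
    c = normalize(latency, *lat_range)
    return alpha * q - (1 - alpha) * c

# Under this scheme a fast, high-quality node out-earns a slow,
# low-quality one, matching the incentive the paper describes.
fast_good = linear_reward(quality=0.9, latency=0.5)
slow_bad = linear_reward(quality=0.3, latency=8.0)
```

Using latency as the cost proxy mirrors the paper's quality-to-latency efficiency metric; a deployment could swap in any normalized cost measure without changing the linear form.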
Problem

Research questions and friction points this paper is trying to address.

Addresses scalability issues in verifying decentralized LLM inference.
Integrates computational cost into Proof of Quality reward mechanisms.
Balances output quality and cost for sustainable decentralized AI.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Cost-aware PoQ integrates efficiency into reward mechanisms.
Unified evaluation pipeline combines F1, learned evaluators, GPT judgments.
Linear reward function balances normalized quality and computational cost.
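The Monte Carlo validation described above can be sketched in a few lines. Everything here is an illustrative assumption: the two node profiles, the Gaussian noise model, and the reward weights are invented for the sketch, and only the 5,000-round count comes from the paper.

```python
# Minimal Monte Carlo sketch of repeated PoQ rounds. Node profiles,
# noise parameters, and reward weights are illustrative assumptions.
import random

def reward(quality, latency, alpha=0.7, max_latency=10.0):
    """Linear cost-aware reward over quality in [0,1] and normalized latency."""
    return alpha * quality - (1 - alpha) * min(latency / max_latency, 1.0)

def simulate(rounds=5000, seed=0):
    rng = random.Random(seed)
    # (mean quality, mean latency in seconds) for two hypothetical nodes.
    nodes = {"big_fast": (0.85, 1.0), "small_slow": (0.40, 6.0)}
    totals = {name: 0.0 for name in nodes}
    for _ in range(rounds):
        for name, (q_mu, lat_mu) in nodes.items():
            q = min(max(rng.gauss(q_mu, 0.05), 0.0), 1.0)
            lat = max(rng.gauss(lat_mu, 0.2), 0.01)
            totals[name] += reward(q, lat)
    return {name: total / rounds for name, total in totals.items()}

avg = simulate()
# The high-quality, low-latency node accrues the higher average reward,
# which is the qualitative outcome the paper's simulations report.
```

Averaging per-round rewards over many noisy rounds is what lets the scheme's incentive properties show through despite per-round quality and latency variance.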
Arther Tian
DGrid AI
Alex Ding
DGrid AI
Frank Chen
DGrid AI
Alan Wu
DGrid AI
Aaron Chan
Sahara AI
Bruce Zhang
DGrid AI

Machine Learning · Large Language Models · AI Agents · Decentralized AI