Anytime-Valid Federated Conformal RAG for LLM Swarms

📅 2026-05-27

🤖 AI Summary

This work addresses a limitation of existing federated conformal retrieval-augmented generation (FC-RAG) methods: they provide coverage guarantees only at a fixed horizon, so they cannot support data-dependent stopping or adaptive control in bandwidth-constrained swarms of weak models. To overcome this, we propose Anytime-FC-RAG, a framework whose coverage guarantee holds at arbitrary stopping times. The approach pairs a summable per-step calibration-deviation budget with a truncated betting e-process, and it remains valid under predictable adaptive operations such as recalibration, bandwidth escalation, and distilled-student refresh, without assumptions beyond those of fixed-horizon FC-RAG. The analysis builds on conditional coverage bounds, nonnegative supermartingale e-processes, Ville's inequality, and Hoeffding-stitched envelopes, together with a federated probe-logit distillation (FPLD) mechanism for refreshing the distilled student. Experiments on a GPT-2-small + MiniLM swarm (MMLU, DBpedia, AG News) confirm the predicted alarm rate and detection delay, show 14%–57% bandwidth savings, and show that alarms fire only when coverage actually breaks.
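A worked sketch of how a summable calibration budget and Ville's inequality can combine into a single time-uniform bound, $\mathbb{P}(\sup_t E_t \ge 1/\delta_e) \le \delta_e + \delta_{\mathrm{cal}}$. The particular split $\delta_k = 6\delta_{\mathrm{cal}}/(\pi^2 k^2)$ over calibration rounds $k$, the per-round good events $G_k$, and the auxiliary process $\tilde E_t$ (assumed to be a nonnegative supermartingale with $\tilde E_0 = 1$ that coincides with $E_t$ on the calibration-good event $G$) are illustrative assumptions, not the paper's stated construction.

```latex
% Assumption: calibration round k fails (event G_k^c) with probability at most \delta_k.
\delta_k = \frac{6\,\delta_{\mathrm{cal}}}{\pi^2 k^2},
\qquad
\sum_{k \ge 1} \delta_k = \delta_{\mathrm{cal}} \sum_{k \ge 1} \frac{6}{\pi^2 k^2} = \delta_{\mathrm{cal}},
\qquad
\mathbb{P}(G^c) = \mathbb{P}\Big(\bigcup_{k} G_k^c\Big) \le \sum_{k} \delta_k = \delta_{\mathrm{cal}}.

% Assumption: on G = \bigcap_k G_k we have E_t = \tilde E_t for every t, hence
\Big\{\sup_t E_t \ge 1/\delta_e\Big\}
  \subseteq \Big\{\sup_t \tilde E_t \ge 1/\delta_e\Big\} \cup G^c .

% Ville's inequality for the nonnegative supermartingale \tilde E_t (with \tilde E_0 = 1):
\mathbb{P}\Big(\sup_t E_t \ge 1/\delta_e\Big)
  \le \mathbb{P}\Big(\sup_t \tilde E_t \ge 1/\delta_e\Big) + \mathbb{P}(G^c)
  \le \delta_e + \delta_{\mathrm{cal}} .
```

Because $\sum_k \delta_k$ converges, the total calibration failure probability stays at $\delta_{\mathrm{cal}}$ no matter how many recalibrations occur, which is why a summable (rather than merely additive over a fixed number of rounds) budget suits an unbounded horizon.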

📝 Abstract

Federated Conformal RAG (FC-RAG) provides distribution-free coverage for a bandwidth-limited swarm of weak language models, but only at a fixed horizon. We extend it to anytime-valid sequential coverage: validity at every stopping time, preserved under predictable adaptive control (recalibration, per-node bandwidth escalation, distilled-student refresh), at no extra cost in assumptions over fixed-horizon FC-RAG. Naive composition fails because FC-RAG's marginal coverage bound makes the betting e-process a non-supermartingale on adverse calibration draws, and Ville's inequality cannot be invoked. We give Anytime-FC-RAG, a sequential extension built on a summable per-step calibration-deviation budget that converts the marginal bound into a strict conditional bound on a calibration-good event, paired with a truncated betting e-process that is a nonnegative supermartingale on the entire probability space. From these two ingredients, we obtain four guarantees: time-uniform alarm validity $\mathbb{P}(\sup_t E_t \ge 1/\delta_e) \le \delta_e + \delta_{\mathrm{cal}}$, a Hoeffding-stitched cumulative-miscoverage envelope at the same total budget, safety under any predictable controller (recalibration, bandwidth escalation, student refresh), and training-side error propagation across an unbounded sequence of Federated Probe-Logit Distillation (FPLD) refreshes via a summable training budget. As a practical consequence, an adaptive controller that escalates retrieval bandwidth only when the e-process crosses a warning threshold matches the alarm rate of a fixed-high-bandwidth schedule at substantially lower communication cost. Experiments on a GPT-2-small + MiniLM swarm across MMLU, DBpedia, and AG News verify the predicted alarm rate, detection delay, envelope coverage, and $14$–$57\%$ bandwidth savings; the alarm fires when and only when coverage genuinely breaks.
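The warning-threshold controller described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the hook `observe_miss`, the two bandwidth labels, the constant bet `lam`, the effective miscoverage level `alpha_eff` (target level plus calibration slack), and the clipping used as a stand-in for the paper's truncation are all hypothetical. After each query the e-process is multiplied by $1 + \lambda(M_t - \alpha_{\mathrm{eff}})$, where $M_t = 1$ on a miss; if $\mathbb{P}(M_t = 1 \mid \text{past}) \le \alpha_{\mathrm{eff}}$, each factor has conditional mean at most 1, so the product is a nonnegative supermartingale and Ville's inequality bounds the false-alarm probability by `delta_e`.

```python
from typing import Callable, List, Optional, Tuple


def monitor(
    observe_miss: Callable[[str], int],
    steps: int,
    alpha_eff: float,
    delta_e: float,
    warn: float,
    lam: float,
) -> Tuple[Optional[int], List[str], List[float]]:
    """Betting e-process on miscoverage with a warning-threshold bandwidth controller.

    observe_miss(bw) is a hypothetical hook: it returns 1 if the conformal set
    built at bandwidth level bw misses the reference answer, else 0.
    Returns (alarm step or None, bandwidth schedule, e-process path).
    """
    if not 0.0 < alpha_eff < 1.0:
        raise ValueError("alpha_eff must lie in (0, 1)")
    # Clip the bet (our stand-in for truncation) so that every factor
    # 1 + lam * (m - alpha_eff) stays strictly positive, even when m = 0.
    lam = min(max(lam, 0.0), 0.99 / alpha_eff)
    e = 1.0
    path: List[float] = [e]
    schedule: List[str] = []
    for t in range(1, steps + 1):
        # Predictable control: the bandwidth for step t depends only on E_{t-1}.
        bw = "high" if e >= warn else "low"
        schedule.append(bw)
        m = observe_miss(bw)
        # Under the null, E[1 + lam * (m - alpha_eff) | past] <= 1.
        e *= 1.0 + lam * (m - alpha_eff)
        path.append(e)
        # Ville: under the null, P(E_t ever reaches 1/delta_e) <= delta_e.
        if e >= 1.0 / delta_e:
            return t, schedule, path
    return None, schedule, path
```

Because the bandwidth for step $t$ is chosen from $E_{t-1}$ alone, the controller is predictable, which is the property the abstract requires for validity under adaptive control. For example, with `alpha_eff=0.1`, `lam=5`, `warn=4`, and `delta_e=0.05`, a stream of consecutive misses escalates to high bandwidth at step 2 and raises the alarm there, since $5.5^2 = 30.25 \ge 1/0.05 = 20$; a stream with no misses shrinks $E_t$ by half each step and never escalates.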

Problem

Research questions and friction points this paper is trying to address.

anytime-valid

federated conformal inference

sequential coverage

adaptive control

language model swarms

Innovation

Methods, ideas, or system contributions that make the work stand out.

Anytime-valid inference

Federated Conformal RAG

Sequential e-process