CyberCorrect: A Cybernetic Framework for Closed-Loop Self-Correction in Large Language Models

📅 2026-05-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the lack of systematic error analysis and convergence guarantees in existing self-correction methods for large language models, which typically rely on generic prompts. We propose CyberCorrect, a closed-loop self-correction framework grounded in control theory. Its tri-modal error detector combines self-consistency checks, verbalized confidence, and logic-chain verification to identify errors. A type-directed correction controller then generates targeted repair instructions, and a convergence judge, built on stability criteria adapted from control theory, decides when iteration should stop. Evaluated on our newly constructed CyberCorrect-Bench (440 reasoning tasks), the method achieves 79.8% final accuracy, outperforming the best existing self-correction method by 6.2 percentage points and reducing overshoot (erroneous over-correction) by 41%. We also introduce three control-theoretic evaluation metrics (convergence rate, overshoot rate, and oscillation rate) that capture correction dynamics beyond final accuracy.
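The three dynamic metrics can be illustrated with a minimal Python sketch. The summary does not give their formal definitions, so the ones below are assumptions made only for illustration: a trajectory records whether the answer was correct after each iteration, "converged" means the final answer is correct, "overshot" means a correct answer was later turned incorrect, and "oscillated" means correctness flipped at least twice. All function names are hypothetical.

```python
from typing import Callable, List

# Correctness of the answer after each iteration; index 0 is the initial answer.
Trajectory = List[bool]


def converged(traj: Trajectory) -> bool:
    # Assumed definition: the trajectory ends on a correct answer.
    return traj[-1]


def overshot(traj: Trajectory) -> bool:
    # Assumed definition: some correction step turned a correct answer incorrect.
    return any(prev and not cur for prev, cur in zip(traj, traj[1:]))


def oscillated(traj: Trajectory) -> bool:
    # Assumed definition: correctness flipped at least twice across iterations.
    flips = sum(prev != cur for prev, cur in zip(traj, traj[1:]))
    return flips >= 2


def rate(trajs: List[Trajectory], predicate: Callable[[Trajectory], bool]) -> float:
    # Fraction of trajectories for which the predicate holds.
    return sum(predicate(t) for t in trajs) / len(trajs)


# Made-up trajectories, not data from the paper.
example = [
    [False, True, True],         # fixed and stable
    [True, False, False],        # over-corrected
    [False, True, False, True],  # oscillating, ends correct
    [True, True],                # correct from the start
]
convergence_rate = rate(example, converged)
overshoot_rate = rate(example, overshot)
oscillation_rate = rate(example, oscillated)
```

Under these assumed definitions, final accuracy alone would not distinguish the first and third trajectories, which is the kind of dynamic behavior the metrics are meant to expose.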

📝 Abstract

Large language model (LLM) self-correction -- the ability to detect and fix errors in generated outputs -- remains largely ad hoc, relying on generic prompts such as "please reconsider your answer" without systematic error analysis or convergence guarantees. We propose CyberCorrect, a framework that formalizes LLM self-correction as a closed-loop control system grounded in cybernetic theory. The framework models the LLM generator as the plant and introduces a tri-modal Error Detector (combining self-consistency, verbalized confidence, and logic-chain verification) as the sensor. A type-directed Correction Controller generates targeted repair instructions based on diagnosed error categories, while a Convergence Judge determines iteration termination using stability criteria adapted from control theory. We further introduce three control-theoretic evaluation metrics -- convergence rate, overshoot rate, and oscillation rate -- that capture correction dynamics beyond final accuracy. Experiments on our constructed CyberCorrect-Bench (440 reasoning tasks with annotated error types and correction paths) show that CyberCorrect achieves 79.8% final accuracy, improving upon the best existing self-correction method by 6.2 percentage points, while reducing overshoot (erroneous over-correction) by 41% through its convergence control mechanism.
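The closed-loop structure described above can be sketched in Python as follows. This is not the authors' implementation: the `Diagnosis` type, the `correction_loop` function, the scalar error score, and the stopping rule (stop when the error signal no longer improves by at least `tol`) are all assumptions chosen for illustration. The `detect` callable stands in for the tri-modal Error Detector, and the toy generator and detector exist only to make the sketch runnable.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Diagnosis:
    error_type: Optional[str]  # None means no error was detected
    score: float               # hypothetical aggregate error signal in [0, 1]


def correction_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],   # plant: the LLM generator
    detect: Callable[[str, str], Diagnosis],         # sensor: error detector
    make_instruction: Callable[[Diagnosis], str],    # controller: repair instruction
    max_iters: int = 5,
    tol: float = 0.05,
) -> Tuple[str, List[float]]:
    answer = generate(task, None)
    best_answer, best_score = answer, float("inf")
    history: List[float] = []
    for _ in range(max_iters):
        diag = detect(task, answer)
        history.append(diag.score)
        # Remember the lowest-error answer so a bad correction is not returned.
        if diag.score < best_score:
            best_answer, best_score = answer, diag.score
        if diag.error_type is None:
            break  # nothing left to correct
        # Assumed convergence test: stop once the error signal stops improving.
        if len(history) >= 2 and history[-2] - history[-1] < tol:
            break
        answer = generate(task, make_instruction(diag))
    return best_answer, history


# Toy stand-ins for the LLM and detector (illustrative only).
def toy_generate(task: str, instruction: Optional[str]) -> str:
    return "6" if instruction is None else "5"


def toy_detect(task: str, answer: str) -> Diagnosis:
    if answer == "5":
        return Diagnosis(None, 0.0)
    return Diagnosis("arithmetic", 1.0)


def toy_instruction(diag: Diagnosis) -> str:
    return f"Recheck the {diag.error_type} step."


final_answer, error_history = correction_loop(
    "2+3", toy_generate, toy_detect, toy_instruction
)
```

Returning the lowest-error answer seen so far, rather than the last one, is one simple way to limit overshoot in such a loop; the paper's Convergence Judge may work differently.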

Problem

Research questions and friction points this paper is trying to address.

self-correction

large language models

error analysis

convergence guarantee

cybernetics

Innovation

Methods, ideas, or system contributions that make the work stand out.

closed-loop control

self-correction

error detection