🤖 AI Summary
This work addresses the lack of systematic error analysis and convergence guarantees in existing self-correction methods for large language models, which often rely on generic prompts. The authors propose CyberCorrect, a closed-loop self-correction framework grounded in control theory. A tri-modal error detector combines self-consistency checks, verbalized confidence, and logic-chain verification to identify errors. A type-directed correction controller then generates targeted repair instructions, and a convergence judge, built on stability criteria adapted from control theory, decides when iterative refinement should stop. On the newly constructed CyberCorrect-Bench, the method achieves 79.8% final accuracy, outperforming the best existing self-correction method by 6.2 percentage points and reducing overshoot (erroneous over-correction) by 41%. The work also introduces three control-theoretic evaluation metrics (convergence rate, overshoot rate, and oscillation rate) that capture correction dynamics beyond final accuracy.
📝 Abstract
Large language model (LLM) self-correction, the ability to detect and fix errors in generated outputs, remains largely ad hoc, relying on generic prompts such as "please reconsider your answer" without systematic error analysis or convergence guarantees. We propose CyberCorrect, a framework that formalizes LLM self-correction as a closed-loop control system grounded in cybernetic theory. The framework models the LLM generator as the plant and introduces a tri-modal Error Detector (combining self-consistency, verbalized confidence, and logic-chain verification) as the sensor. A type-directed Correction Controller generates targeted repair instructions based on diagnosed error categories, while a Convergence Judge determines iteration termination using stability criteria adapted from control theory. We further introduce three control-theoretic evaluation metrics (convergence rate, overshoot rate, and oscillation rate) that capture correction dynamics beyond final accuracy. Experiments on CyberCorrect-Bench, a benchmark we constructed (440 reasoning tasks with annotated error types and correction paths), show that CyberCorrect achieves 79.8% final accuracy, improving upon the best existing self-correction method by 6.2 percentage points, while reducing overshoot (erroneous over-correction) by 41% through its convergence control mechanism.
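To make the plant / sensor / controller / judge decomposition concrete, the following is a minimal Python sketch of such a closed loop. It is not the authors' implementation: every name (`detect`, `run_loop`, `REPAIR_PROMPTS`), the averaging of the three signals, the 0.5 threshold, and the "stop when the error signal stops decreasing" rule are assumptions chosen for illustration, since the abstract does not specify them. The LLM and the three detectors are replaced by toy stubs.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical repair instructions, keyed by the signal that flagged the error.
REPAIR_PROMPTS: Dict[str, str] = {
    "self_consistency": "Your sampled answers disagree. Re-derive the result step by step.",
    "verbalized_confidence": "You reported low confidence. Resolve the uncertain step.",
    "logic_chain": "A reasoning step does not follow. Locate and repair it.",
}


@dataclass
class Diagnosis:
    error_type: Optional[str]  # None means no error was detected
    score: float               # aggregate error signal in [0, 1]


def detect(signals: Dict[str, float], threshold: float = 0.5) -> Diagnosis:
    """Sensor: fuse three error signals (each in [0, 1], higher = more suspect).

    Assumed fusion rule: plain mean, with the strongest signal naming the error type.
    """
    score = sum(signals.values()) / len(signals)
    if score < threshold:
        return Diagnosis(None, score)
    return Diagnosis(max(signals, key=signals.get), score)


def run_loop(
    generate: Callable[[Optional[str]], str],
    sense: Callable[[str], Dict[str, float]],
    max_iters: int = 4,
) -> Tuple[str, List[float]]:
    """Closed loop: generate, sense, then correct until the judge says stop."""
    answer = generate(None)  # plant: initial, uncorrected output
    best_answer, best_score = answer, float("inf")
    scores: List[float] = []
    for step in range(max_iters + 1):
        diag = detect(sense(answer))
        scores.append(diag.score)
        if diag.score < best_score:
            best_answer, best_score = answer, diag.score
        # Judge (assumed criteria): stop on no detected error, on a
        # non-decreasing error signal, or when the iteration budget is spent.
        if diag.error_type is None:
            break
        if len(scores) >= 2 and scores[-1] >= scores[-2]:
            break
        if step == max_iters:
            break
        # Controller: issue a repair instruction specific to the error type.
        answer = generate(REPAIR_PROMPTS[diag.error_type])
    return best_answer, scores


# Toy stand-ins for the LLM and the three detectors.
def toy_generate(instruction: Optional[str]) -> str:
    return "draft" if instruction is None else "fixed"


def toy_sense(answer: str) -> Dict[str, float]:
    if answer == "draft":
        return {"self_consistency": 0.9, "verbalized_confidence": 0.6, "logic_chain": 0.3}
    return {"self_consistency": 0.1, "verbalized_confidence": 0.2, "logic_chain": 0.0}


answer, scores = run_loop(toy_generate, toy_sense)
print(answer)  # prints "fixed"
```

Returning the best-scoring answer seen so far, rather than the latest one, is one simple way to guard against over-correction in this sketch: if a repair makes the error signal worse, the loop stops and falls back to the earlier output.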