🤖 AI Summary
This work proposes a consciousness-inspired guidance mechanism grounded in Integrated Information Theory (IIT) to advance artificial general intelligence. By translating core IIT principles into optimizable reward signals, the method employs reinforcement learning to post-train large language models without relying on external data or auxiliary models. This approach enhances the causal structure, coherence, and information integration of generated text. Empirical results demonstrate up to a 31% reduction in output length on out-of-domain tasks while maintaining baseline-level accuracy, alongside significant improvements in model confidence calibration and reasoning efficiency. To the best of our knowledge, this study presents the first end-to-end optimization framework that operationalizes IIT within large language models.
📝 Abstract
The pursuit of Artificial General Intelligence (AGI) is a central goal of language model development, and consciousness-like processing could serve as a key facilitator. While current language models are not conscious, they exhibit behaviors analogous to certain aspects of consciousness. This paper investigates the implementation of a leading theory of consciousness, Integrated Information Theory (IIT), within language models via a reward-based learning paradigm. IIT provides a formal, axiom-based mathematical framework for quantifying consciousness. Drawing inspiration from its core principles, we formulate a novel reward function that quantifies a text's causality, coherence, and integration, characteristics associated with conscious processing. Empirically, we find that optimizing for this IIT-inspired reward leads to more concise text generation. On out-of-domain tasks, careful tuning achieves up to a 31% reduction in output length while preserving accuracy comparable to the base model. Beyond primary task performance, we analyze the broader effects of this training methodology on the model's confidence calibration and test-time computational scaling. The proposed framework offers significant practical advantages: it is conceptually simple, computationally efficient, requires no external data or auxiliary models, and leverages a general, capability-driven signal rather than task-specific heuristics. Code is available at https://github.com/MH-Sameti/LLM_PostTraining.git
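The abstract does not specify the reward's exact mathematical form. As a purely illustrative sketch, an IIT-inspired reward of the kind described might combine per-text proxy scores for causality, coherence, and integration into a single scalar suitable for RL post-training; the proxy names, weights, and combination rule below are assumptions for exposition, not the paper's actual formulation:

```python
# Hypothetical sketch of a composite IIT-inspired reward.
# The three proxy scores and their weighted combination are illustrative
# assumptions, not the formulation used in the paper.

def iit_reward(causality: float,
               coherence: float,
               integration: float,
               weights: tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Combine three per-text proxy scores (each assumed to lie in [0, 1])
    into one scalar reward via a normalized weighted sum."""
    w_ca, w_co, w_in = weights
    total = w_ca + w_co + w_in
    return (w_ca * causality + w_co * coherence + w_in * integration) / total

# Example: equal weighting of the three hypothetical proxies.
reward = iit_reward(causality=0.8, coherence=0.6, integration=0.7)  # 0.7
```

In an actual RL post-training loop, such a scalar would be attached to each sampled completion before a policy-gradient update; how the individual proxies are computed from the generated text is the substantive contribution the abstract alludes to.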