🤖 AI Summary
This work addresses a critical limitation in current AI systems, which predominantly focus on task outputs or static knowledge while neglecting the continuous optimization of internal reasoning structures, action scheduling, and learning mechanisms. The paper proposes a human-inspired continual learning framework that explicitly models internal reasoning processes as learnable entities. By capturing structured reasoning trajectories, the framework simultaneously refines task performance and cognitive architecture during execution. It integrates sequential reasoning models with a parallel learning architecture, unifying modules for reasoning, action, reflection, and verification. A hierarchical meta-learning mechanism is introduced to jointly optimize task-specific parameters and learning strategies. Evaluated on a temperature sensor anomaly detection task, the approach reduces average runtime by 23.9% while maintaining system stability.
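The "hierarchical meta-learning mechanism" mentioned above — an outer loop that tunes the learning strategy while an inner loop tunes task parameters — can be illustrated with a toy sketch. Note this is a generic learning-to-learn pattern, not the paper's implementation: the quadratic task, the finite-difference hyper-update, and all constants are illustrative assumptions.

```python
# Illustrative sketch of hierarchical "learning-to-learn" (NOT the paper's
# implementation): an inner loop updates the task-level parameter w, while
# an outer loop adapts a learning-strategy parameter (the inner learning
# rate) based on how much each inner phase improved the loss.

def task_loss(w, target=3.0):
    # Toy task: drive w toward a fixed target.
    return (w - target) ** 2

def task_grad(w, target=3.0):
    return 2.0 * (w - target)

def meta_train(outer_steps=50, inner_steps=5):
    w = 0.0        # task-level parameter
    lr = 0.05      # learning-strategy parameter (inner learning rate)
    meta_lr = 0.01
    for _ in range(outer_steps):
        loss_before = task_loss(w)
        for _ in range(inner_steps):       # inner loop: task learning
            w -= lr * task_grad(w)
        loss_after = task_loss(w)
        # Outer loop: nudge the learning rate in proportion to the
        # observed improvement (a crude stand-in for a meta-gradient).
        lr += meta_lr * (loss_before - loss_after)
        lr = max(1e-4, min(lr, 0.4))       # clamp to keep updates stable
    return w, lr

w, lr = meta_train()
```

The clamp on `lr` mirrors the summary's point about evolving strategy while preserving stability: the meta-update is allowed to accelerate learning only within a safe range.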
📝 Abstract
Learning internal reasoning processes is crucial for developing AI systems capable of sustained adaptation in dynamic real-world environments. However, most existing approaches primarily emphasize learning task-specific outputs or static knowledge representations, while overlooking the continuous refinement of internal reasoning structures, action scheduling policies, and learning mechanisms themselves. In this paper, we propose a human-inspired continuous learning framework that unifies reasoning, action, reflection, and verification within a sequential reasoning model enhanced by parallel learning. The framework explicitly treats internal thinking processes as primary objects of learning. It systematically records internal reasoning trajectories and environmental interactions as structured learning material, enabling the system to optimize not only task-level content but also the organization, scheduling, and evolution of reasoning activities. This design realizes learning alongside processing, allowing cognitive structures to improve during execution. Furthermore, the framework supports controlled replacement of predefined logic with learned procedures and introduces a hierarchical learning-to-learn mechanism that jointly adapts task-level parameters and learning strategies. As a result, the system progressively evolves its internal cognitive architecture while preserving operational stability. Experimental results on a temperature sensor abnormality detection task show that incorporating internal-process learning reduces average runtime by 23.9%.
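Two ideas from the abstract — recording reasoning trajectories as structured learning material, and the controlled replacement of predefined logic with learned procedures — can be sketched together. This is a minimal illustration under our own assumptions (the record fields, module names, and promotion rule are hypothetical, not the paper's design):

```python
# Minimal sketch (assumptions ours, not the paper's design): each cycle of
# the reason/act/reflect/verify loop appends a structured trace record, and
# a learned shortcut replaces the predefined verification step only after
# enough recorded verifications have succeeded (controlled replacement).

import time
from dataclasses import dataclass, field

@dataclass
class TraceRecord:
    module: str      # "reason" | "act" | "reflect" | "verify"
    outcome: bool    # whether the step succeeded
    duration: float  # wall-clock cost of the step

@dataclass
class Agent:
    trace: list = field(default_factory=list)
    use_learned_verify: bool = False

    def run_step(self, module, fn):
        # Execute one internal step and log it as learning material.
        t0 = time.perf_counter()
        ok = fn()
        self.trace.append(TraceRecord(module, ok, time.perf_counter() - t0))
        return ok

    def maybe_promote_learned_verify(self, min_samples=20):
        # Controlled replacement: only swap in the learned procedure once
        # the trace shows a long, unbroken record of successful checks.
        records = [r for r in self.trace if r.module == "verify"]
        if len(records) >= min_samples and all(r.outcome for r in records):
            self.use_learned_verify = True

agent = Agent()
for _ in range(25):
    agent.run_step("reason", lambda: True)
    agent.run_step("verify", lambda: True)  # predefined check, stubbed here
    agent.maybe_promote_learned_verify()
```

Because every step is logged with its cost, the same trace could later feed scheduling decisions (e.g., reordering or skipping steps), which is how logging internal processes can translate into the runtime reductions the abstract reports.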