Sci-CoE: Co-evolving Scientific Reasoning LLMs via Geometric Consensus with Sparse Supervision

📅 2026-02-12
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the fragility of large language models in scientific reasoning tasks, which stems from unreliable evaluation and overly simplistic verification strategies. To overcome this limitation, the authors propose a two-stage co-evolutionary framework: first, a base verifier is constructed using a small amount of annotated data; subsequently, on unlabeled data, a geometric reward mechanism drives the joint self-evolution of both solver and verifier. This reward mechanism integrates consensus, reliability, and diversity to enable efficient co-training that transitions smoothly from sparse supervision to fully unsupervised learning. Experimental results demonstrate that the approach significantly enhances complex reasoning performance across multiple scientific reasoning benchmarks, exhibits strong scalability, and establishes a more robust and diverse evaluation paradigm.

📝 Abstract
Large language models (LLMs) have demonstrated exceptional reasoning capabilities, and co-evolving paradigms have shown promising results in domains such as code and math. However, in scientific reasoning tasks, these models remain fragile due to unreliable solution evaluation and limited diversity in verification strategies. In this work, we propose Sci-CoE, a two-stage scientific co-evolving framework that enables models to self-evolve as both solver and verifier through a transition from sparse supervision to unsupervised learning. In the first stage, the model uses a small set of annotated data to establish fundamental correctness judgment anchors for the Verifier. In the second stage, we introduce a geometric reward mechanism that jointly considers consensus, reliability, and diversity, driving large-scale self-iteration on unlabeled data. Experiments on several general scientific benchmarks demonstrate that Sci-CoE enhances complex reasoning capabilities and exhibits strong scalability, facilitating the construction of more robust and diverse evaluation systems. Codes are available at https://github.com/InternScience/Sci-CoE.
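The abstract describes a geometric reward that jointly considers consensus, reliability, and diversity. The paper's exact formula is not reproduced on this page, so the sketch below is a hypothetical illustration: it combines the three signals via a geometric mean, which (unlike an arithmetic mean) collapses the reward whenever any one signal is near zero, so that only solutions scoring well on all three axes are reinforced.

```python
import math

def geometric_reward(consensus: float, reliability: float,
                     diversity: float, eps: float = 1e-8) -> float:
    """Hypothetical sketch of a 'geometric' reward.

    All inputs are assumed to be scores in [0, 1]. This is NOT the
    paper's actual definition, just one natural reading of combining
    consensus, reliability, and diversity geometrically: the geometric
    mean requires all three signals to be jointly high.
    """
    scores = [consensus, reliability, diversity]
    # Clamp each factor away from zero to keep the root well-defined.
    product = math.prod(max(s, eps) for s in scores)
    return product ** (1.0 / len(scores))

# A candidate with strong consensus and reliability but almost no
# diversity is penalized much more than an arithmetic mean would.
r_geo = geometric_reward(0.9, 0.8, 0.05)
r_arith = (0.9 + 0.8 + 0.05) / 3
```

Under this reading, `r_geo` (about 0.33) falls well below `r_arith` (about 0.58), illustrating why a multiplicative combination discourages degenerate solver-verifier agreement that lacks diversity.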
Problem

Research questions and friction points this paper is trying to address.

scientific reasoning
large language models
solution evaluation
verification diversity
model robustness
Innovation

Methods, ideas, or system contributions that make the work stand out.

co-evolving
geometric consensus
sparse supervision
scientific reasoning
self-iteration
Xiaohan He
Shanghai Artificial Intelligence Laboratory
Shiyang Feng
Researcher, AI for Science
Songtao Huang
Shanghai Artificial Intelligence Laboratory, Fudan University
Lei Bai
Shanghai AI Laboratory (Foundation Model, Science Intelligence, Multi-Agent System, Autonomous Discovery)
Bin Wang
Fudan University
Bo Zhang
Shanghai Artificial Intelligence Laboratory