🤖 AI Summary
Molecular property prediction suffers from cold-start and data-sparsity problems in chemical engineering tasks such as solvent screening. To address this, we propose a language model framework that needs no fine-tuning and combines context engineering, empirical Bayesian inference, and retrieval-augmented generation into a reflective prediction loop. We further introduce self-consistency verification across five language models and an industry-oriented ranking task, revealing that the benefit of self-consistency depends on the task. The method extracts reusable chemical rules from few-shot examples without parameter updates. In LogP prediction for amine solvents, self-consistency yields a 72% reduction in MAE and a 112% improvement in R² on the regression task, while ranking tasks show limited gains because of systematic biases shared across models. Deployment cost falls by over 70% compared with full fine-tuning, making the approach a scalable option for low-resource settings.
📝 Abstract
Molecular property prediction is fundamental to chemical engineering applications such as solvent screening. We present Socrates-Mol, a framework that transforms language models into empirical Bayesian reasoners through context engineering, addressing cold-start problems without model fine-tuning. The system implements a reflective prediction cycle in which initial outputs serve as priors, retrieved molecular cases provide evidence, and refined predictions form posteriors, allowing reusable chemical rules to be extracted from sparse data. We introduce ranking tasks aligned with industrial screening priorities and employ cross-model self-consistency across five language models to reduce variance. Experiments on amine solvent LogP prediction reveal task-dependent patterns: regression achieves a 72% MAE reduction and a 112% R-squared improvement through self-consistency, while ranking tasks show limited gains due to systematic multi-model biases. The framework reduces deployment costs by over 70% compared to full fine-tuning, providing a scalable solution for molecular property prediction while elucidating the task-adaptive nature of self-consistency mechanisms.
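The prior, evidence, and posterior roles described above can be illustrated numerically. The abstract does not specify how Socrates-Mol combines them (the framework works through prompting, not explicit formulas), so the following is only a minimal sketch under stated assumptions: a normal-normal conjugate update stands in for the refinement step, and a median stands in for cross-model self-consistency. All function names, variances, and values are hypothetical.

```python
from statistics import median


def posterior_estimate(prior, prior_var, evidence, evidence_var):
    """Normal-normal conjugate update (an assumed stand-in for the refinement step).

    prior        -- initial model output, treated as the prior mean
    prior_var    -- assumed variance of that prior
    evidence     -- property values of retrieved similar molecules
    evidence_var -- assumed variance of a single retrieved value
    """
    if not evidence:
        # No retrieved cases: the prior is returned unchanged.
        return prior, prior_var
    n = len(evidence)
    evidence_mean = sum(evidence) / n
    prior_precision = 1.0 / prior_var
    evidence_precision = n / evidence_var
    post_var = 1.0 / (prior_precision + evidence_precision)
    post_mean = post_var * (
        prior_precision * prior + evidence_precision * evidence_mean
    )
    return post_mean, post_var


def self_consistent(predictions):
    """Aggregate per-model predictions; the median is an assumption, not the paper's rule."""
    return median(predictions)


def mean_absolute_error(y_true, y_pred):
    """MAE, the regression metric named in the abstract."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up numbers for illustration only; not data from the paper.
    refined, _ = posterior_estimate(
        prior=1.0, prior_var=1.0, evidence=[0.4, 0.6], evidence_var=0.5
    )
    five_model_outputs = [refined, 0.55, 0.70, 0.62, 2.10]  # one outlier model
    print(self_consistent(five_model_outputs))
```

A median is used here because it is insensitive to a single outlying model, which matches the variance-reduction motivation for regression. It would not correct a bias shared by most of the models, which is consistent with the limited ranking gains the abstract reports.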