🤖 AI Summary
This work addresses the challenge of achieving both high accuracy and strong interpretability in tabular prediction models for high-stakes domains such as healthcare and finance. While symbolic models often lack expressive power, large language models (LLMs) frequently suffer from inconsistent reasoning and hallucinations. To bridge this gap, the authors propose the ReSS framework, which first employs decision trees to generate instance-level symbolic scaffolds that guide LLMs toward logically consistent and data-faithful natural language rationales. These rationales are then used to construct a high-quality training set for fine-tuning a specialized tabular reasoning model, augmented with a scaffold-invariant data augmentation strategy to enhance generalization. Experiments demonstrate that ReSS improves performance by up to 10% over conventional decision trees and standard fine-tuning baselines, substantially reduces hallucination rates, and yields explanations that are both more necessary and sufficient.
📝 Abstract
Tabular data remains prevalent in high-stakes domains such as healthcare and finance, where predictive models are expected to provide both high accuracy and faithful, human-understandable reasoning. While symbolic models offer verifiable logic, they lack semantic expressiveness. Meanwhile, general-purpose LLMs often require specialized fine-tuning to master domain-specific tabular reasoning. To address the dual challenges of scalable data curation and reasoning consistency, we propose ReSS, a systematic framework that bridges symbolic and neural reasoning models. ReSS leverages a decision-tree model to extract instance-level decision paths as symbolic scaffolds. These scaffolds, alongside input features and labels, guide an LLM to generate grounded natural-language reasoning that strictly adheres to the underlying decision logic. The resulting high-quality dataset is used to fine-tune a pretrained LLM into a specialized tabular reasoning model, further enhanced by a scaffold-invariant data augmentation strategy to improve generalization and explainability. To rigorously assess faithfulness, we introduce quantitative metrics including hallucination rate, explanation necessity, and explanation sufficiency. Experimental results on medical and financial benchmarks demonstrate that ReSS-trained models improve traditional decision trees and standard fine-tuning approaches up to $10\%$ while producing faithful and consistent reasoning