🤖 AI Summary
Medical prediction models frequently generate physiologically implausible outputs, compromising clinical reliability.
Method: This paper proposes a Rule-based Reinforcement Learning Layer (RRLL) that operates as an inference-time correction module—integrated post-hoc without modifying the original model. Physiologically infeasible state transitions are encoded as lightweight, domain-specific rules; a compact state-action space is constructed, and a policy network is trained via reinforcement learning to perform real-time output rectification. No expert modeling or model retraining is required.
Contribution/Results: RRLL introduces the first decoupled correction paradigm combining rule guidance with reinforcement learning, achieving high generality, interpretability, and deployment efficiency. Experiments across multiple medical classification tasks demonstrate substantial reductions in physiological violation rates and consistent improvements in prediction accuracy. Crucially, cross-task transfer is enabled simply by substituting the domain-specific infeasibility rule set, underscoring its practical adaptability.
📝 Abstract
This paper adds to the growing literature on reinforcement learning (RL) for healthcare by proposing a novel paradigm: augmenting any predictor with a Rule-based RL Layer (RRLL) that corrects the model's physiologically impossible predictions. Specifically, RRLL takes predicted labels as input states and outputs corrected labels as actions. The reward of each state-action pair is evaluated by a set of general rules. RRLL is efficient, general, and lightweight: unlike prior work, it requires no heavy expert knowledge, only a set of impossible transitions. This set is much smaller than the set of all possible transitions, yet it effectively reduces physiologically impossible mistakes made by state-of-the-art predictor models. We verify the utility of RRLL on a variety of important healthcare classification problems and observe significant improvements under the same setup, with only the domain-specific set of impossible transitions changed. In-depth analysis shows that RRLL improves accuracy precisely by reducing the presence of physiologically impossible predictions.
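The correction loop described above can be sketched in a few lines. This is a toy illustration, not the paper's implementation: the paper trains a policy network with RL, whereas this sketch uses plain Bellman sweeps over the same compact state-action space (previous corrected label, current predicted label); the label values and the `IMPOSSIBLE` rule set are invented for the example.

```python
# Toy sketch of the RRLL idea (hypothetical names and rule set, not the
# paper's code). Labels are integer classes; IMPOSSIBLE is the small
# domain-specific set of forbidden (prev_label, next_label) transitions
# that defines the rule-based reward.

IMPOSSIBLE = {(0, 3), (3, 0)}  # example infeasible transitions
N_LABELS = 4

def rule_reward(prev_label, action, predicted):
    """Penalize infeasible transitions; reward keeping the predictor's
    label when doing so is feasible."""
    if (prev_label, action) in IMPOSSIBLE:
        return -1.0
    return 2.0 if action == predicted else 1.0

def train_q(sweeps=100, gamma=0.9):
    """Bellman backups over the compact state-action space.
    A state is (previous corrected label, current predicted label);
    an action is the corrected label emitted for this step."""
    states = [(p, l) for p in range(N_LABELS) for l in range(N_LABELS)]
    Q = {s: [0.0] * N_LABELS for s in states}
    for _ in range(sweeps):
        for prev, pred in states:
            for a in range(N_LABELS):
                # Expected future value, averaging over the next predicted label.
                future = sum(max(Q[(a, nl)]) for nl in range(N_LABELS)) / N_LABELS
                Q[(prev, pred)][a] = rule_reward(prev, a, pred) + gamma * future
    return Q

def correct(pred_seq, Q):
    """Inference-time correction: greedily pick the best action per step,
    leaving the underlying predictor untouched."""
    out = [pred_seq[0]]
    for pred in pred_seq[1:]:
        qs = Q[(out[-1], pred)]
        out.append(qs.index(max(qs)))
    return out
```

For instance, `correct([0, 1, 3, 0, 2, 3], train_q())` keeps every feasible label as-is and rewrites only the forbidden 3→0 step to a feasible label, mirroring the paper's claim that a small impossibility set suffices to steer corrections.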