AI Summary
This study addresses the high-risk local errors that molecular property prediction models often make in "activity cliff" regions, where structurally similar molecules display abrupt property changes, a phenomenon that conventional global evaluation metrics capture poorly. The work introduces CliffSplit, an evaluation protocol that constructs cliff-focused test sets to expose such local inaccuracies, and CliffLoss, a model-agnostic training mechanism that mitigates prediction bias through adaptive loss reweighting. Experiments show that CliffSplit reveals at least 15% higher error in cliff-dense regions of the QM9 dataset compared to standard benchmarks. On the Lipophilicity task, CliffLoss reduces the performance gap between cliff and non-cliff regions by up to 30% and lowers overall mean absolute error by 9.7%.
Abstract
Accurate prediction of molecular properties underpins drug discovery and material design, yet even state-of-the-art models remain vulnerable to localized failure modes that aggregate metrics cannot detect. The places where molecular similarity should be most helpful are also the places where standard evaluation can be most misleading. Property cliffs expose this gap: structurally similar molecules can still differ sharply in the target property, so models with competitive overall performance may fail in high-risk local neighborhoods. To expose and mitigate this failure mode, this work introduces CliffSplit, a cliff-aware evaluation protocol that constructs locally supported, cliff-exposed test cases, and CliffLoss, a model-agnostic, train-only mitigation mechanism for cliff-sensitive errors. Experiments on three QM9 targets and three MoleculeNet tasks across five backbones show that CliffSplit reveals at least 15% higher error in cliff-heavy QM9 regions, while CliffLoss reduces the cliff-to-smooth error gap by up to 30% on Lipophilicity and improves overall MAE by 9.7%. Together, these results turn molecular similarity failure from a descriptive anomaly into a benchmarked evaluation problem for molecular machine learning. The code is available at https://anonymous.4open.science/r/Cliff_Loss.
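To make the notions of a property cliff and a cliff-to-smooth error gap concrete, the following is a minimal sketch and not the authors' CliffSplit or CliffLoss implementation. It assumes molecules are represented as sets of fingerprint bits, that a pair counts as a cliff when its Tanimoto similarity is at least `sim_min` and its absolute target difference is at least `delta_min`, and that the gap is the difference between the MAE on cliff molecules and the MAE on the rest. All function names, thresholds, and toy values are hypothetical choices made for illustration.

```python
from itertools import combinations


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto (Jaccard) similarity between two sets of fingerprint bits."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0


def cliff_mask(fingerprints, targets, sim_min=0.7, delta_min=1.0):
    """Flag molecules that take part in at least one cliff pair.

    Assumption: a pair is a cliff when similarity >= sim_min and the
    absolute target difference >= delta_min. Both thresholds are
    illustrative, not values taken from the paper.
    """
    flagged = [False] * len(targets)
    for i, j in combinations(range(len(targets)), 2):
        similar = tanimoto(fingerprints[i], fingerprints[j]) >= sim_min
        if similar and abs(targets[i] - targets[j]) >= delta_min:
            flagged[i] = flagged[j] = True
    return flagged


def mean(values):
    """Arithmetic mean, or NaN for an empty list."""
    return sum(values) / len(values) if values else float("nan")


def cliff_gap(targets, predictions, flagged):
    """Return (cliff MAE, smooth MAE, cliff MAE minus smooth MAE)."""
    abs_err = [abs(t - p) for t, p in zip(targets, predictions)]
    cliff = mean([e for e, f in zip(abs_err, flagged) if f])
    smooth = mean([e for e, f in zip(abs_err, flagged) if not f])
    return cliff, smooth, cliff - smooth


# Toy data (made up): molecules 0 and 1 are similar but differ sharply.
fingerprints = [
    frozenset({1, 2, 3, 4}),
    frozenset({1, 2, 3, 4, 5}),
    frozenset({10, 11, 12}),
    frozenset({10, 11, 13}),
]
targets = [1.0, 3.0, 0.5, 0.6]
predictions = [1.8, 2.2, 0.5, 0.7]

flagged = cliff_mask(fingerprints, targets)
cliff_mae, smooth_mae, gap = cliff_gap(targets, predictions, flagged)
print(flagged, round(cliff_mae, 3), round(smooth_mae, 3), round(gap, 3))
```

In this toy case a model that regresses the two similar molecules toward their shared mean scores well on aggregate MAE while concentrating its error on the cliff pair, which is the pattern the abstract says aggregate metrics hide.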