🤖 AI Summary
This work addresses the challenges of anomaly detection and relational modelling in compositional visual relations (CVR), where complex rules and scarce samples hinder effective learning. To tackle these issues, the authors propose a prediction-verification framework that predicts the features of a fourth image from the three input images and incorporates Predictive Anomaly Reasoning Blocks (PARBs) for iterative inference. The approach further integrates anomaly-aware contrastive learning to extract discriminative features. Notably, this is the first study to combine a prediction-verification mechanism with contrastive learning for CVR tasks, improving both model generalization and interpretability. Extensive experiments on the SVRT, CVR, and MC$^2$R datasets demonstrate substantial performance gains over current state-of-the-art methods, underscoring the framework's effectiveness in complex visual reasoning scenarios.
📝 Abstract
While visual reasoning for simple analogies has received significant attention, compositional visual relations (CVR) remain relatively unexplored due to their greater complexity. To solve CVR tasks, i.e., to identify an outlier image given three other images that follow the same compositional rules, we propose Predictive Reasoning with Augmented Anomaly Contrastive Learning (PR-A$^2$CL). To address the challenge of modelling abundant compositional rules, an Augmented Anomaly Contrastive Learning scheme is designed to distil discriminative and generalizable features by maximizing similarity among normal instances while minimizing similarity between normal instances and anomalous outliers. More importantly, a predict-and-verify paradigm is introduced for rule-based reasoning, in which a series of Predictive Anomaly Reasoning Blocks (PARBs) iteratively leverage features from three of the four images to predict those of the remaining one. During the subsequent verification stage, the PARBs progressively pinpoint the specific discrepancies attributable to the underlying rules. Experimental results on the SVRT, CVR and MC$^2$R datasets show that PR-A$^2$CL significantly outperforms state-of-the-art reasoning models.
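The predict-and-verify idea and the anomaly-aware contrastive objective described above can be illustrated with a minimal numerical sketch. Note this is an illustrative toy, not the paper's method: the mean-of-context predictor stands in for the learned PARBs (which iterate and refine), and the simple cosine-based loss stands in for the actual Augmented Anomaly Contrastive Learning objective; all function names here are invented for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

def predict(context):
    """Stand-in predictor: average the three context features.
    In the paper this role is played by learned, iterated PARBs."""
    return np.mean(context, axis=0)

def find_outlier(features):
    """Predict-and-verify: for each candidate image, predict its feature
    from the other three and score the discrepancy; the image whose
    feature is predicted worst is declared the outlier."""
    errors = []
    for i in range(len(features)):
        context = [f for j, f in enumerate(features) if j != i]
        errors.append(1.0 - cosine(predict(context), features[i]))
    return int(np.argmax(errors)), errors

def anomaly_contrastive_loss(normals, outlier):
    """Toy anomaly-aware contrastive objective: pull rule-following
    features together (high pairwise similarity) while pushing the
    outlier away (low normal-outlier similarity)."""
    pos = np.mean([cosine(a, b)
                   for i, a in enumerate(normals) for b in normals[i + 1:]])
    neg = np.mean([cosine(a, outlier) for a in normals])
    return neg - pos  # minimized when normals cluster and the outlier is far
```

With three nearly collinear "normal" feature vectors and one orthogonal outlier, `find_outlier` flags the orthogonal vector, and the toy loss is strongly negative for a well-separated outlier.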