🤖 AI Summary
Compositional zero-shot learning (CZSL) faces two key challenges: poor generalization to unseen state-object compositions and neglect of semantic dependencies between states and objects.
Method: This paper proposes a “primitive relation” modeling framework that explicitly captures fine-grained, probabilistic semantic associations between states and objects—departing from conventional independent prediction paradigms. It employs cross-attention mechanisms to enable relation-aware feature interaction and integrates probabilistic compositional reasoning within an end-to-end differentiable training pipeline.
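The relation-aware interaction described above can be illustrated with a minimal single-head cross-attention sketch, where state embeddings attend over object embeddings to produce object-conditioned state features. This is a simplified illustration, not the paper's implementation: the function names, the absence of learned query/key/value projections, and the toy dimensions are all assumptions for clarity.

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(state_emb, object_emb):
    """Each state attends over all objects (hypothetical simplified form:
    no learned projections, single head). Returns relation-aware state
    features and the state-object attention map."""
    d = state_emb.shape[-1]
    scores = state_emb @ object_emb.T / np.sqrt(d)   # (S, O) affinities
    weights = softmax(scores, axis=-1)               # each row: a distribution over objects
    return weights @ object_emb, weights

# toy example: 3 states, 4 objects, 8-dim embeddings
rng = np.random.default_rng(0)
states = rng.normal(size=(3, 8))
objects = rng.normal(size=(4, 8))
feats, attn = cross_attention(states, objects)
```

Each row of `attn` is a probability distribution over objects for one state, which is the kind of probabilistic state-object association the framework builds its compositional reasoning on.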
Contribution/Results: The approach achieves state-of-the-art performance on all three standard CZSL benchmarks under both closed-world and open-world evaluation protocols, with consistent quantitative gains across datasets. Visualization analyses further show that predictions are relation-driven, supporting the model's interpretability and its generalization to novel compositions.
📝 Abstract
Compositional Zero-Shot Learning (CZSL) aims to identify unseen state-object compositions by leveraging knowledge learned from seen compositions. Existing approaches often predict states and objects independently, overlooking their relationships. In this paper, we propose a novel framework, learning primitive relations (LPR), designed to probabilistically capture the relationships between states and objects. By employing a cross-attention mechanism, LPR models the dependencies between states and objects, enabling it to infer the likelihood of unseen compositions. Experimental results demonstrate that LPR outperforms state-of-the-art methods on all three CZSL benchmark datasets in both closed-world and open-world settings. Through qualitative analysis, we show that LPR leverages state-object relationships for unseen composition prediction.