🤖 AI Summary
Traditional result diversification methods in text-to-image retrieval focus on the diversity of image appearance and neglect application-specific context and multi-attribute requirements. To address this, we propose a novel task, Contextual Diversity Refinement of Composite Attributes (CDR-CA), which refines the diversity of multiple attributes according to the application's context. Methodologically, we propose Multi-Source DPPs (MS-DPP), which extends the Determinantal Point Process (DPP) to multiple sources as a single DPP model with a unified similarity matrix based on a manifold representation, and we introduce Tangent Normalization to reflect contexts. Experiments demonstrate the effectiveness of the proposed method. The implementation is publicly available.
📝 Abstract
Result diversification (RD) is a crucial technique in Text-to-Image Retrieval for making practical applications more efficient. Conventional methods focus solely on increasing a diversity metric of image appearance. However, the relevant diversity metric and its desired value vary depending on the application, which limits the applicability of RD. This paper proposes a novel task called CDR-CA (Contextual Diversity Refinement of Composite Attributes). CDR-CA aims to refine the diversities of multiple attributes according to the application's context. To address this task, we propose Multi-Source DPPs (MS-DPP), a simple yet strong baseline that extends the Determinantal Point Process (DPP) to multiple sources. We model MS-DPP as a single DPP model with a unified similarity matrix based on a manifold representation. We also introduce Tangent Normalization to reflect contexts. Extensive experiments demonstrate the effectiveness of the proposed method. Our code is publicly available at https://github.com/NEC-N-SOGI/msdpp.
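To make the DPP idea in the abstract concrete, the following is a minimal sketch of standard DPP-based diversification with more than one similarity source. It is not the paper's MS-DPP: the abstract does not give the manifold-based construction or Tangent Normalization, so here the per-attribute similarity matrices are simply merged by a weighted average, which is an assumption made for illustration only. All function names, the two "sources", the weights, and the toy numbers are hypothetical.

```python
def det(matrix):
    """Determinant via Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [row[:] for row in matrix]
    result = 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            result = -result
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return result


def combine_sources(similarities, weights):
    """Weighted average of per-attribute similarity matrices.

    ASSUMPTION: a simple stand-in for a unified similarity matrix;
    the paper's manifold-based construction is not reproduced here.
    """
    n = len(similarities[0])
    total = sum(weights)
    return [[sum(w * s[i][j] for w, s in zip(weights, similarities)) / total
             for j in range(n)] for i in range(n)]


def dpp_kernel(relevance, similarity):
    """Standard quality/diversity kernel: L[i][j] = q_i * S[i][j] * q_j."""
    n = len(relevance)
    return [[relevance[i] * similarity[i][j] * relevance[j]
             for j in range(n)] for i in range(n)]


def greedy_dpp(kernel, k):
    """Greedy MAP approximation: repeatedly add the item that maximizes
    the determinant of the kernel restricted to the selected items."""
    selected = []
    for _ in range(k):
        best, best_det = None, 0.0
        for i in range(len(kernel)):
            if i in selected:
                continue
            idx = selected + [i]
            d = det([[kernel[r][c] for c in idx] for r in idx])
            if d > best_det:
                best, best_det = i, d
        if best is None:  # no remaining item adds positive volume
            break
        selected.append(best)
    return selected


# Toy data (hypothetical): items 0 and 1 are near-duplicates in both sources.
sim_a = [[1.0, 0.9, 0.1], [0.9, 1.0, 0.1], [0.1, 0.1, 1.0]]
sim_b = [[1.0, 0.9, 0.2], [0.9, 1.0, 0.2], [0.2, 0.2, 1.0]]
relevance = [1.0, 0.9, 0.7]  # text-to-image relevance scores

unified = combine_sources([sim_a, sim_b], weights=[0.5, 0.5])
kernel = dpp_kernel(relevance, unified)
print(greedy_dpp(kernel, 2))  # -> [0, 2]
```

Ranking by relevance alone would return items 0 and 1, which are near-duplicates. The determinant penalizes that pair, so the greedy selection keeps item 0 and swaps in the less relevant but dissimilar item 2. Changing the weights passed to `combine_sources` shifts which attribute's similarity dominates the trade-off. This is only a crude analogue of adapting diversity to context, not the mechanism the paper describes.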