SalientFusion: Context-Aware Compositional Zero-Shot Food Recognition

📅 2025-09-04
🤖 AI Summary
To address the challenge of recognizing unseen dishes in zero-shot food recognition, this paper introduces a new task, Compositional Zero-Shot Food Recognition (CZSFR), which models cuisines as attributes and ingredients as objects within the compositional zero-shot learning (CZSL) framework. The method, SalientFusion, targets three challenges: background redundancy, confusion between staple and side dishes, and semantic bias in single attributes. It proposes a decoupled representation of food components and cooking attributes built from two modules: SalientFormer, a saliency-guided Transformer that suppresses background and uses depth features for role-aware feature learning, and DebiasAT, a debiasing prompt-alignment module that aligns text prompts with visual features for context-aware vision-language matching. On the newly constructed CZSFood-90 and CZSFood-164 benchmarks and on standard CZSL datasets, the method improves recognition accuracy on unseen compositions and sets new state-of-the-art results.
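
A minimal sketch of the compositional label space described above, under stated assumptions: the cuisine and ingredient vocabularies, the seen/unseen split, the one-hot primitive embeddings, and the additive `compose` function are all hypothetical toy choices, not the paper's encoders or the CZSFood label sets. It shows why an unseen pair can still be scored: every candidate is built from primitives (one cuisine, one ingredient) that each appear in some seen pair.

```python
import math
from itertools import product

# Hypothetical toy vocabularies (not the CZSFood label sets).
CUISINES = ["Sichuan", "Cantonese", "Italian"]   # attributes
INGREDIENTS = ["tofu", "pork", "noodles"]       # objects

# Every (cuisine, ingredient) pair is a candidate composition.
ALL_PAIRS = list(product(CUISINES, INGREDIENTS))

# Hypothetical split: pairs observed in training vs. pairs held out as unseen.
SEEN = {("Sichuan", "tofu"), ("Sichuan", "pork"),
        ("Cantonese", "pork"), ("Italian", "noodles")}
UNSEEN = [p for p in ALL_PAIRS if p not in SEEN]

DIM = len(CUISINES) + len(INGREDIENTS)

def one_hot(i, dim=DIM):
    v = [0.0] * dim
    v[i] = 1.0
    return v

# Toy primitive embeddings: one slot per cuisine, then one slot per ingredient.
ATTR_EMB = {c: one_hot(i) for i, c in enumerate(CUISINES)}
OBJ_EMB = {o: one_hot(len(CUISINES) + j) for j, o in enumerate(INGREDIENTS)}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def compose(cuisine, ingredient):
    # Toy composition: element-wise sum of the two primitive embeddings.
    return [a + o for a, o in zip(ATTR_EMB[cuisine], OBJ_EMB[ingredient])]

def predict(image_vec, pairs=ALL_PAIRS):
    # Closed-world scoring: pick the candidate pair whose composed embedding
    # is most similar to the image feature.
    return max(pairs, key=lambda p: cosine(image_vec, compose(*p)))

# An image feature that (by construction) matches "Italian tofu", an unseen pair
# whose primitives "Italian" and "tofu" each occur in some seen pair.
image = compose("Italian", "tofu")
print(predict(image))                 # ('Italian', 'tofu')
print(("Italian", "tofu") in UNSEEN)  # True
```

A real CZSL model would typically replace the one-hot vectors with learned text and image embeddings; the toy version only illustrates the argmax over all cuisine-ingredient pairs, including pairs never seen together.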


📝 Abstract
Food recognition has gained significant attention, but the rapid emergence of new dishes requires methods that can recognize unseen food categories, motivating Zero-Shot Food Learning (ZSFL). We propose the task of Compositional Zero-Shot Food Recognition (CZSFR), where cuisines and ingredients naturally align with attributes and objects in Compositional Zero-Shot Learning (CZSL). However, CZSFR faces three challenges: (1) redundant background information distracts models from learning meaningful food features; (2) role confusion between staple and side dishes leads to misclassification; and (3) semantic bias in a single attribute can lead the model to misunderstand the dish. Therefore, we propose SalientFusion, a context-aware CZSFR method with two components: SalientFormer, which removes background redundancy and uses depth features to resolve role confusion; and DebiasAT, which reduces semantic bias by aligning prompts with visual features. On our proposed benchmarks, CZSFood-90 and CZSFood-164, SalientFusion achieves state-of-the-art results, and it does so as well on the most popular general CZSL datasets. The code is available at https://github.com/Jiajun-RUC/SalientFusion.
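
The SalientFormer idea of discarding background and adding depth cues can be illustrated with a toy token filter. This is a hypothetical sketch, not the paper's architecture: the `Patch` type, the saliency threshold of 0.5, and min-max depth normalization are assumptions, and the saliency and depth values are taken as given inputs rather than predicted by a Transformer.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Patch:
    feature: List[float]  # toy visual feature for one image patch
    saliency: float       # saliency score in [0, 1], assumed to come from some saliency model
    depth: float          # estimated depth, assumed to come from some depth estimator

def salient_tokens(patches: List[Patch], threshold: float = 0.5) -> List[List[float]]:
    """Drop low-saliency (background) patches and append a min-max normalized
    depth value to each remaining patch feature. Threshold and normalization
    are hypothetical choices for illustration."""
    kept = [p for p in patches if p.saliency >= threshold]
    if not kept:
        return []
    d_min = min(p.depth for p in kept)
    d_max = max(p.depth for p in kept)
    span = (d_max - d_min) or 1.0  # avoid division by zero when all depths are equal
    # New lists are built, so the input patches are left unchanged.
    return [p.feature + [(p.depth - d_min) / span] for p in kept]

patches = [
    Patch([0.2, 0.1], saliency=0.9, depth=1.0),  # salient, nearest
    Patch([0.0, 0.0], saliency=0.1, depth=5.0),  # background, dropped
    Patch([0.4, 0.3], saliency=0.7, depth=3.0),  # salient, farther
]
tokens = salient_tokens(patches)
print(tokens)  # [[0.2, 0.1, 0.0], [0.4, 0.3, 1.0]]
```

Here the low-saliency patch is treated as background and removed, and the two retained tokens carry relative depths of 0.0 and 1.0. How the actual model turns depth into staple-versus-side-dish cues is not specified in this summary.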
Problem

Research questions and friction points this paper is trying to address.

Recognizing unseen food categories with zero-shot learning
Addressing background distraction and role confusion in food images
Reducing semantic bias in compositional food attribute recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

SalientFormer removes background redundancy and uses depth features to resolve staple/side-dish role confusion
DebiasAT aligns prompts with visual features to reduce attribute semantic bias
Context-aware method for compositional zero-shot food recognition
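
One way to make an attribute representation context-aware, sketched here as a stand-in and not as DebiasAT itself, is to describe each cuisine with several prompt embeddings and weight them by their similarity to the image feature, so that a single fixed attribute embedding does not dominate. The prompt vectors, the softmax temperature of 0.1, and the function names are hypothetical.

```python
import math
from typing import Dict, List

Vec = List[float]

def cosine(u: Vec, v: Vec) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def softmax(xs: List[float], temperature: float) -> List[float]:
    # Subtract the max before exponentiating for numerical stability.
    m = max(xs)
    exps = [math.exp((x - m) / temperature) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def context_aware_attribute(image: Vec, prompts: List[Vec], temperature: float = 0.1) -> Vec:
    # Weight each prompt embedding of one attribute by its similarity to the image,
    # then return the weighted average as an image-conditioned attribute embedding.
    weights = softmax([cosine(image, p) for p in prompts], temperature)
    return [sum(w * p[d] for w, p in zip(weights, prompts)) for d in range(len(image))]

def score_attributes(image: Vec, attr_prompts: Dict[str, List[Vec]],
                     temperature: float = 0.1) -> Dict[str, float]:
    # Cosine similarity between the image and each image-conditioned attribute embedding.
    return {a: cosine(image, context_aware_attribute(image, ps, temperature))
            for a, ps in attr_prompts.items()}

# Hypothetical 3-d prompt embeddings; "Sichuan" has two different descriptions.
ATTR_PROMPTS = {
    "Sichuan": [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    "Cantonese": [[0.0, 0.0, 1.0]],
}
image = [0.0, 1.0, 0.0]
scores = score_attributes(image, ATTR_PROMPTS)
print(max(scores, key=scores.get))  # Sichuan
```

Averaging the two "Sichuan" prompts uniformly would give a cosine of about 0.71 with this image; the similarity-weighted version comes close to 1.0 because it leans on the prompt that matches the visual evidence.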