🤖 AI Summary
In open-world compositional zero-shot learning (OW-CZSL), zero-shot predictors suffer from semantic feasibility ambiguity: they fail to distinguish semantically plausible state–object compositions from implausible or rare ones, leading to erroneous predictions. To address this, we propose Feasibility with Language Model (FLM), a lightweight framework that, for the first time, integrates the semantic feasibility reasoning and in-context learning capabilities of large language models (LLMs) into OW-CZSL without fine-tuning. FLM combines prompt engineering (using Vicuna/ChatGPT), logit extraction, and a feasibility-weighting mechanism to augment zero-shot classifiers with composition-aware prediction. Evaluated on all three standard OW-CZSL benchmarks, FLM achieves significant performance gains, effectively mitigating the recognition bias induced by infeasible compositions. Our approach establishes a scalable, training-free paradigm for open-world compositional reasoning, advancing generalization in zero-shot compositional classification.
📝 Abstract
Humans can easily tell whether an attribute (also called a state) is realistic, i.e., feasible, for an object; e.g., fire can be hot, but it cannot be wet. In Open-World Compositional Zero-Shot Learning (OW-CZSL), where all possible state–object combinations are considered unseen classes, zero-shot predictors tend to perform poorly. Our work focuses on using external auxiliary knowledge to determine the feasibility of state–object combinations. Our Feasibility with Language Model (FLM) is a simple and effective approach that leverages Large Language Models (LLMs) to better comprehend the semantic relationships between states and objects. FLM queries an LLM about the feasibility of a given pair and retrieves the output logit for the positive answer. Because many state–object compositions are rare or entirely infeasible and could misguide the LLM, we find the in-context learning ability of LLMs to be essential. We present an extensive study identifying Vicuna and ChatGPT as the best-performing LLMs, and we demonstrate that FLM consistently improves OW-CZSL performance across all three benchmarks.
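The query-and-weight idea described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: the function names, the example logits, the exponent `alpha`, and the use of a two-way softmax over the "yes"/"no" answer logits are all hypothetical choices for turning a retrieved LLM logit into a feasibility score that re-weights a zero-shot classifier's composition scores.

```python
import math


def feasibility_from_logits(yes_logit: float, no_logit: float) -> float:
    """Turn the LLM's answer logits for 'yes' and 'no' into a
    feasibility score in (0, 1) via a two-way softmax.
    (Hypothetical scoring choice, not the paper's exact formula.)"""
    return math.exp(yes_logit) / (math.exp(yes_logit) + math.exp(no_logit))


def weight_predictions(scores: dict, feasibility: dict, alpha: float = 1.0) -> dict:
    """Re-weight zero-shot composition scores by LLM feasibility.
    `scores` maps (state, object) pairs to classifier scores;
    `feasibility` maps the same pairs to scores in (0, 1);
    `alpha` (hypothetical) controls how strongly feasibility dominates."""
    return {pair: s * feasibility[pair] ** alpha for pair, s in scores.items()}


# Example with made-up logits: the LLM is confident "hot fire" is
# feasible and "wet fire" is not.
scores = {("hot", "fire"): 0.9, ("wet", "fire"): 0.8}
feas = {
    ("hot", "fire"): feasibility_from_logits(3.0, -1.0),
    ("wet", "fire"): feasibility_from_logits(-2.0, 2.5),
}
weighted = weight_predictions(scores, feas)
# The implausible composition ("wet", "fire") is strongly suppressed
# even though its raw classifier score was close to the feasible one.
```

In practice the two logits would come from prompting an LLM (e.g., "Can fire be wet? Answer yes or no.") and reading out the logits of the "yes" and "no" answer tokens.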