FlowComposer: Composable Flows for Compositional Zero-Shot Learning

📅 2026-03-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the generalization bottleneck in compositional zero-shot learning (CZSL) caused by implicit composition construction and feature entanglement. To this end, we propose FlowComposer, a novel framework that introduces flow matching to CZSL for the first time. FlowComposer explicitly models the composable flows of attributes and objects through a learnable Composer module and aligns visual features with textual embeddings by fusing their velocity fields. Furthermore, we design a leakage-guided augmentation mechanism that effectively leverages residual entangled features to enhance model robustness. As a plug-and-play component, FlowComposer significantly boosts the performance of diverse baseline models across three standard CZSL benchmarks, demonstrating its effectiveness and broad applicability.

📝 Abstract
Compositional zero-shot learning (CZSL) aims to recognize unseen attribute-object compositions by recombining primitives learned from seen pairs. Recent CZSL methods built on vision-language models (VLMs) typically adopt parameter-efficient fine-tuning (PEFT). They apply visual disentanglers for decomposition and manipulate token-level prompts or prefixes to encode compositions. However, such PEFT-based designs suffer from two fundamental limitations: (1) Implicit Composition Construction, where composition is realized only via token concatenation or branch-wise prompt tuning rather than an explicit operation in the embedding space; (2) Residual Feature Entanglement, where imperfect disentanglement leaves attribute, object, and composition features mutually contaminated. Together, these issues limit the generalization ability of current CZSL models. In this paper, we are the first to systematically study flow matching for CZSL and introduce FlowComposer, a model-agnostic framework that learns two primitive flows to transport visual features toward attribute and object text embeddings, and a learnable Composer that explicitly fuses their velocity fields into a composition flow. To exploit the inevitable residual entanglement, we further devise a leakage-guided augmentation scheme that reuses leaked features as auxiliary signals. We thoroughly evaluate FlowComposer on three public CZSL benchmarks by integrating it as a plug-and-play component into various baselines, consistently achieving significant improvements.
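To make the idea of fusing two velocity fields concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a straight-line (linear interpolation) flow-matching path, uses closed-form "oracle" velocities in place of learned networks, and models the Composer as a simple per-dimension gate. The names `flow_matching_target`, `compose_velocities`, `euler_transport`, and `gate` are hypothetical and do not come from the paper.

```python
import numpy as np


def flow_matching_target(x0, x1, t):
    """Linear-path flow matching: return the point x_t on the straight line
    from x0 to x1 and the constant target velocity along that line."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def compose_velocities(v_attr, v_obj, gate):
    """Hypothetical Composer (an assumption): a convex, per-dimension blend
    of the attribute and object velocity fields."""
    return gate * v_attr + (1.0 - gate) * v_obj


def euler_transport(x0, velocity_fn, steps=10):
    """Integrate dx/dt = velocity_fn(x, t) from t=0 to t=1 with Euler steps."""
    x = x0.copy()
    dt = 1.0 / steps
    for i in range(steps):
        x = x + dt * velocity_fn(x, i * dt)
    return x


rng = np.random.default_rng(0)
dim = 8
visual = rng.normal(size=dim)     # stand-in for a visual feature
attr_text = rng.normal(size=dim)  # stand-in for an attribute text embedding
obj_text = rng.normal(size=dim)   # stand-in for an object text embedding

# Oracle velocities for the linear path; a real model would learn these.
_, v_attr_const = flow_matching_target(visual, attr_text, t=0.0)
_, v_obj_const = flow_matching_target(visual, obj_text, t=0.0)

gate = np.full(dim, 0.5)  # assumed fixed here; learnable in a real Composer


def v_comp(x, t):
    # Composition flow: fuse the two primitive velocity fields.
    return compose_velocities(v_attr_const, v_obj_const, gate)


composed = euler_transport(visual, v_comp)
```

With a gate of 0.5 and constant velocities, the transported feature ends at the midpoint of the two text embeddings. In a trained system the gate and both velocity fields would be learned, so the endpoint would generally differ.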
Problem

Research questions and friction points this paper is trying to address.

Compositional Zero-Shot Learning
Feature Entanglement
Implicit Composition
Vision-Language Models
Generalization
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
compositional zero-shot learning
feature disentanglement
explicit composition
leakage-guided augmentation
Zhenqi He
The Hong Kong University of Science and Technology (HKUST) | The University of Hong Kong (HKU)
Open-World Learning, Computer Vision, Multi-Modal Learning
Lin Li
The Hong Kong University of Science and Technology
Long Chen
The Hong Kong University of Science and Technology