Fantastic Features and Where to Find Them: A Probing Method to combine Features from Multiple Foundation Models

📅 2025-12-01
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Existing multi-foundation-model feature fusion methods rely heavily on downstream fine-tuning or labor-intensive hyperparameter optimization. Method: We propose ComBo—a probe-style adapter that keeps backbone parameters frozen and requires no backpropagation through the backbones. ComBo employs a lightweight Transformer to integrate token-level compressed features from multiple models and hierarchical levels, introducing a novel multi-backbone joint probing mechanism for cross-model feature fusion and task-relevance self-adaptation. It is dataset-agnostic—requiring no task-specific hyperparameters—and leaves all foundation model parameters unchanged. Contribution/Results: On all 19 VTAB-1k tasks, ComBo significantly outperforms existing probing baselines, matches or surpasses costly distillation-based model merging approaches, and remains compatible with efficient downstream probing of fine-tuned models. ComBo establishes a new paradigm for general-purpose, efficient, plug-and-play multi-model feature composition.

📝 Abstract
Foundation models (FMs) trained with different objectives and data learn diverse representations, making some more effective than others for specific downstream tasks. Existing adaptation strategies, such as parameter-efficient fine-tuning, focus on individual models and do not exploit the complementary strengths across models. Probing methods offer a promising alternative by extracting information from frozen models, but current techniques do not scale well with large feature sets and often rely on dataset-specific hyperparameter tuning. We propose Combined backBones (ComBo), a simple and scalable probing-based adapter that effectively integrates features from multiple models and layers. ComBo compresses activations from layers of one or more FMs into compact token-wise representations and processes them with a lightweight transformer for task-specific prediction. Crucially, ComBo does not require dataset-specific tuning or backpropagation through the backbone models. However, not all models are equally relevant for all tasks. To address this, we introduce a mechanism that leverages ComBo's joint multi-backbone probing to efficiently evaluate each backbone's task-relevance, enabling both practical model comparison and improved performance through selective adaptation. On the 19 tasks of the VTAB-1k benchmark, ComBo outperforms previous probing methods, matches or surpasses more expensive alternatives, such as distillation-based model merging, and enables efficient probing of tuned models. Our results demonstrate that ComBo offers a practical and general-purpose framework for combining diverse representations from multiple FMs.
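The pipeline the abstract describes can be sketched in a few lines: per-source linear layers compress token activations from frozen backbones into a shared width, a lightweight attention block fuses them, and a linear head predicts from a learnable [CLS] token. This is a minimal illustrative sketch, not the paper's implementation; the dimensions, the single-block fuser, and the two dummy "backbone" feature tensors are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

class ComBoProbeSketch:
    """Hypothetical sketch of a ComBo-style probe: frozen backbone
    activations -> per-source token-wise compression -> one
    self-attention block -> classification head. Only these adapter
    weights would be trained; the backbones stay frozen."""

    def __init__(self, feat_dims, d_model=32, num_classes=19):
        # One compressor per (backbone, layer) feature source
        self.W_c = [rng.standard_normal((d, d_model)) * 0.02 for d in feat_dims]
        # Single-head attention weights for the lightweight fuser
        self.W_q = rng.standard_normal((d_model, d_model)) * 0.02
        self.W_k = rng.standard_normal((d_model, d_model)) * 0.02
        self.W_v = rng.standard_normal((d_model, d_model)) * 0.02
        self.W_o = rng.standard_normal((d_model, num_classes)) * 0.02
        self.cls = np.zeros((1, d_model))  # learnable [CLS] token

    def __call__(self, feats):
        # feats: list of [T_i, D_i] token activations from frozen models
        toks = [f @ W for f, W in zip(feats, self.W_c)]
        x = np.vstack([self.cls] + toks)          # [1 + sum(T_i), d_model]
        q, k, v = x @ self.W_q, x @ self.W_k, x @ self.W_v
        attn = softmax(q @ k.T / np.sqrt(x.shape[1]))
        x = x + attn @ v                          # residual attention
        return x[0] @ self.W_o                    # logits from [CLS]

# Dummy activations standing in for two frozen backbones (assumed shapes)
probe = ComBoProbeSketch(feat_dims=[384, 768], num_classes=19)
f1 = rng.standard_normal((16, 384))   # 16 tokens from "backbone A"
f2 = rng.standard_normal((16, 768))   # 16 tokens from "backbone B"
logits = probe([f1, f2])
print(logits.shape)                   # (19,)
```

Because gradients would flow only through the compressors, fuser, and head, backbone activations can be precomputed once and reused, which is what makes this style of probing cheap relative to fine-tuning.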
Problem

Research questions and friction points this paper is trying to address.

How to combine features from multiple foundation models for downstream tasks.
How to efficiently select task-relevant models without dataset-specific tuning.
How to outperform existing probing and model-merging methods at lower cost.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines features from multiple foundation models via probing.
Uses a lightweight transformer for task-specific prediction without backpropagation through the backbones.
Evaluates each backbone's task-relevance for selective adaptation to improve performance.