Reproduction Beyond Benchmarks: ConstBERT and ColBERT-v2 Across Backends and Query Distributions

📅 2026-04-11

🤖 AI Summary

This study examines how well multi-vector retrieval models, specifically ColBERT-v2 and ConstBERT, generalize beyond standard benchmarks, particularly to long narrative queries. The authors combine ablation studies, cross-backend deployment, testing across query distributions, and fine-tuning with more data. ConstBERT reproduces within 0.05% MRR@10 on MS MARCO, yet both models lose 86–97% of their performance on the long-query TREC ToT 2025 benchmark, and fine-tuning with 3x more data degrades performance by up to 29%. The work attributes the failure to the MaxSim operator's uniform token weighting, which cannot distinguish signal from filler noise, and concludes that this architectural limitation of multi-vector models cannot be overcome by fine-tuning alone.
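For readers unfamiliar with the metric, here is a minimal sketch of how MRR@10 and a reproduction gap could be computed. The function names, the toy rankings, and the example scores are all hypothetical. The source does not say whether "within 0.05%" is a relative or an absolute difference, so the relative form used here is an assumption.

```python
def mrr_at_k(rankings, relevant, k=10):
    """Mean reciprocal rank of the first relevant document within the top k.

    rankings: dict mapping query id -> list of doc ids, best first.
    relevant: dict mapping query id -> set of relevant doc ids.
    Queries with no relevant document in the top k contribute 0.
    """
    total = 0.0
    for qid, ranked in rankings.items():
        rel = relevant.get(qid, set())
        for rank, doc_id in enumerate(ranked[:k], start=1):
            if doc_id in rel:
                total += 1.0 / rank
                break
    return total / len(rankings)


def relative_gap_percent(reported, reproduced):
    """Relative difference between two scores, in percent (assumed definition)."""
    return abs(reproduced - reported) / reported * 100.0


# Toy data, invented for illustration only.
toy_rankings = {
    "q1": ["d3", "d1", "d2"],  # first relevant doc at rank 2 -> 1/2
    "q2": ["d5", "d6"],        # first relevant doc at rank 1 -> 1
    "q3": ["d7", "d8"],        # no relevant doc retrieved   -> 0
}
toy_relevant = {"q1": {"d1"}, "q2": {"d5"}, "q3": {"d9"}}

toy_mrr = mrr_at_k(toy_rankings, toy_relevant, k=10)

# Made-up scores chosen so that the relative gap is 0.05%.
toy_gap = relative_gap_percent(reported=0.5, reproduced=0.50025)
```

On the toy data the reciprocal ranks are 1/2, 1, and 0, so `toy_mrr` is 0.5.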

📝 Abstract

Reproducibility must validate architectural robustness, not just numerical accuracy. We evaluate ColBERT-v2 and ConstBERT across five dimensions, finding that while ConstBERT reproduces within 0.05% MRR@10 on MS-MARCO, both models show a drop of 86-97% on long, narrative queries (TREC ToT 2025). Ablations prove this failure is architectural: performance plateaus at 20 words because the MaxSim operator's uniform token weighting cannot distinguish signal from filler noise. Furthermore, undocumented backend parameters create an 8-point gap due to ConstBERT's sparse centroid coverage, and fine-tuning with 3x more data actually degrades performance by up to 29%. We conclude that architectural constraints in multi-vector retrieval cannot be overcome by adaptation alone. Code: https://github.com/utshabkg/multi-vector-reproducibility.
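The uniform-weighting argument can be illustrated with a small sketch of MaxSim late-interaction scoring as it is commonly defined for ColBERT-style models: for each query token, take the maximum cosine similarity over document tokens, then sum with equal weight. The one-hot "embeddings" below are invented toy vectors, not outputs of ColBERT-v2 or ConstBERT, and the helper names are hypothetical. This is not the authors' code.

```python
import numpy as np


def maxsim_contributions(query_vecs, doc_vecs):
    """Per-query-token maximum cosine similarity over all document tokens."""
    q = query_vecs / np.linalg.norm(query_vecs, axis=1, keepdims=True)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    sim = q @ d.T  # shape: (n_query_tokens, n_doc_tokens)
    return sim.max(axis=1)


def maxsim_score(query_vecs, doc_vecs):
    """MaxSim score: every query token's best match is summed with weight 1."""
    return float(maxsim_contributions(query_vecs, doc_vecs).sum())


# Toy one-hot token vectors (an assumption made for illustration only).
e = np.eye(4)
signal_a, signal_b, filler = e[0], e[1], e[2]

relevant_doc = np.stack([signal_a, signal_b])  # matches both signal tokens
distractor_doc = np.stack([filler])            # matches only the filler token

# Short query: two signal tokens and one filler token.
short_query = np.stack([signal_a, signal_b, filler])
# Long narrative query: the same two signal tokens buried in three filler tokens.
long_query = np.stack([signal_a, signal_b, filler, filler, filler])

short_rel = maxsim_score(short_query, relevant_doc)      # 2 signal matches
short_dis = maxsim_score(short_query, distractor_doc)    # 1 filler match
long_rel = maxsim_score(long_query, relevant_doc)        # still 2 signal matches
long_dis = maxsim_score(long_query, distractor_doc)      # 3 filler matches
```

In this toy setup the relevant document wins for the short query (2 vs. 1) but loses for the long one (2 vs. 3), because each filler token contributes exactly as much as a signal token. Real embeddings are dense rather than one-hot, so the effect would be less clean in practice.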

Problem

Research questions and friction points this paper is trying to address.

multi-vector retrieval

architectural robustness

long narrative queries

reproducibility

MaxSim operator

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-vector retrieval

architectural robustness

MaxSim operator