AssoCiAm: A Benchmark for Evaluating Association Thinking while Circumventing Ambiguity

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing frameworks for evaluating association reasoning in multimodal large language models (MLLMs) overlook the inherent ambiguity of association tasks, which arises from the diversity of human associative responses and leads to unreliable assessments. Method: AssoCiAm is proposed as the first benchmark to decompose this ambiguity into internal (ambiguity in model-generated responses) and external (annotator subjectivity) components, paired with a hybrid human-model collaborative disambiguation mechanism. The authors construct a high-quality multimodal association dataset and systematically evaluate mainstream MLLMs under a controlled-variable experimental framework. Results: Experiments show that ambiguity causes model behavior to approximate randomness; AssoCiAm effectively mitigates this interference, substantially improving assessment consistency (+23.6% in intraclass correlation coefficient) and reproducibility. Crucially, it establishes for the first time robust positive correlations between cognitive capabilities and association performance, providing a scalable, verifiable benchmark for evaluating multimodal creative reasoning.
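The consistency gain above is reported as an intraclass correlation coefficient (ICC). As a minimal sketch of what such a metric measures, the following computes a one-way random-effects ICC(1,1) over a score matrix of models × repeated evaluation runs; the function name and the toy data are illustrative, not from the paper, and the paper's exact ICC variant is an assumption.

```python
import numpy as np

def icc_1_1(scores: np.ndarray) -> float:
    """One-way random-effects ICC(1,1) for an (n_models, k_runs) score matrix.

    High values mean model rankings are stable across repeated runs,
    i.e. the benchmark is consistent.
    """
    n, k = scores.shape
    grand = scores.mean()
    row_means = scores.mean(axis=1)
    # Between-model mean square: variance of per-model means, scaled by k runs
    msb = k * ((row_means - grand) ** 2).sum() / (n - 1)
    # Within-model mean square: run-to-run noise around each model's mean
    msw = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)

# Toy example: three models, each scored over three evaluation runs
scores = np.array([
    [0.62, 0.60, 0.61],
    [0.45, 0.47, 0.44],
    [0.78, 0.80, 0.79],
])
print(round(icc_1_1(scores), 3))  # close to 1.0: runs agree strongly
```

Under this reading, "+23.6% ICC" means the disambiguated benchmark yields score matrices whose between-model variance dominates run-to-run noise far more than the ambiguous baseline does.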

📝 Abstract
Recent advancements in multimodal large language models (MLLMs) have garnered significant attention, offering a promising pathway toward artificial general intelligence (AGI). Among the essential capabilities required for AGI, creativity has emerged as a critical trait for MLLMs, with association serving as its foundation. Association reflects a model's ability to think creatively, making it vital to evaluate and understand. While several frameworks have been proposed to assess associative ability, they often overlook the inherent ambiguity in association tasks, which arises from the divergent nature of associations and undermines the reliability of evaluations. To address this issue, we decompose ambiguity into two types, internal ambiguity and external ambiguity, and introduce AssoCiAm, a benchmark designed to evaluate associative ability while circumventing ambiguity through a hybrid computational method. We then conduct extensive experiments on MLLMs, revealing a strong positive correlation between cognition and association. Additionally, we observe that the presence of ambiguity in the evaluation process causes MLLMs' behavior to become more random-like. Finally, we validate the effectiveness of our method in ensuring more accurate and reliable evaluations. See Project Page for the data and codes.
Problem

Research questions and friction points this paper is trying to address.

Evaluating association thinking in MLLMs
Addressing ambiguity in association tasks
Ensuring reliable creativity assessment benchmarks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid computational method circumvents ambiguity
Decomposes ambiguity into internal and external types
Benchmark ensures accurate reliable association evaluations