Is a Peeled Apple Still Red? Evaluating LLMs' Ability for Conceptual Combination with Property Type

📅 2025-02-10

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This study investigates how well large language models (LLMs) handle the three ways a property can behave in conceptual combination: inheritance, emergence, and cancellation. The authors introduce CCPT, a fine-grained annotated dataset of 12.3K triplets for this task, propose a generative method inspired by cognitive psychology, and design a multitask evaluation framework that integrates property-type-aware prompting and decoding strategies with an automatic metric that agrees closely with human judgments. The contributions are threefold: (1) property types and their compositional dynamics are systematically distinguished and modeled; (2) emergent property generation improves substantially (+28.6% over baselines on average); and (3) state-of-the-art LLMs, including o1, are shown empirically to have significant limitations in property emergence. All code, data, and evaluation tools are publicly released to support reproducible research on conceptual combination.

📝 Abstract

Conceptual combination is a cognitive process that merges basic concepts, enabling the creation of complex expressions. During this process, the properties of a combination (e.g., the whiteness of a peeled apple) can be inherited from the basic concepts, newly emerge, or be canceled. However, previous studies have evaluated only a limited set of properties and have not examined the generative process. To address this gap, we introduce the Conceptual Combination with Property Type dataset (CCPT), which consists of 12.3K annotated triplets of noun phrases, properties, and property types. Using CCPT, we establish three types of tasks to thoroughly evaluate LLMs on conceptual combination. Our key findings are threefold: (1) Our automatic metric for grading property emergence and cancellation corresponds closely with human judgments. (2) LLMs, including OpenAI's o1, struggle to generate noun phrases that possess given emergent properties. (3) Our proposed method, inspired by a cognitive psychology model that explains how relationships between concepts are formed, improves performance in all generative tasks. The dataset and experimental code are available at https://github.com/seokwon99/CCPT.git.
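As a rough illustration of the triplet structure described above, the following Python sketch models one record as a noun phrase, a property, and a property type. The class names, field names, and example labels are assumptions made for illustration only; they are not taken from the CCPT repository and may not match its actual schema or annotations.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class PropertyType(Enum):
    """The three property behaviors named in the abstract."""
    INHERITED = "inherited"  # carried over from a basic concept
    EMERGENT = "emergent"    # newly appears in the combination
    CANCELED = "canceled"    # held by a basic concept, lost in the combination


@dataclass(frozen=True)
class Triplet:
    """Hypothetical record layout: (noun phrase, property, property type)."""
    noun_phrase: str
    prop: str
    property_type: PropertyType


# Illustrative records only; these labels are assumed, not dataset entries.
examples = [
    Triplet("peeled apple", "white", PropertyType.EMERGENT),
    Triplet("peeled apple", "red", PropertyType.CANCELED),
    Triplet("peeled apple", "sweet", PropertyType.INHERITED),
]


def count_by_type(triplets):
    """Count how many triplets fall under each property type."""
    return Counter(t.property_type for t in triplets)


counts = count_by_type(examples)
for property_type, n in counts.items():
    print(f"{property_type.value}: {n}")
```

Keeping the property type as an enum rather than a free-form string makes it harder to introduce inconsistent labels when grouping or filtering records by type.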

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLMs' conceptual combination ability

Assessing property emergence and cancellation

Improving generative tasks with a cognitive psychology model

Innovation

Methods, ideas, or system contributions that make the work stand out.

Creates the CCPT dataset for evaluation

Tests LLMs on property types

Applies a cognitive psychology model
