AI Summary
Traditional semantic feature norming research faces a trade-off between breadth of conceptual coverage and annotation quality, driven by the prohibitive cost of human labor.
Method: We propose an LLM-augmented paradigm to construct NOVA, a high-density semantic feature dataset covering 786 concepts, by integrating human-elicited features with LLM-generated ones, followed by rigorous expert validation and behavioral experiments.
Contribution/Results: This work achieves the first credible, human-verified integration of LLM-generated features with canonical human norms. We demonstrate that human conceptual knowledge is substantially richer than what existing norming datasets capture, yielding significantly higher feature density and inter-concept overlap. In predicting human semantic similarity judgments, NOVA consistently outperforms both human-only norming datasets and state-of-the-art word embedding models (e.g., BERT, GloVe). Our approach establishes a novel AI-augmented paradigm for cognitive science data construction, balancing scalability, fidelity, and empirical validity.
Abstract
Semantic feature norms have been foundational in the study of human conceptual knowledge, yet traditional methods face a trade-off between concept and feature coverage and verifiable quality, owing to the labor-intensive nature of norming studies. Here, we introduce a novel approach that augments a dataset of human-generated feature norms with responses from large language models (LLMs) while verifying the quality of norms against reliable human judgments. We find that our AI-enhanced feature norm dataset, NOVA: Norms Optimized Via AI, shows much higher feature density and overlap among concepts while outperforming a comparable human-only norm dataset and word-embedding models in predicting people's semantic similarity judgments. Taken together, we demonstrate that human conceptual knowledge is richer than captured in previous norm datasets and show that, with proper validation, LLMs can serve as powerful tools for cognitive science research.
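To make the evaluation concrete: a standard way feature norms predict similarity judgments is to represent each concept as a vector over features and take the cosine similarity of concept pairs. The sketch below illustrates this; the concepts, features, and frequency values are invented for illustration and are not drawn from NOVA.

```python
import math

# Hypothetical concept-by-feature data (illustrative values, not NOVA):
# each concept maps to a vector of feature production frequencies.
norms = {
    "dog": [0.9, 0.8, 0.0, 0.7],
    "cat": [0.8, 0.9, 0.0, 0.6],
    "car": [0.0, 0.0, 0.9, 0.1],
}

def cosine(u, v):
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Predicted semantic similarity for concept pairs; in a norming study,
# predictions like these would be correlated with human similarity
# judgments to score the dataset.
sim_dog_cat = cosine(norms["dog"], norms["cat"])
sim_dog_car = cosine(norms["dog"], norms["car"])
```

With denser norms (more features per concept and more shared features across concepts), these vectors carry more signal, which is one way higher feature density can translate into better prediction of human judgments.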