Culturally transmitted color categories in LLMs reflect a learning bias toward efficient compression

📅 2025-09-09

🤖 AI Summary

This study investigates whether large language models (LLMs) can evolve human-like, efficient semantic systems through cultural transmission, using color naming as the test domain. The authors use iterated in-context learning to simulate cross-generational cultural evolution and test, in LLMs, the prediction of Information Bottleneck (IB) theory that semantic systems converge toward near-optimal compression under a complexity-accuracy trade-off. Two experiments are run with Gemini 2.0-flash and Llama 3.3-70B-Instruct. In an English color-naming study, Gemini aligns well with native English speakers and achieves high IB efficiency, while Llama produces an efficient but lower-complexity system. In the iterated-learning study, the LLMs restructure initially random naming systems toward greater IB efficiency and closer alignment with patterns observed across the world's languages. The key contribution is evidence of an inductive bias in LLMs toward IB-efficient solutions, which points to convergent semantic evolution between LLMs and human cognition.

📝 Abstract

Converging evidence suggests that systems of semantic categories across human languages achieve near-optimal compression via the Information Bottleneck (IB) complexity-accuracy principle. Large language models (LLMs) are not trained for this objective, which raises the question: are LLMs capable of evolving efficient human-like semantic systems? To address this question, we focus on the domain of color as a key testbed of cognitive theories of categorization and replicate with LLMs (Gemini 2.0-flash and Llama 3.3-70B-Instruct) two influential human behavioral studies. First, we conduct an English color-naming study, showing that Gemini aligns well with the naming patterns of native English speakers and achieves a significantly high IB-efficiency score, while Llama exhibits an efficient but lower complexity system compared to English. Second, to test whether LLMs simply mimic patterns in their training data or actually exhibit a human-like inductive bias toward IB-efficiency, we simulate cultural evolution of pseudo color-naming systems in LLMs via iterated in-context language learning. We find that akin to humans, LLMs iteratively restructure initially random systems towards greater IB-efficiency and increased alignment with patterns observed across the world's languages. These findings demonstrate that LLMs are capable of evolving perceptually grounded, human-like semantic systems, driven by the same fundamental principle that governs semantic efficiency across human languages.
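
The complexity-accuracy trade-off mentioned above can be made concrete with a toy calculation. The sketch below is an illustration, not the authors' code: it assumes the formulation commonly used in the IB color-naming literature, where complexity is the mutual information I(M;W) between meanings and words and accuracy is I(W;U) between words and world states. The four-chip hue ring, the uniform prior, and the blurred meaning distributions are all hypothetical values chosen only to keep the arithmetic small.

```python
import math

def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution given as a 2-D list."""
    row = [sum(r) for r in joint]
    col = [sum(c) for c in zip(*joint)]
    mi = 0.0
    for i, r in enumerate(joint):
        for j, p in enumerate(r):
            if p > 0:
                mi += p * math.log2(p / (row[i] * col[j]))
    return mi

def complexity_and_accuracy(p_m, q_w_given_m, m_u):
    """Return (I(M;W), I(W;U)) for a naming system.

    p_m[i]            prior probability of meaning (chip) i
    q_w_given_m[i][w] probability of using word w for meaning i
    m_u[i][u]         meaning i as a distribution over world states u
    """
    n_m, n_w, n_u = len(p_m), len(q_w_given_m[0]), len(m_u[0])
    joint_mw = [[p_m[i] * q_w_given_m[i][w] for w in range(n_w)]
                for i in range(n_m)]
    joint_wu = [[sum(p_m[i] * q_w_given_m[i][w] * m_u[i][u] for i in range(n_m))
                 for u in range(n_u)]
                for w in range(n_w)]
    return mutual_information(joint_mw), mutual_information(joint_wu)

# Hypothetical toy domain: 4 chips on a hue ring, uniform prior,
# each meaning blurred onto its two ring neighbours.
PRIOR = [0.25] * 4
MEANINGS = [
    [0.8, 0.1, 0.0, 0.1],
    [0.1, 0.8, 0.1, 0.0],
    [0.0, 0.1, 0.8, 0.1],
    [0.1, 0.0, 0.1, 0.8],
]
# Two-word system: chips 0-1 share one word, chips 2-3 share the other.
TWO_WORDS = [[1, 0], [1, 0], [0, 1], [0, 1]]
# One-word system: every chip gets the same word.
ONE_WORD = [[1], [1], [1], [1]]

if __name__ == "__main__":
    for name, system in [("two words", TWO_WORDS), ("one word", ONE_WORD)]:
        c, a = complexity_and_accuracy(PRIOR, system, MEANINGS)
        print(f"{name}: complexity={c:.3f} bits, accuracy={a:.3f} bits")
```

In this toy setup the one-word system costs nothing and conveys nothing, while the two-word system pays one bit of complexity for a nonzero gain in accuracy. IB efficiency, as used in the abstract, concerns how close a system sits to the best achievable accuracy for its complexity; computing that bound is outside the scope of this sketch.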

Problem

Research questions and friction points this paper is trying to address.

Investigating if LLMs develop efficient human-like color categories

Testing whether LLMs mimic training data or exhibit learning bias

Simulating cultural evolution of color naming systems in LLMs

Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulated cultural evolution via iterated learning

Evaluated using Information Bottleneck efficiency principle

Tested with Gemini and Llama language models
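
The iterated-learning setup listed above can be sketched as a simple transmission chain: each generation sees a sample of labeled colors from the previous generation, generalizes to the full color set, and passes its output on. The code below is a minimal sketch under stated assumptions. In the paper the learner is an LLM doing in-context learning; here it is replaced by a hypothetical nearest-neighbor stub (`nearest_neighbor_learner`) so the loop is runnable, and the ring of 20 chips, 4 labels, and 8 examples per generation are made-up parameters.

```python
import random

def nearest_neighbor_learner(examples, n_chips):
    """Hypothetical stand-in for the LLM learner.

    Labels every chip with the label of its nearest shown example,
    measuring distance around a ring of n_chips hues.
    """
    def ring_dist(a, b):
        d = abs(a - b)
        return min(d, n_chips - d)

    return [min(examples, key=lambda ex: ring_dist(ex[0], chip))[1]
            for chip in range(n_chips)]

def iterate_learning(initial_system, learner, n_generations, n_examples, rng):
    """Run a transmission chain and return the naming system of every generation.

    initial_system  list mapping chip index -> label (generation 0)
    learner         function (examples, n_chips) -> full list of labels
    n_examples      how many (chip, label) pairs each learner is shown
    """
    system = list(initial_system)
    history = [system]
    n_chips = len(system)
    for _ in range(n_generations):
        shown = rng.sample(range(n_chips), n_examples)
        examples = [(chip, system[chip]) for chip in shown]
        # The learner's generalization becomes the next generation's input.
        system = learner(examples, n_chips)
        history.append(system)
    return history

if __name__ == "__main__":
    rng = random.Random(0)
    # Generation 0: a random pseudo color-naming system.
    start = [rng.randrange(4) for _ in range(20)]
    chain = iterate_learning(start, nearest_neighbor_learner, 5, 8, rng)
    for gen, system in enumerate(chain):
        print(gen, "".join(str(label) for label in system))
```

To approximate the study's design, the stub would be swapped for a function that formats the examples into a prompt and parses the model's labels for the remaining chips. The prompt format and sampling scheme are not given in this summary, so they are left unspecified here.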
