BenCSSmark: Making the Social Sciences Count in LLM Research

📅 2026-05-06

📝 Abstract
This position paper argues that the under-representation of social science tasks in contemporary LLM benchmarks limits advances in both LLM evaluation and social scientific inquiry. Benchmarks, the standardized tools used to assess computational systems, are pivotal in the development of artificial intelligence (AI), including large language models (LLMs). They do more than measure progress: they actively structure it, shaping reputations, research agendas, and commercial outcomes. Despite this central role, the social sciences are largely absent from mainstream evaluation frameworks, even though scholars in these fields generate dozens of rigorously annotated, context-sensitive datasets each year. Integrating this work into benchmark design could significantly improve the generalization and robustness of AI models. In turn, models trained on social scientific tasks would likely perform better on classic and contemporary tasks in disciplines as diverse as history, sociology, political science, and economics. This is all the more pressing as these disciplines are quickly turning to LLMs for assistance. To address this gap, we introduce BenCSSmark, a benchmark composed of datasets annotated by computational social scientists. By integrating social scientific perspectives into benchmarking, BenCSSmark seeks to promote more robust, transparent, and socially relevant AI systems and to foster efficient collaboration.
Problem

Research questions and friction points this paper is trying to address.

social sciences
large language models
benchmarks
evaluation
AI systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

BenCSSmark
social science benchmarks
large language models
context-sensitive datasets
AI evaluation
Arnault Chatelain
CREST (École Polytechnique, ENSAE, CNRS), 5 avenue Le Chatelier, 91120 Palaiseau, France
Étienne Ollion
CREST (École Polytechnique, ENSAE, CNRS), 5 avenue Le Chatelier, 91120 Palaiseau, France
Qianwen Guan
LLF (Université Paris Cité and CNRS), UFRL Olympe de Gouges, 13 place Paul Ricoeur, 75013 Paris, France
Diandra Fabre
Univ. Grenoble Alpes, CNRS, Grenoble INP, LIG, 38000 Grenoble, France
Lorraine Goeuriot
Université Grenoble Alpes
Emile Chapuis
INA (Institut National de l’Audiovisuel), 4 Avenue de l’Europe, 94366 Bry-sur-Marne, France
Abdelkrim Beloued
INA (Institut National de l’Audiovisuel), 4 Avenue de l’Europe, 94366 Bry-sur-Marne, France
Marie Candito
Maîtresse de Conférences, Université Paris Cité
natural language processing, syntactic parsing, semantic parsing, syntactico-semantic resources
Nicolas Hervé
INA (Institut National de l’Audiovisuel), 4 Avenue de l’Europe, 94366 Bry-sur-Marne, France
Didier Schwab
Univ. Grenoble Alpes, LIG-GETALP
Natural Language Processing, Large Language Models, Alternative and Augmentative Communication