SHARE: Social-Humanities AI for Research and Education

📅 2026-04-13
🤖 AI Summary
This work addresses the lack of pretrained language models built specifically for the social sciences and humanities (SSH), where general-purpose models often miss disciplinary nuances and scholarly conventions. The authors introduce SHARE, a family of causal language models pretrained from scratch for SSH, together with MIRROR, an interface for reviewing existing SSH text that deliberately generates no new text so that users stay critically engaged. They also build a custom SSH Cloze benchmark, on which SHARE performs close to Phi-4, a general-purpose model that uses 100 times more tokens. MIRROR is presented as a prototype for using the SHARE models without compromising SSH principles and norms.

📝 Abstract
This intermediate technical report introduces the SHARE family of base models and the MIRROR user interface. The SHARE models are the first causal language models fully pretrained by and for the social sciences and humanities (SSH). Their performance in modelling SSH texts is close to that of general-purpose models (Phi-4) that use 100 times more tokens, as shown by our custom SSH Cloze benchmark. The MIRROR user interface is designed for reviewing text inputs from the SSH disciplines while preserving critical engagement. By prototyping a generative AI interface that does not generate any text, we propose a way to harness the capabilities of the SHARE models without compromising the integrity of SSH principles and norms.
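The abstract does not describe how the SSH Cloze benchmark is built or scored, so the following is only a generic sketch of how a cloze item can be scored with a left-to-right language model: each candidate is inserted into the blank, and the candidate whose completed sentence gets the highest log-likelihood wins. A toy add-one smoothed bigram model stands in for a SHARE model here. The corpus, the `____` blank marker, and all function names are illustrative assumptions, not the authors' code or data.

```python
import math
from collections import Counter

# Hypothetical toy setup: a bigram model stands in for a causal LM.

def train_bigram(corpus):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams

def log_likelihood(sentence, unigrams, bigrams):
    """Add-one smoothed log-probability of a sentence, read left to right."""
    tokens = ["<s>"] + sentence.lower().split()
    vocab_size = len(unigrams)
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        total += math.log((bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size))
    return total

def solve_cloze(template, candidates, unigrams, bigrams):
    """Fill the blank with each candidate and return the highest-scoring one."""
    return max(
        candidates,
        key=lambda c: log_likelihood(template.replace("____", c), unigrams, bigrams),
    )

# Made-up sentences, used only to make the sketch runnable.
corpus = [
    "the archive holds letters from the nineteenth century",
    "the survey measures trust in public institutions",
    "the archive holds maps from the colonial period",
]
unigrams, bigrams = train_bigram(corpus)
best = solve_cloze("the archive ____ letters", ["holds", "measures"], unigrams, bigrams)
print(best)
```

With a real causal language model, `log_likelihood` would instead sum the model's token log-probabilities for the completed sentence. Benchmark accuracy is then the share of items where the top-ranked candidate matches the original word.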
Problem

Research questions and friction points this paper is trying to address.

social sciences and humanities
generative AI
critical engagement
AI interface
academic integrity
Innovation

Methods, ideas, or system contributions that make the work stand out.

causal language models
social sciences and humanities
domain-specific pretraining
non-generative interface
critical engagement