AI Summary
This study investigates whether fine-tuning large language models on distinct religious texts induces systematic differences in their ethical reasoning patterns. Building on Llama-3.1-8B, the authors construct a baseline model and five LoRA-fine-tuned variants trained respectively on canonical texts from Christianity, Islam, Judaism, Hinduism, and Buddhism. Model responses are evaluated across 17 standardized ethical dilemmas using multi-temperature sampling to assess consistency and stability. The work introduces a "condensate comparison method," repurposing differentially fine-tuned models as instruments for cultural-ethical analysis and establishing falsifiable criteria. Results indicate that each LoRA model exhibits ethical inclinations aligned with its respective religious tradition, while all models reach 100% consistency on high-agreement dilemmas such as the trolley problem. Disagreement intensifies at higher temperatures on contentious issues, and the baseline model shows the highest overall consistency (mean 88.3%).
Abstract
We present Six Llamas, a comparative study examining whether large language models fine-tuned on distinct religious corpora encode systematically different patterns of ethical reasoning. Six variants of Meta-Llama-3.1-8B are constructed: one unmodified control and five LoRA-adapted models trained exclusively on the sacred and theological texts of Christianity, Islam, Judaism, Hinduism, or Buddhism. All six models are probed with an identical battery of 17 standardized ethical prompts spanning moral dilemmas, game-theoretic scenarios, public policy questions, and moral-psychological self-assessments. To assess robustness and reproducibility, we implement a multi-temperature sampling design spanning ten temperature settings. We compute response consistency metrics, pairwise inter-model agreement rates, temperature sensitivity coefficients across four prompt domains, and run-to-run stability analyses.
Findings show that LoRA-adapted models produce ethical reasoning patterns that are (a) systematically differentiated from the base model, (b) consistent with the moral logics of their training traditions, and (c) structured along interpretable dimensions in moral-philosophical space. In addition, (d) core ethical positions remain stable across temperature variations for high-consensus dilemmas, with the Trolley Problem achieving 100% consistency across all models and temperatures; (e) tradition-specific divergence intensifies at higher temperatures in morally contested domains; and (f) the base model exhibits the highest overall response consistency (mean 88.3%), suggesting that LoRA adaptation introduces both tradition-specific signal and increased sampling sensitivity.
The study offers a proof of concept for the condensate comparative method, which uses differentially trained language models as instruments for cultural and ethical analysis, and identifies specific criteria for falsification along with planned extensions.
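
The abstract names response consistency, pairwise inter-model agreement, and temperature sensitivity but does not define them here. The sketch below shows one plausible reading, offered as an assumption rather than the authors' implementation: consistency as the share of samples matching the modal answer, agreement as the share of prompts on which two models give the same modal answer, and sensitivity as the least-squares slope of consistency against temperature. All function names, model labels, and answer labels are hypothetical.

```python
from collections import Counter
from itertools import combinations


def consistency(responses):
    """Share of sampled responses that match the most common answer."""
    if not responses:
        raise ValueError("need at least one response")
    modal_count = Counter(responses).most_common(1)[0][1]
    return modal_count / len(responses)


def pairwise_agreement(modal_answers):
    """Fraction of shared prompts on which each model pair agrees.

    modal_answers maps model name -> {prompt id -> modal answer label}.
    """
    rates = {}
    for a, b in combinations(sorted(modal_answers), 2):
        shared = modal_answers[a].keys() & modal_answers[b].keys()
        if not shared:
            continue  # no common prompts, so no rate to report
        same = sum(modal_answers[a][p] == modal_answers[b][p] for p in shared)
        rates[(a, b)] = same / len(shared)
    return rates


def temperature_slope(temperatures, scores):
    """Least-squares slope of consistency score against temperature."""
    n = len(temperatures)
    mean_t = sum(temperatures) / n
    mean_s = sum(scores) / n
    var_t = sum((t - mean_t) ** 2 for t in temperatures)
    cov = sum((t - mean_t) * (s - mean_s)
              for t, s in zip(temperatures, scores))
    return cov / var_t


# Toy data, invented purely for illustration (not results from the study).
toy_modal = {
    "base": {"trolley": "pull", "policy": "yes"},
    "lora_a": {"trolley": "pull", "policy": "no"},
}
toy_rates = pairwise_agreement(toy_modal)
```

Under this reading, a negative slope means consistency falls as temperature rises, which is the direction the abstract reports for morally contested domains. Free-text model outputs would first need to be mapped to discrete answer labels, a step this sketch leaves out.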