HACHIMI: Scalable and Controllable Student Persona Generation via Orchestrated Agents

📅 2026-03-05

🤖 AI Summary

This study addresses a limitation of existing student persona methods: they rely on ad-hoc prompting or hand-crafted profiles that are neither aligned with educational theory nor controllable in their population distribution, which hinders systematic evaluation of educational large language models. The authors formalize the Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) task and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that combines a theory-anchored educational schema, a neuro-symbolic validator enforcing developmental and psychological constraints, stratified sampling, and semantic deduplication. This makes theoretical grounding and distributional control explicit, formally specified goals of persona generation. Using Qwen2.5-72B, the team builds HACHIMI-1M, a corpus of one million synthetic student personas spanning Grades 1–12. Intrinsic evaluation shows near-perfect schema validity, accurate quota adherence, and substantial diversity. In extrinsic evaluation, personas are instantiated as student agents that answer CEPS and PISA 2022 surveys: across 16 cohorts, math and curiosity/growth constructs align strongly with real students, whereas classroom-climate and well-being constructs align only moderately.
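A minimal sketch of a Propose-Validate-Revise loop, assuming a toy three-field persona schema. The `Persona` fields, the grade-to-age rule, and the helper names (`propose`, `validate`, `revise`, `generate`) are hypothetical illustrations, not HACHIMI's schema or code. The random proposer and the rule-based reviser stand in for LLM agents, and only the symbolic half of a neuro-symbolic validator is shown.

```python
from __future__ import annotations

import random
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Persona:
    # Hypothetical toy schema; the paper's schema is richer and theory-anchored.
    grade: int          # school grade, expected 1..12
    age: int            # age in years
    math_ability: str   # "low" | "medium" | "high"


LEVELS = ("low", "medium", "high")


def propose(rng: random.Random, grade: int) -> Persona:
    """Stand-in for an LLM proposer: fills fields at random, so drafts may be inconsistent."""
    return Persona(grade=grade, age=rng.randint(5, 19), math_ability=rng.choice(LEVELS))


def validate(p: Persona) -> list[str]:
    """Symbolic checks only; returns readable violations (empty list means valid)."""
    errors = []
    if not 1 <= p.grade <= 12:
        errors.append("grade out of range")
    # Assumed developmental rule, for illustration only: age is grade + 5 or grade + 6.
    if p.age not in (p.grade + 5, p.grade + 6):
        errors.append("age inconsistent with grade")
    if p.math_ability not in LEVELS:
        errors.append("unknown math_ability level")
    return errors


def revise(p: Persona, errors: list[str]) -> Persona:
    """Stand-in for an LLM reviser: repairs only the flagged issues it knows how to fix."""
    if "age inconsistent with grade" in errors:
        p = replace(p, age=p.grade + 6)
    return p


def generate(grade: int, rng: random.Random, max_rounds: int = 3) -> Optional[Persona]:
    """Propose once, then alternate validate/revise; return None if still invalid."""
    p = propose(rng, grade)
    for _ in range(max_rounds):
        errors = validate(p)
        if not errors:
            return p
        p = revise(p, errors)
    return p if not validate(p) else None
```

For example, `generate(4, random.Random(0))` returns a valid grade-4 persona, while `generate(13, random.Random(0))` returns `None` because the reviser cannot repair an out-of-range grade. Feeding the validator's error list back to the reviser, rather than resampling from scratch, is what distinguishes a revise step from plain rejection sampling.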

📝 Abstract

Student Personas (SPs) are emerging as infrastructure for educational LLMs, yet prior work often relies on ad-hoc prompting or hand-crafted profiles with limited control over educational theory and population distributions. We formalize this as Theory-Aligned and Distribution-Controllable Persona Generation (TAD-PG) and introduce HACHIMI, a multi-agent Propose-Validate-Revise framework that generates theory-aligned, quota-controlled personas. HACHIMI factorizes each persona into a theory-anchored educational schema, enforces developmental and psychological constraints via a neuro-symbolic validator, and combines stratified sampling with semantic deduplication to reduce mode collapse. The resulting HACHIMI-1M corpus comprises 1 million personas for Grades 1-12. Intrinsic evaluation shows near-perfect schema validity, accurate quotas, and substantial diversity, while external evaluation instantiates personas as student agents answering CEPS and PISA 2022 surveys; across 16 cohorts, math and curiosity/growth constructs align strongly between humans and agents, whereas classroom-climate and well-being constructs are only moderately aligned, revealing a fidelity gradient. All personas are generated with Qwen2.5-72B, and HACHIMI provides a standardized synthetic student population for group-level benchmarking and social-science simulations. Resources available at https://github.com/ZeroLoss-Lab/HACHIMI
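The abstract does not name the statistic used to measure human-agent alignment across the 16 cohorts. As an assumption, one simple option is the Pearson correlation between cohort-level mean scores on a construct (for example, a math or curiosity scale), which the sketch below computes. The function names and the input layout (a mapping from a hypothetical cohort ID to per-respondent scores) are illustrative only.

```python
from __future__ import annotations

import math


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation of two paired sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two paired sequences of length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def cohort_means(scores: dict[str, list[float]]) -> dict[str, float]:
    """Mean construct score per cohort."""
    return {cohort: sum(v) / len(v) for cohort, v in scores.items()}


def construct_alignment(human: dict[str, list[float]],
                        agent: dict[str, list[float]]) -> float:
    """Correlate human vs. agent cohort means over the cohorts present in both."""
    shared = sorted(human.keys() & agent.keys())
    h, a = cohort_means(human), cohort_means(agent)
    return pearson([h[c] for c in shared], [a[c] for c in shared])
```

Correlating cohort means rather than individual answers matches the group-level framing of the abstract: a persona population can track real cohorts closely even if no single agent matches a single student. A high value on one construct and a lower value on another is the kind of pattern the abstract calls a fidelity gradient.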

Problem

Research questions and friction points this paper is trying to address.

Student Personas

Educational LLMs

Population Distribution

Educational Theory

Synthetic Student Population

Innovation

Methods, ideas, or system contributions that make the work stand out.

Theory-Aligned Persona Generation

Multi-Agent Framework

Neuro-Symbolic Validation

Stratified Sampling

Synthetic Student Population
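The stratified sampling listed above, which the abstract pairs with semantic deduplication, can be illustrated with a small sketch under stated assumptions: strata are the cross product of independent marginal distributions, integer quotas are rounded with the largest-remainder method so they sum exactly to the target, and near-duplicate personas are filtered greedily by cosine similarity. The bag-of-words `embed` function and the 0.9 threshold are placeholders for a real sentence-embedding model and a tuned cutoff; none of these names come from the HACHIMI repository.

```python
from __future__ import annotations

import math
from collections import Counter
from itertools import product


def joint_strata(marginals: dict[str, dict[str, float]]) -> dict[tuple[str, ...], float]:
    """Cross marginal distributions into joint strata, assuming attributes are independent."""
    joint = {}
    for combo in product(*(m.items() for m in marginals.values())):
        key = tuple(level for level, _ in combo)
        joint[key] = math.prod(p for _, p in combo)
    return joint


def allocate_quotas(total: int, weights: dict) -> dict:
    """Largest-remainder rounding: integer quotas proportional to weights, summing to total."""
    s = sum(weights.values())
    raw = {k: total * w / s for k, w in weights.items()}
    quotas = {k: math.floor(x) for k, x in raw.items()}
    leftover = total - sum(quotas.values())
    # Hand the remaining units to the strata with the largest fractional parts.
    for k in sorted(raw, key=lambda k: raw[k] - quotas[k], reverse=True)[:leftover]:
        quotas[k] += 1
    return quotas


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real pipeline would use a sentence-embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def semantic_dedup(texts: list[str], threshold: float = 0.9) -> list[str]:
    """Greedy filter: keep a text only if it is below `threshold` similarity to all kept texts."""
    kept, kept_vecs = [], []
    for t in texts:
        v = embed(t)
        if all(cosine(v, u) < threshold for u in kept_vecs):
            kept.append(t)
            kept_vecs.append(v)
    return kept
```

For instance, 1,000 personas spread uniformly over 12 grades and two gender categories yield 24 strata: 16 receive 42 personas and 8 receive 41, so the quotas sum exactly to 1,000. The greedy filter is quadratic in the number of kept texts; at million-persona scale an approximate nearest-neighbor index would be the usual substitute.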

🔎 Similar Papers

AutoPal: Autonomous Adaptation to Users for Personal AI Companionship

2024-06-20 · Citations: 1