🤖 AI Summary
This paper addresses the lack of a unified end-to-end research framework for LLM-based dialogue systems by proposing SDialog—the first open-source, dialogue-centric Python toolkit. Methodologically, it unifies dialogue generation, multi-dimensional evaluation (including LLM-as-a-judge and functional correctness verification), and mechanistic interpretability analysis (supporting neuron-level steering, feature ablation/induction, and activation visualization), while integrating persona-driven multi-agent simulation and 3D acoustic modeling for speech synthesis. Its key contributions are: (1) the first integrated architecture unifying generation, evaluation, and explanation; (2) support for hybrid experiments across diverse LLM backends with seamless integration; and (3) a standardized `Dialog` data structure and full-stack out-of-the-box functionality. Experiments demonstrate that SDialog significantly improves development efficiency, evaluation reliability, and depth of mechanistic understanding.
📝 Abstract
We present SDialog, an MIT-licensed open-source Python toolkit that unifies dialog generation, evaluation and mechanistic interpretability into a single end-to-end framework for building and analyzing LLM-based conversational agents. Built around a standardized `Dialog` representation, SDialog provides: (1) persona-driven multi-agent simulation with composable orchestration for controlled, synthetic dialog generation, (2) comprehensive evaluation combining linguistic metrics, LLM-as-a-judge and functional correctness validators, (3) mechanistic interpretability tools for activation inspection and steering via feature ablation and induction, and (4) audio generation with full acoustic simulation including 3D room modeling and microphone effects. The toolkit integrates with all major LLM backends, enabling mixed-backend experiments under a unified API. By coupling generation, evaluation, and interpretability in a dialog-centric architecture, SDialog enables researchers to build, benchmark and understand conversational systems more systematically.
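To make the dialog-centric idea concrete, the sketch below shows in plain Python how a shared dialog structure can let persona-driven agents with different backends, and a simple linguistic metric, operate on the same object. It does not use SDialog and is not its API: every name here (`Turn`, `Dialog` as defined below, `Agent`, `simulate`, `echo_backend`, `shout_backend`, `mean_turn_length`) is a hypothetical stand-in, and the "backends" are stub functions rather than real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List


# Hypothetical stand-ins for illustration only; NOT SDialog's actual classes.
@dataclass
class Turn:
    speaker: str
    text: str


@dataclass
class Dialog:
    turns: List[Turn] = field(default_factory=list)

    def __len__(self) -> int:
        return len(self.turns)


# A "backend" is any callable mapping (persona, history) to the next utterance.
# Real backends would call an LLM; these stubs are deterministic.
Backend = Callable[[str, List[Turn]], str]


def echo_backend(persona: str, history: List[Turn]) -> str:
    return f"[{persona}] turn {len(history) + 1}"


def shout_backend(persona: str, history: List[Turn]) -> str:
    return echo_backend(persona, history).upper()


@dataclass
class Agent:
    name: str
    persona: str
    backend: Backend

    def respond(self, history: List[Turn]) -> Turn:
        return Turn(self.name, self.backend(self.persona, history))


def simulate(first: Agent, second: Agent, max_turns: int) -> Dialog:
    """Alternate two agents for max_turns turns and collect one Dialog."""
    dialog = Dialog()
    speakers = (first, second)
    for i in range(max_turns):
        dialog.turns.append(speakers[i % 2].respond(dialog.turns))
    return dialog


def mean_turn_length(dialog: Dialog) -> float:
    """A toy linguistic metric: average whitespace-separated tokens per turn."""
    if not dialog.turns:
        return 0.0
    return sum(len(t.text.split()) for t in dialog.turns) / len(dialog.turns)


# Mixed-backend run: each agent uses a different stub backend.
doctor = Agent("doctor", "calm doctor", echo_backend)
patient = Agent("patient", "anxious patient", shout_backend)
dialog = simulate(doctor, patient, max_turns=4)
```

Because generation and evaluation both consume the same structure, swapping a backend or adding a metric does not require touching the other component, which is the kind of coupling the abstract describes.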