🤖 AI Summary
This study investigates whether dialogue-only pretraining can yield compact language models that are both formally and functionally competent in conversation. The authors adopt the Llamalogue architecture, pretrain it exclusively on conversational corpora, and evaluate it both on the standard BabyLM suite and on a custom dialogue benchmark, with additional fine-tuning via PPO and DPO. Results show that while dialogue-only pretraining falls short of competitive performance on general language understanding benchmarks such as BabyLM, it substantially improves dialogue continuation prediction. Moreover, DPO outperforms PPO in this setting, exposing limitations of conventional reinforcement learning approaches such as PPO for low-resource dialogue modeling. The core contribution is a systematic evaluation of the "dialogue-first" pretraining paradigm, demonstrating specific, measurable gains in small models' conversational competence and establishing DPO as the more effective method for efficiently aligning lightweight dialogue models.
📝 Abstract
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to encourage "more communicative" text generation by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adverse effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.
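The DPO objective referenced above can be illustrated with a minimal sketch. This is the generic DPO loss from Rafailov et al., not the authors' exact training setup: in a minimal-pair setting, the preferred item would be the attested dialogue continuation and the dispreferred item its contrastive alternative. All function names and the toy log-probability values below are illustrative assumptions.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    logp_w, logp_l         : policy log-probs of the preferred (w) and
                             dispreferred (l) continuations given the context
    ref_logp_w, ref_logp_l : log-probs of the same continuations under a
                             frozen reference model (the pre-trained checkpoint)
    beta                   : temperature controlling how far the policy may
                             drift from the reference model
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the good continuation over the bad one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin; small when y_w is favored.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy check with made-up log-probs: a policy that separates the minimal
# pair more than the reference does incurs a lower loss than an
# indifferent policy.
better = dpo_loss(logp_w=-10.0, logp_l=-14.0,
                  ref_logp_w=-12.0, ref_logp_l=-12.0)
neutral = dpo_loss(logp_w=-12.0, logp_l=-12.0,
                   ref_logp_w=-12.0, ref_logp_l=-12.0)
```

Unlike PPO, this requires no reward model or sampling loop, only log-probabilities of the two continuations under the policy and the frozen reference, which is one plausible reason it is attractive in a low-resource setting.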