🤖 AI Summary
This study investigates whether dialogue-only pretraining can yield compact language models that are both formally and functionally competent in conversation. The authors adopt the Llamalogue architecture, pretrain it exclusively on conversational corpora, and evaluate it both on the standard BabyLM suite and on a custom dialogue benchmark, with additional fine-tuning via PPO and DPO. Results show that while dialogue-only pretraining falls short of competitive performance on general language understanding benchmarks such as BabyLM, it substantially improves dialogue continuation prediction. Moreover, DPO outperforms PPO in this setting, exposing limitations of conventional reinforcement learning approaches such as PPO for low-resource dialogue modeling. The core contribution is a systematic evaluation of the "dialogue-first" pretraining paradigm, demonstrating specific, measurable gains in small models' conversational competence and establishing DPO as the more effective method for efficiently aligning lightweight dialogue models.
📝 Abstract
We investigate whether pre-training exclusively on dialogue data results in formally and functionally apt small language models. Based on this pre-trained llamalogue model, we employ a variety of fine-tuning strategies to encourage "more communicative" text generation by our models. Although our models underperform on most standard BabyLM benchmarks, they excel at dialogue continuation prediction in a minimal pair setting. While PPO fine-tuning has mixed to adverse effects on our models, DPO fine-tuning further improves their performance on our custom dialogue benchmark.
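The DPO objective referenced above can be illustrated with a minimal sketch. This is the generic DPO loss from Rafailov et al., not the authors' exact training setup: in a minimal-pair setting, the preferred item would be the attested dialogue continuation and the dispreferred item its contrastive alternative. All function names and the toy log-probability values below are illustrative assumptions.

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for a single preference pair.

    logp_w, logp_l         : policy log-probs of the preferred (w) and
                             dispreferred (l) continuations given the context
    ref_logp_w, ref_logp_l : log-probs of the same continuations under a
                             frozen reference model (the pre-trained checkpoint)
    beta                   : temperature controlling how far the policy may
                             drift from the reference model
    """
    # Implicit reward margin: how much more the policy (relative to the
    # reference) prefers the good continuation over the bad one.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin; small when y_w is favored.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# Toy check with made-up log-probs: a policy that separates the minimal
# pair more than the reference does incurs a lower loss than an
# indifferent policy.
better = dpo_loss(logp_w=-10.0, logp_l=-14.0,
                  ref_logp_w=-12.0, ref_logp_l=-12.0)
neutral = dpo_loss(logp_w=-12.0, logp_l=-12.0,
                   ref_logp_w=-12.0, ref_logp_l=-12.0)
```

Unlike PPO, this requires no reward model or sampling loop, only log-probabilities of the two continuations under the policy and the frozen reference, which is one plausible reason it is attractive in a low-resource setting.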