🤖 AI Summary
This work proposes a fair and reproducible evaluation framework for efficiently screening lightweight language models on the joint task of emotion classification and VAD (Valence–Arousal–Dominance) prediction. The approach uses a unified data and inference protocol and combines KV-off decoding, lexicon-based weak supervision, Valence Flip data augmentation, and A/B mixture sampling with entropy-aware temperature scheduling. To improve sensitivity and consistency in modeling emotional polarity, it further adds two orthogonal regularizers: a VAD-preserving semantic constraint and an external appraisal classifier. With Qwen-1.8B-Chat as the base model, the method achieves strong performance on GoEmotions and EmpatheticDialogues and generalizes robustly across corpora to DailyDialog, offering a low-cost, auditable, and re-entrant route to model selection.
📝 Abstract
We introduce EmoLoom-2B, a lightweight and reproducible pipeline that turns small language models under 2B parameters into fast screening candidates for joint emotion classification and Valence-Arousal-Dominance prediction. To ensure protocol-faithful and fair evaluation, we unify data loading, training, and inference under a single JSON input-output contract and remove avoidable variance by adopting KV-off decoding as the default setting. We incorporate two orthogonal semantic regularizers: a VAD-preserving constraint that aligns generated text with target VAD triples, and a lightweight external appraisal classifier that provides training-time guidance on goal attainment, controllability, certainty, and fairness without injecting long rationales. To improve polarity sensitivity, we introduce Valence Flip augmentation based on mirrored emotional pairs. During supervised fine-tuning, we apply A/B mixture sampling with entropy-aware temperature scheduling to balance coverage and convergence. Using Qwen-1.8B-Chat as the base model, EmoLoom-2B achieves strong performance on GoEmotions and EmpatheticDialogues, and demonstrates robust cross-corpus generalization on DailyDialog. The proposed recipe is budget-aware, auditable, and re-entrant, serving as a dependable screening pass before heavier training or multimodal fusion.
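The abstract does not spell out how A/B mixture sampling with entropy-aware temperature scheduling works, so the following Python sketch shows only one plausible reading and is not the authors' implementation. It assumes that pools A and B are two example sources (for instance original and augmented data), that the entropy in question is the normalized entropy of the labels sampled so far, and that low entropy (poor label coverage) should raise the temperature and flatten the mixture. All function names, the linear schedule, and the default values (`t_min`, `t_max`, `base`) are hypothetical.

```python
import math
import random
from collections import Counter


def normalized_entropy(counts, num_labels):
    """Shannon entropy of a label histogram, scaled to [0, 1]."""
    total = sum(counts.values())
    if total == 0 or num_labels < 2:
        return 0.0
    h = -sum((c / total) * math.log(c / total) for c in counts.values() if c > 0)
    return h / math.log(num_labels)


def temperature(entropy, t_min=0.5, t_max=2.0):
    """Assumed linear schedule: low label entropy gives a high temperature."""
    return t_max - (t_max - t_min) * entropy


def mixture_weights(base_weights, temp):
    """Temperature-scale the pool weights (w ** (1 / T)) and renormalize."""
    scaled = [w ** (1.0 / temp) for w in base_weights]
    z = sum(scaled)
    return [s / z for s in scaled]


def sample_batch(pool_a, pool_b, seen, num_labels, batch_size, rng, base=(0.8, 0.2)):
    """Draw one batch from pools A and B and update the label histogram `seen`."""
    temp = temperature(normalized_entropy(seen, num_labels))
    w_a, w_b = mixture_weights(base, temp)
    batch = []
    for _ in range(batch_size):
        pool = pool_a if rng.random() < w_a else pool_b
        example = rng.choice(pool)
        seen[example["label"]] += 1
        batch.append(example)
    return batch, temp, (w_a, w_b)


if __name__ == "__main__":
    # Toy pools; the texts and labels are placeholders, not dataset samples.
    pool_a = [{"text": "a1", "label": "joy"}, {"text": "a2", "label": "joy"}]
    pool_b = [{"text": "b1", "label": "anger"}, {"text": "b2", "label": "sadness"}]
    seen = Counter()
    rng = random.Random(0)
    for step in range(3):
        batch, temp, weights = sample_batch(pool_a, pool_b, seen, 3, 8, rng)
        print(step, round(temp, 3), [round(w, 3) for w in weights], dict(seen))
```

Under these assumptions, early batches are drawn at a high temperature, so the minority pool is sampled more often (coverage); as the label histogram evens out, the temperature drops and the mixture sharpens toward the base weights and beyond (convergence).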