MELD: Multi-Task Equilibrated Learning Detector for AI-Generated Text

📅 2026-05-07

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Current AI-generated text detectors generalize poorly under adversarial attacks, to unseen generators, and in low false-positive-rate regimes. This work proposes a multi-task learning framework with balanced (equilibrated) losses. It improves robustness and generalization while keeping the standard binary detection interface, because the auxiliary heads are discarded at inference. The approach adds auxiliary supervision signals (generator family, attack type, and source domain) and combines three components: homoscedastic uncertainty weighting, EMA teacher–student distillation on attack-augmented inputs, and a hard-negative pairwise ranking loss. On MELD-eval, a held-out pool built from recent chat models, the method achieves a true positive rate of 99.9% at a 1% false positive rate without additional finetuning, while many baselines degrade sharply. It is currently the strongest open-source detector on the RAID leaderboard and is competitive with leading commercial systems.
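The teacher–student component can be illustrated with a minimal pure-Python sketch. It assumes two things. First, the teacher weights follow a standard exponential moving average of the student weights. Second, the distillation loss is a Bernoulli KL divergence between the teacher's prediction on clean text and the student's prediction on attacked text. The summary does not give MELD's decay rate or exact distillation objective, so `ema_update`, `distill_loss`, and the default `decay=0.999` are hypothetical choices.

```python
import math


def sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def ema_update(teacher: list[float], student: list[float], decay: float = 0.999) -> list[float]:
    """Hypothetical EMA step applied per parameter: t <- decay * t + (1 - decay) * s."""
    if len(teacher) != len(student):
        raise ValueError("teacher and student must have the same number of parameters")
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def bernoulli_kl(p: float, q: float, eps: float = 1e-7) -> float:
    """KL(p || q) between two Bernoulli distributions, clipped for numerical safety."""
    p = min(max(p, eps), 1.0 - eps)
    q = min(max(q, eps), 1.0 - eps)
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def distill_loss(teacher_logit_clean: float, student_logit_attacked: float) -> float:
    """Assumed objective: pull the student's prediction on the attacked text toward the
    teacher's prediction on the clean text. In a real training loop the teacher output
    would be detached so that no gradient flows into the teacher."""
    return bernoulli_kl(sigmoid(teacher_logit_clean), sigmoid(student_logit_attacked))


# The teacher lags the student: repeated EMA steps move it gradually toward the student.
teacher = [0.0, 0.0]
student = [1.0, -2.0]
for _ in range(10):
    teacher = ema_update(teacher, student, decay=0.5)
```

A decay close to 1 makes the teacher a slowly moving, smoothed copy of the student. That stability is what makes its clean-input predictions a useful target for the attack-augmented student.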

📝 Abstract

Large language models are now embedded in everyday writing workflows, making reliable AI-generated text detection important for academic integrity, content moderation, and provenance tracking. In practice, however, a detector must do more than achieve high aggregate AUROC on clean, in-distribution human and AI text: it should remain robust to attacks and adversarial rewrites, transfer to unseen generators and domains, and operate at low false-positive rates (FPR). Most existing detectors optimize a single AI/Human objective, giving the representation little incentive to learn generator, attack, or domain structure once the binary task saturates. We introduce MELD (Multi-Task Equilibrated Learning Detector), a deployable detector for AI-generated text that enriches binary detection with auxiliary supervision. MELD attaches generator-family, attack-type, and source-domain heads to a shared encoder, and balances the four losses with learned homoscedastic uncertainty weights. To improve robustness, an EMA teacher predicts on clean inputs while an attack-augmented student is distilled toward the teacher. MELD further uses a hard-negative pairwise ranking loss to enlarge the score margin between AI-generated texts and the most confusable human texts. At inference, all auxiliary heads are discarded, giving MELD the same interface and cost as a standard detector. On the public RAID leaderboard, MELD is the strongest open-source detector and is competitive with leading commercial models, especially under attack and at low FPR. Across standard held-out benchmarks, MELD matches or outperforms supervised baselines. We further introduce MELD-eval, a held-out evaluation pool built from recent chat models released by four major LLM providers. Without additional finetuning, MELD achieves 99.9% TPR at 1% FPR on MELD-eval, while many baselines degrade sharply.
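The hard-negative ranking term can be sketched as a margin hinge loss over in-batch pairs. The abstract states only that the loss enlarges the score margin between AI-generated texts and the most confusable human texts. The top-k mining rule, the hinge form, the `margin`/`k` defaults, and the name `hard_negative_ranking_loss` below are illustrative assumptions, not MELD's published definition.

```python
def hard_negative_ranking_loss(
    ai_scores: list[float],
    human_scores: list[float],
    margin: float = 1.0,
    k: int = 2,
) -> float:
    """Mean hinge loss max(0, margin - (s_ai - s_human)) over all pairs of an AI text
    and one of the k highest-scoring (most AI-like, hence hardest) human texts."""
    if not ai_scores or not human_scores:
        return 0.0
    if k < 1:
        raise ValueError("k must be at least 1")
    # hardest negatives: human texts the detector currently scores as most AI-like
    hard_humans = sorted(human_scores, reverse=True)[:k]
    total = 0.0
    for s_ai in ai_scores:
        for s_h in hard_humans:
            # zero once the AI score beats this hard human score by at least `margin`
            total += max(0.0, margin - (s_ai - s_h))
    return total / (len(ai_scores) * len(hard_humans))


# Hard negatives are the human scores 1.0 and 0.0; only the weak AI score 0.5 is penalized.
loss = hard_negative_ranking_loss([2.0, 0.5], [-1.0, 0.0, 1.0], margin=1.0, k=2)
# loss == 0.5
```

Easy human texts (here, the one scored -1.0) never enter the loss. This focuses training on the region of the score distribution that determines the threshold at low FPR.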

Problem

Research questions and friction points this paper is trying to address.

AI-generated text detection

robustness

adversarial attacks

domain transfer

false-positive rate
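The low-FPR operating point, such as the 99.9% TPR at 1% FPR reported on MELD-eval, can be made concrete with a small helper. This is a minimal sketch under stated assumptions. A text is flagged as AI when its score is strictly above the threshold, and the threshold is chosen from human scores alone. The thresholding conventions used by RAID or MELD-eval may differ, and `tpr_at_fpr` is a hypothetical name.

```python
import math


def tpr_at_fpr(
    human_scores: list[float],
    ai_scores: list[float],
    target_fpr: float = 0.01,
) -> tuple[float, float, float]:
    """Return (tpr, achieved_fpr, threshold) for a detector that flags a text as AI
    when its score is strictly greater than the threshold.

    The threshold is set from human scores alone so that at most target_fpr of
    human texts are flagged; the TPR is then measured on the AI scores.
    """
    if not human_scores or not ai_scores:
        raise ValueError("need both human and AI scores")
    ranked = sorted(human_scores, reverse=True)
    # maximum number of human texts we may flag; the small epsilon guards float error
    allowed = math.floor(target_fpr * len(ranked) + 1e-9)
    # flagging only scores strictly above the (allowed+1)-th highest human score
    # flags at most `allowed` human texts (fewer if there are ties)
    threshold = ranked[allowed] if allowed < len(ranked) else -math.inf
    tpr = sum(s > threshold for s in ai_scores) / len(ai_scores)
    fpr = sum(s > threshold for s in human_scores) / len(human_scores)
    return tpr, fpr, threshold


human = [float(i) for i in range(100)]  # 100 human texts scored 0.0 ... 99.0
ai = [50.0, 98.5, 99.5, 120.0]
print(tpr_at_fpr(human, ai, target_fpr=0.01))  # (0.75, 0.01, 98.0)
```

Unlike aggregate AUROC, this metric depends only on how well AI texts clear the few highest-scoring human texts. A detector can therefore have high AUROC and still degrade sharply at 1% FPR.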

Innovation

Methods, ideas, or system contributions that make the work stand out.

multi-task learning

adversarial robustness

uncertainty-based loss weighting
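Uncertainty-based loss weighting can be sketched with a commonly used simplified form of homoscedastic uncertainty weighting. Each task i has a learned log-variance s_i, and the total loss is the sum of exp(-s_i) * L_i + s_i. The page does not give MELD's exact formula (for example, whether it includes factors of 1/2), so this form, the helper names, and the example loss values for the four heads are assumptions.

```python
import math


def uncertainty_weighted_loss(task_losses: list[float], log_vars: list[float]) -> float:
    """Assumed form: sum_i exp(-s_i) * L_i + s_i, with s_i = log(sigma_i^2) learned per task.

    exp(-s_i) down-weights tasks with high estimated noise; the + s_i term stops
    every s_i from growing without bound (which would zero out all losses).
    """
    if len(task_losses) != len(log_vars):
        raise ValueError("need exactly one log-variance per task")
    return sum(math.exp(-s) * loss + s for loss, s in zip(task_losses, log_vars))


def grad_wrt_log_var(task_loss: float, s: float) -> float:
    # d/ds [exp(-s) * L + s] = 1 - exp(-s) * L, which is zero at s = log(L)
    return 1.0 - math.exp(-s) * task_loss


def fit_log_vars(task_losses: list[float], steps: int = 2000, lr: float = 0.05) -> list[float]:
    """Gradient descent on the log-variances alone, holding the task losses fixed."""
    log_vars = [0.0] * len(task_losses)
    for _ in range(steps):
        log_vars = [s - lr * grad_wrt_log_var(loss, s) for loss, s in zip(task_losses, log_vars)]
    return log_vars


# Hypothetical losses for the binary, generator-family, attack-type, and source-domain heads.
losses = [0.5, 2.0, 0.1, 1.0]
log_vars = fit_log_vars(losses)
weights = [math.exp(-s) for s in log_vars]  # effective weights approach 1 / L_i
```

With the losses held fixed, each s_i settles at log L_i. The effective weight exp(-s_i) then becomes 1/L_i, so every weighted term contributes about 1 and no single head dominates. In actual training the encoder and the s_i are updated jointly, so this only illustrates the balancing pressure.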