🤖 AI Summary
To address the instability and poor generalization of large language models (e.g., GPT-3/4) in low-resource text summarization, this paper proposes two novel methods: MixSumm and PPSL. MixSumm leverages LLaMA-3-70B-Instruct to synthesize new documents that mix topical information from a small seed set, enhancing data diversity; PPSL employs a prompt-driven semi-supervised pseudo-labeling strategy to generate high-quality pseudo-labels. Together, they reach ROUGE scores competitive with a fully supervised method using only 5% of the labeled data. This work introduces, for the first time, a hybrid-topic synthesis mechanism and a prompt-guided pseudo-labeling framework. Experiments on TweetSumm, WikiHow, and ArXiv/PubMed cover both extractive and abstractive summarization and show that the methods outperform zero-shot and few-shot direct prompting approaches. Summary quality is measured with ROUGE and with L-Eval, a LLaMA-3-based evaluation metric.
📝 Abstract
Existing approaches for low-resource text summarization primarily employ large language models (LLMs) like GPT-3 or GPT-4 at inference time to generate summaries directly; however, such approaches often suffer from inconsistent LLM outputs and are difficult to adapt to domain-specific data in low-resource scenarios. In this work, we propose two novel methods to effectively utilize LLMs for low-resource text summarization: 1) MixSumm, an LLM-based data augmentation regime that synthesizes high-quality documents (short and long) for few-shot text summarization, and 2) PPSL, a prompt-based pseudo-labeling strategy for sample-efficient semi-supervised text summarization. Specifically, MixSumm leverages the open-source LLaMA-3-70b-Instruct model to generate new documents by mixing topical information derived from a small seed set, and PPSL leverages the same model to generate high-quality pseudo-labels in a semi-supervised learning setup. We evaluate our methods on the TweetSumm, WikiHow, and ArXiv/PubMed datasets, using ROUGE scores and L-Eval, a LLaMA-3-based evaluation metric, to measure the quality of generated summaries. Our experiments on extractive and abstractive summarization show that MixSumm and PPSL achieve ROUGE scores competitive with those of a fully supervised method while using only 5% of the labeled data.
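For readers unfamiliar with the ROUGE metric named above, the following is a minimal sketch of ROUGE-1 F1 (unigram overlap between a reference summary and a generated one). It is an illustration only, not the evaluation code used in the paper: it assumes lowercased whitespace tokenization and omits the stemming and other preprocessing that standard ROUGE toolkits apply. The function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: clipped unigram overlap, no stemming (assumption)."""
    # Count unigrams using lowercased whitespace tokens.
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())  # share of candidate tokens matched
    recall = overlap / sum(ref_counts.values())      # share of reference tokens matched
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "the agent refunded the customer after the delayed delivery"
    candidate = "the agent refunded the customer"
    print(f"ROUGE-1 F1: {rouge1_f1(reference, candidate):.3f}")
```

Reported ROUGE results typically also include ROUGE-2 and ROUGE-L, which extend the same idea to bigrams and longest common subsequences.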