🤖 AI Summary
Existing contrastive learning for text generation operates mostly at the instance level and ignores fine-grained semantic differences among tokens and keywords, which leads to coarse-grained semantic modeling of the constrained mapping relationships. To address this, we propose a hierarchical contrastive learning framework that unifies token-level, keyword-level, and instance-level semantic representations. Our approach introduces (i) a keyword graph that is built from contrastive correlations of positive-negative pairs and iteratively refines the keyword representations; (ii) intra-level contrasts at the instance and keyword levels, where words are treated as nodes sampled from a sentence distribution; and (iii) an inter-level contrast that measures the discrepancy between contrastive keyword nodes with respect to the instance distribution, which mitigates the contrast vanishing problem. Experiments on paraphrase generation, dialogue response generation, and story generation show that the model outperforms competitive baselines. The results suggest that aligning semantics across granularities improves generation quality.
📝 Abstract
Contrastive learning has achieved impressive success in generation tasks by mitigating the “exposure bias” problem and discriminatively exploiting references of different quality. Existing works mostly apply contrastive learning at the instance level without discriminating the contribution of each word, although keywords are the gist of the text and dominate the constrained mapping relationships. Hence, in this work, we propose a hierarchical contrastive learning mechanism that unifies semantic meaning at hybrid granularities in the input text. Concretely, we first propose a keyword graph, built via contrastive correlations of positive-negative pairs, to iteratively polish the keyword representations. Then, we construct intra-contrasts within the instance level and the keyword level, where we assume words are nodes sampled from a sentence distribution. Finally, to bridge the gap between independent contrast levels and tackle the common contrast vanishing problem, we propose an inter-contrast mechanism that measures the discrepancy between contrastive keyword nodes with respect to the instance distribution. Experiments demonstrate that our model outperforms competitive baselines on paraphrasing, dialogue generation, and storytelling tasks.
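To make the instance-level contrast concrete, the following is a minimal sketch of a generic InfoNCE-style contrastive loss. The abstract does not state the exact loss, so the InfoNCE form, the cosine similarity, the `temperature` value, and the function names `cosine` and `info_nce` are all assumptions made for illustration, not the authors' formulation.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss (an assumed form, not taken from the paper).

    The loss is low when the anchor is closer to the positive than to
    every negative, and high otherwise.
    """
    pos_logit = cosine(anchor, positive) / temperature
    logits = [pos_logit] + [cosine(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_denominator - pos_logit

# Toy 2-d "instance embeddings" (hypothetical values).
anchor = [1.0, 0.0]
positive = [0.9, 0.1]
negatives = [[0.0, 1.0], [-1.0, 0.0]]

aligned_loss = info_nce(anchor, positive, negatives)
# Swap the positive with a dissimilar vector to see the loss increase.
misaligned_loss = info_nce(anchor, [0.0, 1.0], [[0.9, 0.1], [-1.0, 0.0]])
```

Under these assumptions, a keyword-level contrast would apply the same kind of loss to keyword node representations instead of whole-instance embeddings; the inter-contrast described in the abstract additionally relates keyword nodes to the instance distribution, which this sketch does not model.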