๐ค AI Summary
Traditional aspect-based summarization (ABSA) methods suffer from limited resource dependency and poor generalization, while large language models (LLMs) face challenges including heavy reliance on intricate prompt engineering, strict context-length constraints, and high hallucination rates. To address these issues, this paper proposes Self-Aspect Retrieval-Augmented Generation (Self-Aspect RAG), a novel framework that introduces the first aspect-driven embedding retrieval mechanism, decoupling retrieval from generation. It employs aspect-aware fine-grained text truncation and lightweight prompt optimization to enforce strict aspect fidelityโwithout parameter fine-tuning. Evaluated across multiple benchmarks, Self-Aspect RAG achieves state-of-the-art performance: +12.6% aspect relevance, +37% token utilization efficiency, and โ29.4% hallucination rate, demonstrating significant improvements in both factual consistency and aspect-specific summarization capability.
๐ Abstract
Aspect-based summarization aims to generate summaries tailored to specific aspects, addressing the resource constraints and limited generalizability of traditional summarization approaches. Recently, large language models have shown promise in this task without the need for training. However, they rely excessively on prompt engineering and face token limits and hallucination challenges, especially with in-context learning. To address these challenges, in this paper, we propose a novel framework for aspect-based summarization: Self-Aspect Retrieval Enhanced Summary Generation. Rather than relying solely on in-context learning, given an aspect, we employ an embedding-driven retrieval mechanism to identify its relevant text segments. This approach extracts the pertinent content while avoiding unnecessary details, thereby mitigating the challenge of token limits. Moreover, our framework optimizes token usage by deleting unrelated parts of the text and ensuring that the model generates output strictly based on the given aspect. With extensive experiments on benchmark datasets, we demonstrate that our framework not only achieves superior performance but also effectively mitigates the token limitation problem.