🤖 AI Summary
To address the performance degradation and high computational overhead of large language models (LLMs) in human value identification for long texts, this paper proposes a two-stage collaborative framework combining local model guidance with online LLM calibration. Methodologically, it introduces (1) a fine-tunable lightweight value detector, trained via explanation-driven learning and value-semantic-guided active sampling, that generates high-information-density prompts; and (2) a prompt refinement and compression mechanism that substantially reduces the input token count. Experiments show that the approach cuts token consumption to one-sixth of that required by direct LLM invocation, while consistently outperforming both BERT-based baselines and end-to-end LLM approaches in accuracy. It achieves state-of-the-art results across multiple value identification benchmarks, demonstrating an effective balance of efficiency and fidelity for fine-grained value detection in long textual inputs.
📝 Abstract
The rapid evolution of large language models (LLMs) has revolutionized various fields, including the identification and discovery of human values within text data. While traditional NLP models, such as BERT, have been employed for this task, their ability to represent textual data is significantly outperformed by emerging LLMs like GPTs. However, the performance of online LLMs often degrades when handling the long contexts required for value identification, which also incurs substantial computational costs. To address these challenges, we propose EAVIT, an efficient and accurate framework for human value identification that combines the strengths of locally fine-tunable and online black-box LLMs. Our framework employs a value detector - a small, local language model - to generate initial value estimations. These estimations are then used to construct concise input prompts for online LLMs, enabling accurate final value identification. To train the value detector, we introduce explanation-based training and data generation techniques specifically tailored for value identification, alongside sampling strategies to optimize the brevity of LLM input prompts. Our approach reduces the number of input tokens to as little as one-sixth of those required when directly querying online LLMs, while consistently outperforming traditional NLP methods and other LLM-based strategies.
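The two-stage flow the abstract describes can be sketched as follows. This is a minimal illustrative mock, not the paper's implementation: the keyword-based detector stands in for the fine-tuned local model, the toy value taxonomy and all function names are assumptions, and the online LLM call is omitted (the sketch stops at the compressed prompt that would be sent to it).

```python
# Hypothetical sketch of the EAVIT two-stage pipeline: a cheap local
# detector proposes candidate values, then a short prompt is built for
# the online LLM instead of sending the full long text.

def local_value_detector(text):
    """Stage 1: score each candidate value locally.
    A keyword heuristic stands in for the fine-tuned value detector."""
    keywords = {
        "security": ["safe", "protect"],
        "achievement": ["success", "win"],
        "benevolence": ["help", "care"],
        "tradition": ["custom", "ritual"],
    }
    lower = text.lower()
    return {v: sum(lower.count(k) for k in ks) for v, ks in keywords.items()}

def build_concise_prompt(text, scores, top_k=2, max_chars=120):
    """Stage 2 input: keep only the top-k candidate values plus a short
    excerpt, so the online LLM receives far fewer tokens."""
    candidates = sorted(scores, key=scores.get, reverse=True)[:top_k]
    excerpt = text[:max_chars]
    return (f"Candidate values: {', '.join(candidates)}.\n"
            f"Excerpt: {excerpt}\n"
            "Confirm which candidate values the text expresses.")

long_text = ("We must protect our community and keep everyone safe, "
             "helping neighbors and caring for the elderly. ") * 20
scores = local_value_detector(long_text)
prompt = build_concise_prompt(long_text, scores)
```

In this toy run the compressed prompt is well under one-sixth the length of the original text, mirroring the token savings the paper reports, while still surfacing the detector's top candidates ("security" and "benevolence") for the online LLM to confirm.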