Self Knowledge Re-expression: A Fully Local Method for Adapting LLMs to Tasks Using Intrinsic Knowledge

📅 2026-04-24
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses the limitations of large language models in non-generative, task-specific applications, where their sequential generation mechanism hinders efficient representation of task-relevant knowledge. The authors propose a fully local adaptation method that requires no labeled data, human supervision, or model distillation, introducing the first task-agnostic Self-Knowledge Re-expression (SKR) mechanism. SKR leverages only the model's intrinsic knowledge and unlabeled data to achieve efficient adaptation. Evaluated on financial document tasks, the approach demonstrates substantial performance gains: Recall@1 in information retrieval improves by over 40%, inference latency in object detection decreases by more than 76%, and AUPRC in anomaly detection increases by over 33%. Furthermore, it surpasses the current state-of-the-art retrieval models by at least 12.6% on the MMDocRAG benchmark.

πŸ“ Abstract
While the next-token prediction (NTP) paradigm enables large language models (LLMs) to express their intrinsic knowledge, its sequential nature constrains performance on specialized, non-generative tasks. We attribute this performance bottleneck to the LLMs' knowledge expression mechanism, rather than to deficiencies in knowledge acquisition. To address this, we propose Self-Knowledge Re-expression (SKR), a novel, task-agnostic adaptation method. SKR transforms the LLM's output from generic token generation to highly efficient, task-specific expression. SKR is a fully local method that uses only unannotated data, requiring neither human supervision nor model distillation. Experiments on a large financial document dataset demonstrate substantial improvements: over 40% in Recall@1 for information retrieval tasks, over 76% reduction in object detection latency, and over 33% increase in anomaly detection AUPRC. Our results on the MMDocRAG dataset surpass those of leading retrieval models by at least 12.6%.
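
For readers unfamiliar with the retrieval metric quoted above, the following is a minimal sketch of how Recall@k is commonly computed when each query has exactly one relevant document. It is not the authors' evaluation code: the function name `recall_at_k`, the single-relevant-document assumption, and the toy document IDs are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def recall_at_k(
    ranked_ids: Sequence[Sequence[str]],
    relevant_ids: Sequence[str],
    k: int = 1,
) -> float:
    """Fraction of queries whose single relevant document is in the top-k results.

    Assumes exactly one relevant document per query (an illustrative
    simplification, not something stated in the abstract).
    """
    if len(ranked_ids) != len(relevant_ids):
        raise ValueError("ranked_ids and relevant_ids must have the same length")
    if not ranked_ids:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids, relevant_ids)
        if relevant in ranked[:k]
    )
    return hits / len(ranked_ids)


# Toy data: four queries, each with a ranked result list and one gold document.
rankings = [["d3", "d1"], ["d2", "d5"], ["d9", "d4"], ["d7", "d8"]]
gold = ["d3", "d5", "d4", "d7"]

print(recall_at_k(rankings, gold, k=1))  # 2 of 4 queries hit at rank 1 -> 0.5
print(recall_at_k(rankings, gold, k=2))  # all 4 queries hit within top 2 -> 1.0
```

Recall@1 is the strictest setting, since only the top-ranked result counts. The abstract does not say whether its reported gains are absolute or relative, so this sketch makes no assumption on that point.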
Problem

Research questions and friction points this paper is trying to address.

large language models
next-token prediction
knowledge expression
non-generative tasks
task adaptation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Knowledge Re-expression
task-agnostic adaptation
fully local method
intrinsic knowledge
next-token prediction