Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models

📅 2025-05-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Information extraction (IE) on resource-constrained edge devices faces critical challenges with large language models (LLMs), including severe hallucination, limited context length, and high inference latency, especially when dynamically adapting to diverse extraction schemas. This work proposes DLISC (Dual-LoRA with Incremental Schema Caching), a dual-module architecture coupled with an incremental schema caching mechanism that, for the first time, decouples schema identification from information extraction. It supports real-time, accurate adaptation of edge-deployed LLMs to hundreds of distinct schemas, integrating LoRA-based lightweight fine-tuning, two-stage inference, and an edge-optimized inference engine. Experiments across multiple IE benchmarks show consistent improvements over baseline approaches: +3.2–5.8 F1 points, 2.1× faster inference, and a 37% smaller memory footprint.

📝 Abstract
Information extraction (IE) plays a crucial role in natural language processing (NLP) by converting unstructured text into structured knowledge. Deploying computationally intensive large language models (LLMs) on resource-constrained devices for information extraction is challenging, particularly due to issues like hallucinations, limited context length, and high latency, especially when handling diverse extraction schemas. To address these challenges, we propose a two-stage information extraction approach adapted for on-device LLMs, called Dual-LoRA with Incremental Schema Caching (DLISC), which enhances both schema identification and schema-aware extraction in terms of effectiveness and efficiency. In particular, DLISC adopts an Identification LoRA module for retrieving the schemas most relevant to a given query, and an Extraction LoRA module for performing information extraction based on the previously selected schemas. To accelerate extraction inference, Incremental Schema Caching is incorporated to reduce redundant computation, substantially improving efficiency. Extensive experiments across multiple information extraction datasets demonstrate notable improvements in both effectiveness and efficiency.
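The two-stage flow described in the abstract can be sketched as follows. Everything here is an illustrative stand-in, not the paper's implementation: `identify_schemas` mimics the Identification LoRA with a crude keyword-overlap ranking, and `extract` mimics the Extraction LoRA with a trivial substring match; the names, schema pool, and ranking heuristic are all assumptions.

```python
def identify_schemas(query: str, schema_pool: dict, top_k: int = 1) -> list:
    """Stage 1 (Identification LoRA stand-in): rank schemas by keyword
    overlap between their field names and the query; return top-k names."""
    def overlap(fields: list) -> int:
        q = query.lower()
        return sum(1 for field in fields if field.lower() in q)
    ranked = sorted(schema_pool, key=lambda name: overlap(schema_pool[name]), reverse=True)
    return ranked[:top_k]

def extract(query: str, schemas: list, schema_pool: dict) -> dict:
    """Stage 2 (Extraction LoRA stand-in): for each selected schema,
    mark which of its fields are mentioned in the query text."""
    return {
        name: {field: field.lower() in query.lower() for field in schema_pool[name]}
        for name in schemas
    }

query = "When and where does the summit take place? Give the date and location."
schema_pool = {
    "person": ["name", "birthplace"],
    "event": ["date", "location"],
}
selected = identify_schemas(query, schema_pool, top_k=1)
print(selected)  # ['event']
print(extract(query, selected, schema_pool))
```

The key structural point the sketch preserves is the decoupling: stage 1 narrows hundreds of candidate schemas down to a few, so stage 2 only conditions the (expensive) extraction pass on the schemas that matter.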
Problem

Research questions and friction points this paper is trying to address.

Deploying LLMs on resource-constrained devices for IE
Addressing hallucinations and high latency in schema-aware IE
Improving efficiency and effectiveness of on-device IE
Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-stage IE approach for on-device LLMs
Dual-LoRA modules for schema tasks
Incremental caching reduces redundant computation
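The incremental-caching idea in the last point can be illustrated with a toy memoization sketch. `SchemaCache` and its `_encode` method are hypothetical names; `_encode` stands in for the expensive per-schema computation (e.g., prefilling the schema prompt), and the counter simply makes the skipped redundant work visible.

```python
class SchemaCache:
    """Toy incremental schema cache: each schema is encoded at most once,
    and later queries over the same schema reuse the cached result."""

    def __init__(self):
        self._cache = {}
        self.encode_calls = 0  # counts how often the expensive path runs

    def _encode(self, schema: str) -> str:
        # Stand-in for the costly schema-conditioned computation.
        self.encode_calls += 1
        return f"<encoded:{schema}>"

    def get(self, schema: str) -> str:
        # Incremental: only a newly seen schema triggers encoding.
        if schema not in self._cache:
            self._cache[schema] = self._encode(schema)
        return self._cache[schema]

cache = SchemaCache()
for schema in ["person", "event", "person", "event", "person"]:
    cache.get(schema)
print(cache.encode_calls)  # 2 — repeated schemas hit the cache
```

Under this assumption, the cost of schema handling grows with the number of *distinct* schemas seen rather than the number of queries, which is the efficiency win the paper attributes to Incremental Schema Caching.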