🤖 AI Summary
Deploying large language models (LLMs) in resource-constrained environments—particularly for specialized domains (e.g., chemical toxicity) and low-resource languages (e.g., Korean)—remains challenging due to computational, data, and domain-knowledge limitations.
Method: This paper proposes a practical framework for lightweight language agents, instantiated as Tox-chat, a toxicity-aware conversational agent. It introduces a hierarchical section retrieval mechanism to enhance context efficiency, integrates scenario-driven dialogue synthesis with database fidelity optimization for effective tool-use knowledge distillation, and employs fine-tuning of an 8B-parameter model to balance performance and deployability under limited compute.
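The hierarchical section retrieval idea can be illustrated with a minimal sketch: rather than embedding or scanning whole documents, the agent first ranks top-level section titles and only then searches inside the best-matching section, so irrelevant section bodies never enter the LLM context. The data layout, function names, and token-overlap scoring below are illustrative assumptions, not the paper's implementation.

```python
# Hedged sketch of hierarchical (two-stage) section retrieval.
# All names and the scoring function are assumptions for illustration.

def score(query, text):
    """Naive relevance score: number of shared lowercase tokens."""
    q = set(query.lower().split())
    return len(q & set(text.lower().split()))

def hierarchical_search(query, doc):
    """Stage 1: rank section titles only. Stage 2: search subsections
    inside the single best-matching section. Full bodies of the other
    sections are never loaded into context, saving tokens."""
    best_section = max(doc, key=lambda title: score(query, title))
    subsections = doc[best_section]
    best_sub = max(subsections,
                   key=lambda t: score(query, t + " " + subsections[t]))
    return best_section, best_sub, subsections[best_sub]

# Toy toxicity database: section title -> subsection title -> text.
toxdb = {
    "Benzene toxicity": {
        "Acute exposure": "Inhalation of benzene causes dizziness and headache.",
        "Chronic exposure": "Long-term benzene exposure is linked to leukemia.",
    },
    "Ethanol metabolism": {
        "Pathway": "Ethanol is oxidized to acetaldehyde by ADH.",
    },
}

section, sub, text = hierarchical_search("chronic benzene exposure risk", toxdb)
```

The token saving comes from stage 1 comparing the query against short titles only; a production system would replace the toy overlap score with embedding similarity, but the two-stage structure is the point.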
Contribution/Results: Experiments demonstrate significant improvements in factual accuracy and human preference scores over both baseline approaches and untuned models. The framework provides a reproducible, resource-efficient pathway for deploying domain-specialized language agents in low-resource settings.
📝 Abstract
Language agents powered by large language models (LLMs) face significant deployment challenges in resource-constrained environments, particularly for specialized domains and less-common languages. This paper presents Tox-chat, a Korean chemical toxicity information agent developed under these constraints. We propose two key innovations: a context-efficient architecture that reduces token consumption through hierarchical section search, and a scenario-based dialogue generation methodology that effectively distills tool-using capabilities from larger models. Experimental evaluations demonstrate that our fine-tuned 8B-parameter model substantially outperforms both untuned models and baseline approaches in terms of database (DB) faithfulness and human preference. Our work offers valuable insights for researchers developing domain-specific language agents under practical constraints.
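The second innovation, scenario-based dialogue generation, can also be sketched: scenario templates are expanded over a set of chemicals, a large teacher model produces tool-calling dialogue traces for each, and the traces become fine-tuning data for the smaller student. Everything here (the templates, the `search_toxdb` tool name, and the stubbed teacher) is hypothetical, standing in for the paper's actual pipeline.

```python
# Hedged sketch of scenario-driven dialogue synthesis for tool-use
# distillation. Templates, tool name, and the teacher stub are
# illustrative assumptions, not the paper's pipeline.

import json

SCENARIOS = [
    "A user asks about the health effects of {chemical}.",
    "A user asks whether exposure limits exist for {chemical}.",
]

def teacher_trace(question):
    """Stand-in for a large teacher model: returns a tool-calling
    dialogue trace for the question (fixed template here)."""
    return [
        {"role": "user", "content": question},
        {"role": "assistant",
         "content": json.dumps({"tool": "search_toxdb", "query": question})},
        {"role": "tool", "content": "(retrieved section text)"},
        {"role": "assistant",
         "content": "Answer grounded in the retrieved section."},
    ]

def synthesize(chemicals):
    """Expand every scenario template over every chemical and collect
    the resulting teacher traces as fine-tuning examples."""
    return [teacher_trace(tmpl.format(chemical=chem))
            for chem in chemicals
            for tmpl in SCENARIOS]

examples = synthesize(["benzene", "toluene"])
```

Fine-tuning the 8B student on such traces teaches it when to emit the tool call and how to ground its final answer in the tool result, which is what the DB-faithfulness evaluation measures.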