Benchmarking and Advancing Large Language Models for Local Life Services

📅 2025-06-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
Deploying large language models (LLMs) in local life services—such as food delivery, transportation, and home services—faces poor real-world adaptability, a lack of comprehensive evaluation benchmarks, and high deployment costs. This paper introduces the first end-to-end, domain-specific evaluation benchmark for local life services and proposes a synergistic optimization paradigm that integrates lightweight supervised fine-tuning (SFT) with tool-augmented agent workflows, enhanced by domain-aware prompt engineering and service-chain modeling. Key contributions: (1) the first systematic, multi-dimensional evaluation framework tailored to local life services; (2) state-of-the-art performance on multiple tasks using a 7B model that, after optimization, matches a 72B counterpart; and (3) over 90% lower inference latency and 3.2× higher online response throughput, with successful deployment on a production-scale service platform.

📝 Abstract
Large language models (LLMs) have exhibited remarkable capabilities and achieved significant breakthroughs across various domains, leading to their widespread adoption in recent years. Building on this progress, we investigate their potential in the realm of local life services. In this study, we establish a comprehensive benchmark and systematically evaluate the performance of diverse LLMs across a wide range of tasks relevant to local life services. To further enhance their effectiveness, we explore two key approaches: model fine-tuning and agent-based workflows. Our findings reveal that even a relatively compact 7B model can attain performance levels comparable to a much larger 72B model, effectively balancing inference cost and model capability. This optimization greatly enhances the feasibility and efficiency of deploying LLMs in real-world online services, making them more practical and accessible for local life applications.
Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM performance in local life service tasks
Enhancing LLMs via fine-tuning and agent workflows
Optimizing cost-capability balance for practical deployment
Innovation

Methods, ideas, or system contributions that make the work stand out.

Comprehensive benchmark for evaluating diverse LLMs
Enhancement via model fine-tuning and agent-based workflows
Compact 7B model matching the performance of a much larger 72B model
Authors
Xiaochong Lan — Tsinghua University (Large Language Models, LLM Agent)
Jie Feng — Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China
Jiahuan Lei — Meituan, Beijing, China
Xinlei Shi — Meituan, Beijing, China
Yong Li — Department of Electronic Engineering, BNRist, Tsinghua University, Beijing, China