🤖 AI Summary
Deploying large language models (LLMs) in local life services (such as food delivery, transportation, and home services) faces three key challenges: poor real-world adaptability, the lack of a comprehensive evaluation benchmark, and high deployment costs. To address them, this paper introduces the first end-to-end, domain-specific evaluation benchmark for local life services and proposes a synergistic optimization paradigm that integrates lightweight supervised fine-tuning (SFT) with tool-augmented agent workflows, enhanced by domain-aware prompt engineering and service-chain modeling. Key contributions include: (1) the first systematic, multi-dimensional evaluation framework tailored to local life services; (2) state-of-the-art performance on multiple tasks with a 7B model that, after optimization, matches a 72B counterpart; and (3) a reduction in inference latency of over 90% and a 3.2× improvement in online response throughput, with successful deployment on a production-scale service platform.
📝 Abstract
Large language models (LLMs) have exhibited remarkable capabilities and achieved significant breakthroughs across diverse domains, leading to their widespread adoption in recent years. Building on this progress, we investigate their potential in local life services. In this study, we establish a comprehensive benchmark and systematically evaluate the performance of diverse LLMs across a wide range of tasks relevant to local life services. To further enhance their effectiveness, we explore two key approaches: model fine-tuning and agent-based workflows. Our findings reveal that even a relatively compact 7B model can attain performance comparable to a much larger 72B model, effectively balancing inference cost and model capability. This optimization greatly improves the feasibility and efficiency of deploying LLMs in real-world online services, making them more practical and accessible for local life applications.