🤖 AI Summary
Deploying large language models (LLMs) in local life services (such as food delivery, transportation, and home services) faces three key challenges: poor real-world adaptability, the lack of a comprehensive evaluation benchmark, and high deployment costs. To address them, this paper introduces the first end-to-end, domain-specific evaluation benchmark for local life services and proposes a synergistic optimization paradigm that integrates lightweight supervised fine-tuning (SFT) with tool-augmented agent workflows, enhanced by domain-aware prompt engineering and service-chain modeling. Key contributions include: (1) the first systematic, multi-dimensional evaluation framework tailored to local life services; (2) state-of-the-art performance on multiple tasks with a 7B model that, after optimization, matches a 72B counterpart; and (3) a reduction in inference latency of over 90% and a 3.2× improvement in online response throughput, with successful deployment on a production-scale service platform.
📝 Abstract
Large language models (LLMs) have exhibited remarkable capabilities and achieved significant breakthroughs across diverse domains, leading to their widespread adoption in recent years. Building on this progress, we investigate their potential in local life services. In this study, we establish a comprehensive benchmark and systematically evaluate the performance of diverse LLMs across a wide range of tasks relevant to local life services. To further enhance their effectiveness, we explore two key approaches: model fine-tuning and agent-based workflows. Our findings reveal that even a relatively compact 7B model can attain performance comparable to a much larger 72B model, effectively balancing inference cost and model capability. This optimization greatly improves the feasibility and efficiency of deploying LLMs in real-world online services, making them more practical and accessible for local life applications.