🤖 AI Summary
Large language models (LLMs) struggle to comprehend complex, policy-driven business logic and procedural workflows in Airbnb’s customer support domain. Method: We propose the Intent-Context-Action (ICA) knowledge representation framework, which explicitly encodes unstructured support logic into a structured, reasoning-ready ternary schema; further, we design a hybrid rule-and-model-driven synthetic data generation pipeline enabling low-cost, privacy-preserving supervised fine-tuning (SFT) without real user data. Contribution/Results: Our approach significantly improves LLMs’ task understanding accuracy and execution reliability in customer support. Experiments show a 23.6% absolute gain in intent classification accuracy, a 41% reduction in human intervention rate, and a 35% decrease in average handling time versus baseline models. To our knowledge, this is the first systematic application of the ICA paradigm to production-scale customer-support LLM deployment, establishing a scalable, cost-effective adaptation framework for LLMs in highly regulated, process-intensive domains.
📝 Abstract
We propose a practical approach by integrating Large Language Models (LLMs) with a framework designed to navigate the complexities of Airbnb customer support operations. In this paper, our methodology employs a novel reformatting technique, the Intent, Context, and Action (ICA) format, which transforms policies and workflows into a structure more comprehensible to LLMs. Additionally, we develop a synthetic data generation strategy to create training data with minimal human intervention, enabling cost-effective fine-tuning of our model. Our internal experiments (not applied to Airbnb products) demonstrate that our approach of restructuring workflows and fine-tuning LLMs with synthetic data significantly enhances their performance, setting a new benchmark for their application in customer support. Our solution is not only cost-effective but also improves customer support, as evidenced by both accuracy and manual processing time evaluation metrics.