🤖 AI Summary
This work addresses the asymmetric discrepancy between offline metrics and online performance in recommendation ranking, a gap that no single calibration factor can adequately correct. The authors propose a dual-channel framework grounded in Subjective Expected Utility (SEU) that models ranking optimization as a continuous influence-exchange process, enabling an end-to-end autonomous loop from diagnosis to deployment. One channel decouples offline-to-online calibration, while the other dynamically adjusts constraint penalties. A large language model (LLM) meta-controller governs high-level parameters, supported by a memory database comprising seven relational tables to facilitate cross-iteration learning. Evaluated in two Southeast Asian markets, the approach improved GMV in Market A from −3.6% to +9.2% within seven iterations, with peak orders increasing by 12.5%; in Market B, it achieved +4.15% GMV per unique user and +3.58% ad revenue within seven days of cold start and has since been fully deployed.
📝 Abstract
Recommendation ranking is fundamentally an influence allocation problem: a sorting formula distributes ranking influence among competing factors, and the business outcome depends on finding the optimal "exchange rates" among them. However, offline proxy metrics systematically misjudge how influence reallocation translates to online impact, with asymmetric bias across metrics that a single calibration factor cannot correct.
We present Sortify, the first fully autonomous LLM-driven ranking optimization agent deployed in a large-scale production recommendation system. The agent reframes ranking optimization as continuous influence exchange, closing the full loop from diagnosis to parameter deployment without human intervention. It addresses these structural problems through three mechanisms: (1) a dual-channel framework grounded in Savage's Subjective Expected Utility (SEU) that decouples offline-to-online transfer correction (Belief channel) from constraint penalty adjustment (Preference channel); (2) an LLM meta-controller operating on framework-level parameters rather than low-level search variables; (3) a persistent Memory DB with seven relational tables for cross-round learning. The agent's core metric, Influence Share, provides a decomposable measure in which all factor contributions sum to exactly 100%.
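The decomposability of Influence Share can be illustrated with a minimal sketch. The abstract does not give the metric's formula, so the definition below (normalizing each factor's absolute contribution to an additive sorting score) and the factor names are assumptions for illustration only:

```python
# Hypothetical sketch of a decomposable "Influence Share" metric: given each
# factor's contribution to an additive ranking score, normalize absolute
# contributions so that all shares sum to exactly 100%. The additive-score
# form and factor names are illustrative assumptions, not the paper's
# actual definition.

def influence_share(contributions: dict[str, float]) -> dict[str, float]:
    """Return each factor's share of total ranking influence, in percent."""
    total = sum(abs(c) for c in contributions.values())
    if total == 0.0:
        # Degenerate case: no factor exerts any influence.
        return {name: 0.0 for name in contributions}
    return {name: 100.0 * abs(c) / total for name, c in contributions.items()}

# Example: three factors (weight * feature products) in a sorting formula.
shares = influence_share({"ctr": 1.2, "cvr": 0.6, "price": -0.2})
print(shares)           # each factor's percentage of total influence
print(sum(shares.values()))  # sums to exactly 100.0 by construction
```

By construction the shares always sum to 100%, which is what makes the measure decomposable: reallocating influence from one factor to another shows up as an exact, zero-sum exchange of share points.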
Sortify has been deployed across two Southeast Asian markets. In Country A, the agent pushed GMV from −3.6% to +9.2% within seven rounds, with peak orders reaching +12.5%. In Country B, a cold-start deployment achieved +4.15% GMV/UU and +3.58% Ads Revenue in a 7-day A/B test, leading to full production rollout.