TubiFM: Unified Item, Carousel, and Search Ranking for Streaming Discovery

📅 2026-05-22

🤖 AI Summary
This work addresses a limitation of conventional streaming discovery systems, which typically train separate models for item ranking, carousel ranking, and search and therefore fail to share complementary signals across surfaces. The authors propose a unified "user story" representation that serializes a user's cross-surface behavior into a single token sequence interleaving natural-language tokens with domain-specific event tokens. Building on Llama 3.2 1B, they introduce TubiFM, a prompt-driven model that ranks items, carousels, and search results via next-token prediction over a shared grammar, without task-specific architectures. In offline evaluation, TubiFM outperforms specialist baselines on all three tasks. In online A/B tests, it increases search total viewing time (TVT) by 3.9% and carousel TVT by 0.30%; item ranking is statistically neutral on TVT (+0.14%), matching a mature production stack. Served on L40S GPUs, it reduces p99 ranking latency from 500ms to 200ms.
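To illustrate how ranking can be read off a next-token distribution, here is a minimal sketch. It assumes each candidate corresponds to a single token and that the model's next-token logits are already available as a dictionary. The function name `rank_candidates`, the `<item:...>` token format, and the restricted-softmax scoring are assumptions made for illustration; the summary does not specify how TubiFM scores candidates.

```python
import math
from typing import Dict, List, Tuple


def rank_candidates(
    next_token_logits: Dict[str, float], candidates: List[str]
) -> List[Tuple[str, float]]:
    """Rank candidate tokens by softmax probability restricted to the candidates.

    Hypothetical scoring rule: only the logits of the candidate tokens are
    compared, so unrelated vocabulary tokens do not affect the ranking.
    """
    # Subtract the maximum logit for numerical stability before exponentiating.
    max_logit = max(next_token_logits[c] for c in candidates)
    exps = {c: math.exp(next_token_logits[c] - max_logit) for c in candidates}
    total = sum(exps.values())
    scored = [(c, exps[c] / total) for c in candidates]
    # Highest probability first.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy logits standing in for a model's output after a ranking prompt.
logits = {"<item:a>": 1.0, "<item:b>": 3.0, "<item:c>": 2.0, "the": 5.0}
ranked = rank_candidates(logits, ["<item:a>", "<item:b>", "<item:c>"])
print([token for token, _ in ranked])
```

Under this sketch, the same function would serve item, carousel, or search ranking; only the prompt and the candidate set would change.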
📝 Abstract
Personalized discovery systems often train separate models for item ranking, carousel ranking, and search, even though these tasks expose complementary signals from the same viewer journey: watches shape carousel and item ranking, search queries reveal intent even when they do not lead to a catalog match, and watch history helps interpret search as rewatching, continuation, or new discovery. We introduce the user story, a serialized representation that turns a user's cross-surface history - attributes, sessions, watch events with surface and carousel context, and search events - into a single token sequence. By interleaving pretrained language tokens with domain-specific event tokens, user stories let heterogeneous recommendation and search tasks be expressed as prompted next-token prediction over a shared grammar. TubiFM is one instantiation of this approach: a Llama 3.2 1B-based model trained on user stories and prompted to rank items, carousels, or search results without task-specific architectures. In offline evaluation, this single model outperforms specialist baselines across item, carousel, and search ranking. In online A/B tests, TubiFM significantly improves search total viewing time (TVT) by $+3.9\%$ and carousel TVT by $+0.30\%$. Item ranking is statistically neutral on TVT ($+0.14\%$), but matches a mature production stack; across all three tasks, TubiFM serves on L40S GPUs and reduces p99 ranking latency from 500ms to 200ms. These results show that shared user stories can improve discovery while simplifying ranking systems.
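The user story described above can be sketched as a small serializer. This is an illustrative sketch only: the dataclass names, the `<watch>`, `<search>`, `<surface:...>`, `<carousel:...>`, and `<item:...>` token formats, and whitespace splitting of query text are all assumptions, since the abstract does not give the actual grammar or tokenizer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass
class WatchEvent:
    item_id: str
    surface: str        # hypothetical surface name, e.g. "home" or "search"
    carousel: str = ""  # carousel the watch came from, if any


@dataclass
class SearchEvent:
    query: str


@dataclass
class Session:
    events: List[Union[WatchEvent, SearchEvent]] = field(default_factory=list)


def serialize_user_story(
    attributes: Dict[str, str], sessions: List[Session]
) -> List[str]:
    """Flatten attributes and sessions into one token sequence.

    Natural-language pieces (attribute values, query words) are interleaved
    with domain-specific event tokens (assumed formats, not the paper's).
    """
    tokens = ["<user>"]
    for key in sorted(attributes):
        tokens += [f"{key}:", str(attributes[key])]
    for session in sessions:
        tokens.append("<session>")
        for event in session.events:
            if isinstance(event, WatchEvent):
                tokens += ["<watch>", f"<surface:{event.surface}>"]
                if event.carousel:
                    tokens.append(f"<carousel:{event.carousel}>")
                tokens.append(f"<item:{event.item_id}>")
            else:
                # Search queries stay as plain language tokens.
                tokens += ["<search>"] + event.query.split()
    return tokens


story = serialize_user_story(
    {"device": "tv"},
    [Session([WatchEvent("123", "home", "trending"),
              SearchEvent("space documentaries")])],
)
print(" ".join(story))
```

A ranking prompt could then be appended to such a sequence, for example an open `<watch>` event whose item slot the model is asked to fill.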
Problem

Research questions and friction points this paper is trying to address.

personalized discovery
item ranking
carousel ranking
search ranking
cross-task signals
Innovation

Methods, ideas, or system contributions that make the work stand out.

user story
unified ranking
next-token prediction
cross-surface recommendation
TubiFM