🤖 AI Summary
Understanding advertiser behavior poses three challenges: fusing heterogeneous multimodal data (text, images, video, structured data), high false-positive rates in fraud detection and policy violation identification, and semantic inconsistency in similarity matching. To address them, this paper proposes ALF (Advertiser Large Foundation model), a multimodal foundation model for advertisers. ALF introduces a novel co-attention mechanism that models cross-modal transformations and inter-sample relationships together, and combines it with spectrally normalized projections and probabilistic calibration to represent both content semantics and behavioral patterns. Built upon a multimodal Transformer architecture, ALF incorporates contrastive learning, multi-task joint optimization, and robust feature normalization. It achieves state-of-the-art performance on three core tasks: fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces abuse detection false positives by 90% while maintaining 99.8% precision.
📝 Abstract
We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video, and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90% while maintaining 99.8% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, an inter-sample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs.
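
To make one of the ingredients named above concrete, the following is a minimal sketch of a spectrally normalized projection: the weight matrix is divided by an estimate of its largest singular value, so the projection cannot stretch any input by more than a factor of about one. The abstract does not say how ALF computes this normalization. Power iteration, the toy 2x2 matrix `W`, and all function names below are assumptions chosen for illustration, not the authors' implementation.

```python
import math
import random


def matvec(W, v):
    """Compute W @ v for a matrix stored as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def matvec_t(W, u):
    """Compute W^T @ u without materializing the transpose."""
    return [sum(W[i][j] * u[i] for i in range(len(W))) for j in range(len(W[0]))]


def l2_normalize(v, eps=1e-12):
    """Scale a vector to unit Euclidean length (eps guards against zero norm)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / (norm + eps) for x in v]


def spectral_norm(W, n_iters=50, seed=0):
    """Estimate the largest singular value of W by power iteration."""
    rng = random.Random(seed)
    v = l2_normalize([rng.gauss(0.0, 1.0) for _ in range(len(W[0]))])
    for _ in range(n_iters):
        u = l2_normalize(matvec(W, v))    # left singular vector estimate
        v = l2_normalize(matvec_t(W, u))  # right singular vector estimate
    # sigma is approximately u^T W v once u and v have converged
    return sum(a * b for a, b in zip(u, matvec(W, v)))


def spectrally_normalize(W, n_iters=50):
    """Return W / sigma(W), whose largest singular value is close to 1."""
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]


# Hypothetical toy projection with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
```

In this toy case `sigma` comes out close to 3 and the normalized matrix has a largest singular value close to 1. Bounding the projection this way is a common means of keeping embedding distances well behaved, which is one plausible reason to pair it with calibrated probabilistic outputs.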