ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding

📅 2025-04-26
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address challenges in understanding advertiser behavior, this paper proposes ALF, a multimodal foundation model for advertisers. The challenges are difficulty fusing heterogeneous multimodal data (text, images, video, and structured data), high false-positive rates in fraud detection and policy violation identification, and semantic inconsistency in similarity matching. ALF combines multi-modal transformations with an inter-sample attention mechanism, so that it models relationships both across modalities and across samples. It pairs these with spectrally normalized projections and calibrated probabilistic outputs, so that a single representation captures both content semantics and behavioral patterns. Built on a multimodal Transformer architecture, ALF is trained with contrastive learning, multi-task joint optimization, and robust feature normalization. It achieves state-of-the-art performance on core tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces abuse detection false positives by 90% while maintaining 99.8% precision.
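The summary does not say how the inter-sample attention is computed. The sketch below is a rough illustration only, not ALF's implementation. Each sample's embedding attends over every embedding in the same batch using scaled dot-product attention, and the result is added back as a residual. It assumes a single head and identity query/key/value projections, and the function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def inter_sample_attention(batch):
    """Hypothetical sketch: each sample attends over all samples in the batch.

    batch: list of N embeddings, each a list of d floats.
    Assumes identity Q/K/V projections and one head (not ALF's actual layer).
    Returns N embeddings: input plus the attention-weighted mix (residual).
    """
    d = len(batch[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for query in batch:
        # Scaled dot-product score of this sample against every sample.
        scores = [scale * sum(q * k for q, k in zip(query, key)) for key in batch]
        weights = softmax(scores)
        # Weighted sum of the batch embeddings (values are the inputs here).
        mixed = [sum(w * v[j] for w, v in zip(weights, batch)) for j in range(d)]
        # Residual connection keeps each sample's own content.
        out.append([x + m for x, m in zip(query, mixed)])
    return out
```

With a batch of one, the sample can only attend to itself, so the output is simply twice the input. In a larger batch, similar advertisers contribute more to each other's updated representation.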


📝 Abstract
We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90% while maintaining 99.8% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, an inter-sample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs.
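The abstract mentions calibrated probabilistic outputs but does not name the calibration method. One common technique, temperature scaling, is sketched below as a stand-in. The model's logits are divided by a single temperature T, and T is chosen on held-out data to minimize negative log-likelihood. The binary abuse/non-abuse setup, the grid search, and all function names are assumptions for illustration.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nll(logits, labels, temperature):
    """Mean binary negative log-likelihood of temperature-scaled probabilities."""
    eps = 1e-12  # clamp to avoid log(0)
    total = 0.0
    for z, y in zip(logits, labels):
        p = min(max(sigmoid(z / temperature), eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)


def fit_temperature(logits, labels, grid=None):
    """Hypothetical helper: grid-search the temperature minimizing held-out NLL."""
    if grid is None:
        grid = [0.25 * k for k in range(1, 41)]  # 0.25, 0.5, ..., 10.0
    return min(grid, key=lambda t: nll(logits, labels, t))
```

Only one positive scalar is fitted, so the ranking of scores is unchanged. For a binary model, no prediction moves across the 0.5 threshold either; only the confidence attached to each score changes.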
Problem


Understands advertiser behavior across text, image, video, and structured data
Detects fraud and policy violations with high precision and reduced false positives
Creates unified advertiser representations using multi-modal contrastive learning
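The exact contrastive objective is not given here. Below is a minimal sketch of a standard InfoNCE loss. It assumes each advertiser contributes a pair of views, for example a text embedding and an image embedding, and that the other advertisers in the batch act as negatives. Matching pairs are pulled together and non-matching pairs are pushed apart. The temperature `tau` and the function names are hypothetical.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (zero vectors are left unchanged)."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def info_nce(anchors, positives, tau=0.1):
    """InfoNCE: anchors[i] should match positives[i] against every positives[j].

    anchors, positives: lists of N embeddings, e.g. two modality views of N
    advertisers (an assumed setup, not ALF's documented training pairs).
    """
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity to every candidate, sharpened by the temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / tau for pj in p]
        m = max(logits)
        log_z = m + math.log(sum(math.exp(z - m) for z in logits))
        # Negative log-softmax probability of the true (matching) pair.
        loss += log_z - logits[i]
    return loss / len(a)
```

When the two views of each advertiser already line up, the loss is close to zero. Swapping the pairings makes it large, and that gradient signal is what drives the representations toward a shared space.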
Innovation


Multi-modal transformer for advertiser understanding
Contrastive learning for unified representations
Spectrally normalized projections enhance performance
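Spectral normalization divides a weight matrix by its largest singular value, so the projection's spectral norm becomes 1 and the linear map is 1-Lipschitz. The sketch below estimates that singular value by power iteration, the usual approach. The abstract does not say which layers ALF normalizes or how many iterations it uses, so those choices and the function names are assumptions.

```python
import math
import random


def matvec(w, v):
    """Compute W v for a matrix stored as a list of rows."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def mat_t_vec(w, u):
    """Compute W^T u."""
    return [sum(w[i][j] * u[i] for i in range(len(w))) for j in range(len(w[0]))]


def normalize(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def spectral_norm(w, iters=50, seed=0):
    """Estimate the largest singular value of w by power iteration (iters >= 1)."""
    rng = random.Random(seed)
    v = normalize([rng.gauss(0.0, 1.0) for _ in range(len(w[0]))])
    for _ in range(iters):
        u = normalize(matvec(w, v))     # left singular vector estimate
        v = normalize(mat_t_vec(w, u))  # right singular vector estimate
    # sigma is approximately u^T W v once u and v have converged.
    return sum(ui * wi for ui, wi in zip(u, matvec(w, v)))


def spectrally_normalize(w, iters=50):
    """Hypothetical helper: rescale w so its spectral norm is about 1."""
    sigma = spectral_norm(w, iters)
    return [[x / sigma for x in row] for row in w]
```

In training frameworks, the power-iteration vectors are usually carried over between steps, so one iteration per step is enough. The many iterations here simply make the standalone estimate converge.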
Santosh Rajagopalan
Google
Jonathan Vronsky
Google
Songbai Yan
Google
S. Alireza Golestaneh
Google
Computer Vision, Machine Learning, Visual Perception, Human Vision
Shubhra Chandra
Google
Min Zhou
Google