ALF: Advertiser Large Foundation Model for Multi-Modal Advertiser Understanding

📅 2025-04-26

🤖 AI Summary

Understanding advertiser behavior poses three challenges: heterogeneous multimodal data (text, images, video, structured data) is difficult to fuse, fraud detection and policy violation identification suffer from high false-positive rates, and similarity matching is semantically inconsistent. To address them, this paper proposes ALF (Advertiser Large Foundation model), a multimodal transformer for advertiser understanding. ALF introduces a co-attention mechanism that models both cross-modal transformations and inter-sample relationships, and combines it with spectrally normalized projections and probabilistic calibration to represent content semantics and behavioral patterns in a single embedding. Training uses contrastive learning, multi-task joint optimization, and robust feature normalization. ALF achieves state-of-the-art performance on three core tasks: fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, it reduces false positives on abuse detection by 90% while maintaining 99.8% precision.

📝 Abstract

We present ALF (Advertiser Large Foundation model), a multi-modal transformer architecture for understanding advertiser behavior and intent across text, image, video and structured data modalities. Through contrastive learning and multi-task optimization, ALF creates unified advertiser representations that capture both content and behavioral patterns. Our model achieves state-of-the-art performance on critical tasks including fraud detection, policy violation identification, and advertiser similarity matching. In production deployment, ALF reduces false positives by 90% while maintaining 99.8% precision on abuse detection tasks. The architecture's effectiveness stems from its novel combination of multi-modal transformations, inter-sample attention mechanism, spectrally normalized projections, and calibrated probabilistic outputs.
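
The abstract does not specify the contrastive objective. As a rough illustration only, the sketch below assumes an InfoNCE-style loss over paired embeddings of the same advertiser from two modalities; the function names, the temperature value, and the toy vectors are hypothetical and are not taken from the paper.

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def info_nce_loss(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: positives[i] is the match for anchors[i],
    and every other row of `positives` acts as a negative."""
    a = [l2_normalize(v) for v in anchors]
    p = [l2_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Temperature-scaled cosine similarity of anchor i to every candidate.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        # Numerically stable log-sum-exp over the candidates.
        m = max(logits)
        log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
        # Cross-entropy with the matching pair as the target class.
        total += log_denom - logits[i]
    return total / len(a)

# Toy data (assumed): row i of each list belongs to the same advertiser.
text_emb = [[1.0, 0.0], [0.0, 1.0]]
image_emb = [[0.9, 0.1], [0.2, 0.8]]

aligned = info_nce_loss(text_emb, image_emb)         # correct pairing
shuffled = info_nce_loss(text_emb, image_emb[::-1])  # mismatched pairing
print(aligned < shuffled)
```

The loss is low when each advertiser's embeddings from different modalities sit close together and far from other advertisers' embeddings, which is one common way a contrastive objective yields a unified representation.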

Problem

Research questions and friction points this paper is trying to address.

Understanding advertiser behavior across text, image, video, and structured data

Detecting fraud and policy violations with high precision and fewer false positives

Creating unified advertiser representations with multi-modal contrastive learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multi-modal transformer for advertiser understanding

Contrastive learning for unified representations

Spectrally normalized projections enhance performance
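
The page does not say how the spectrally normalized projections are computed. A common approach, assumed here purely for illustration, is to estimate the largest singular value of a projection matrix by power iteration and divide the weights by it. The helper names and the example matrix below are hypothetical.

```python
import math

def matvec(W, v):
    # Multiply matrix W (list of rows) by vector v.
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def transpose(W):
    return [list(col) for col in zip(*W)]

def unit(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def spectral_norm(W, n_iters=50):
    """Estimate the largest singular value of W by power iteration.
    Assumes W is nonzero and the all-ones start vector is not
    orthogonal to the top right singular vector."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(n_iters):
        u = unit(matvec(W, v))   # approximate left singular vector
        v = unit(matvec(Wt, u))  # approximate right singular vector
    return math.sqrt(sum(x * x for x in matvec(W, v)))

def spectrally_normalize(W, n_iters=50):
    # Rescale W so that its largest singular value is approximately 1.
    sigma = spectral_norm(W, n_iters)
    return [[w / sigma for w in row] for row in W]

# Hypothetical projection matrix with singular values 3 and 1.
W = [[3.0, 0.0], [0.0, 1.0]]
sigma = spectral_norm(W)
W_sn = spectrally_normalize(W)
print(sigma, spectral_norm(W_sn))
```

Bounding the largest singular value at 1 keeps the projection from amplifying distances between inputs. Whether ALF uses this exact procedure is not stated in the source.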
