NeurIPS 2023 Competition: Privacy Preserving Federated Learning Document VQA

📅 2024-11-06
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper reports on the PFL-DocVQA competition, which brought document analysis, federated learning, and differential privacy together in a benchmark for Document Visual Question Answering (DocVQA) on real invoice documents. The competition had two tracks. Track 1 asked participants to reduce communication costs during federated fine-tuning while maintaining a minimum utility threshold. Track 2 asked them to protect all information from each document provider using differential privacy. In both tracks, participants fine-tuned a pre-trained multimodal generative language model provided by the organizers, where sensitive information could leak through either the visual or the textual input. The contributions are a new testbed for developing and testing private federated learning methods, and a set of best practices and recommendations for running privacy-focused federated learning challenges.

📝 Abstract
The Privacy Preserving Federated Learning Document VQA (PFL-DocVQA) competition challenged the community to develop provably private and communication-efficient solutions in a federated setting for a real-life use case: invoice processing. The competition introduced a dataset of real invoice documents, along with associated questions and answers requiring information extraction and reasoning over the document images. Thereby, it brings together researchers and expertise from the document analysis, privacy, and federated learning communities. Participants fine-tuned a pre-trained, state-of-the-art Document Visual Question Answering model provided by the organizers for this new domain, mimicking a typical federated invoice processing setup. The base model is a multi-modal generative language model, and sensitive information could be exposed through either the visual or textual input modality. Participants proposed elegant solutions to reduce communication costs while maintaining a minimum utility threshold in track 1 and to protect all information from each document provider using differential privacy in track 2. The competition served as a new testbed for developing and testing private federated learning methods, simultaneously raising awareness about privacy within the document image analysis and recognition community. Ultimately, the competition analysis provides best practices and recommendations for successfully running privacy-focused federated learning challenges in the future.
Problem

Research questions and friction points this paper is trying to address.

Develop private federated learning solutions for invoice processing
Extract and reason over document images with privacy constraints
Reduce communication costs while maintaining utility in federated learning
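
The trade-off behind the communication-cost goal can be illustrated with a minimal sketch. It assumes plain federated averaging (FedAvg) in which every sampled client downloads and uploads the full model each round; the listing above does not say which aggregation rule or cost metric the competition used, so the function names, the flat `Dict[str, List[float]]` weight layout, and the 4-bytes-per-parameter default are all hypothetical.

```python
from typing import Dict, List

# Hypothetical flat parameter store: layer name -> list of floats.
Weights = Dict[str, List[float]]


def fedavg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Average client parameters, weighted by each client's local example count."""
    total = sum(client_sizes)
    averaged: Weights = {}
    for name in client_weights[0]:
        length = len(client_weights[0][name])
        averaged[name] = [
            sum(w[name][i] * n for w, n in zip(client_weights, client_sizes)) / total
            for i in range(length)
        ]
    return averaged


def communication_bytes(
    num_params: int,
    clients_per_round: int,
    rounds: int,
    bytes_per_param: int = 4,  # assumption: 32-bit floats, no compression
) -> int:
    """Bytes moved if each sampled client downloads and uploads the full model every round."""
    return 2 * num_params * bytes_per_param * clients_per_round * rounds


if __name__ == "__main__":
    merged = fedavg([{"w": [0.0, 2.0]}, {"w": [4.0, 6.0]}], client_sizes=[1, 3])
    print(merged)
    print(communication_bytes(num_params=1000, clients_per_round=2, rounds=10))
```

Under this accounting, cost falls when fewer rounds are run, fewer clients are sampled, fewer parameters are exchanged, or each parameter is encoded in fewer bytes; the utility threshold limits how far any of these can be pushed.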
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fine-tuned pre-trained Document VQA model
Reduced communication costs in federated learning
Applied differential privacy for document protection
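
As a rough illustration of protecting each document provider with differential privacy, the sketch below applies a generic Gaussian mechanism to provider updates: each update is clipped to a fixed L2 norm, the clipped updates are summed, and noise scaled to the clipping bound is added before averaging. This is an assumption about the general recipe, not the competition baseline or any participant's method; `clip_update`, `private_aggregate`, `clip_norm`, and `noise_multiplier` are hypothetical names, and translating `noise_multiplier` into an (ε, δ) guarantee requires a privacy accountant that is not shown.

```python
import math
import random
from typing import List, Optional


def clip_update(update: List[float], clip_norm: float) -> List[float]:
    """Scale an update down so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(x * x for x in update))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in update]


def private_aggregate(
    updates: List[List[float]],
    clip_norm: float,
    noise_multiplier: float,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Clip each provider's update, sum, add Gaussian noise, then average."""
    rng = rng or random.Random()
    clipped = [clip_update(u, clip_norm) for u in updates]
    sigma = noise_multiplier * clip_norm  # noise std is tied to the clipping bound
    dim = len(clipped[0])
    noisy_sum = [
        sum(u[i] for u in clipped) + rng.gauss(0.0, sigma) for i in range(dim)
    ]
    return [s / len(updates) for s in noisy_sum]


if __name__ == "__main__":
    provider_updates = [[3.0, 4.0], [0.0, 0.5]]
    print(private_aggregate(provider_updates, clip_norm=1.0, noise_multiplier=1.1,
                            rng=random.Random(0)))
```

Clipping bounds how much any single provider can shift the aggregate, which is what lets the added noise mask that provider's entire contribution rather than a single document.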
Authors

Marlon Tobaben
PhD student, University of Helsinki
Machine Learning, Deep Learning, Privacy

Mohamed Ali Souibgui
Computer Vision Center, Universitat Autònoma de Barcelona
Artificial Intelligence, Machine Learning, Computer Vision, Document Analysis, Vision and Language

Rubèn Pérez Tito
Computer Vision Center, Universitat Autònoma de Barcelona

Khanh Nguyen
Computer Vision Center, Universitat Autònoma de Barcelona

Raouf Kerkouche
Postdoctoral Researcher at CISPA – Helmholtz Center for Information Security
Trustworthy AI, Privacy, Security, Machine Learning

Kangsoo Jung
Postdoctoral Researcher, INRIA
Differential Privacy, Game Theory, Machine Learning

Joonas Jalko
University of Helsinki

Lei Kang
Computer Vision Center, Universitat Autònoma de Barcelona

Andrey Barsky
Computer Vision Center, Universitat Autònoma de Barcelona

V. P. d'Andecy
Yooz

Aurélie Joseph
Yooz

Aashiq Muhamed
Machine Learning Department, Carnegie Mellon University
Machine learning, Deep learning

Kevin Kuo
Carnegie Mellon University
machine learning

Virginia Smith
Carnegie Mellon University
Machine Learning, Optimization, Distributed Systems

Yusuke Yamasaki
NTT

Takumi Fukami
NTT

Kenta Niwa
NTT Communication Science Laboratories, NTT Computer and Data Science Laboratories
Machine Learning, Distributed Optimization, Distributed system, Signal Processing, Acoustic/Speech Signal Processing

Iifan Tyou
NTT

Hiro Ishii
Tokyo Institute of Technology

Rio Yokota
Professor, Institute of Science Tokyo
high performance computing, large scale deep learning, hierarchical low-rank matrices, GPU computing

N. Ragul
Department of Computer Science, Ashoka University

Rintu Kutum
Department of Computer Science, Ashoka University

J. Lladós
Computer Vision Center, Universitat Autònoma de Barcelona

Ernest Valveny
Computer Vision Center - Universitat Autònoma de Barcelona

Antti Honkela
Professor, University of Helsinki
Machine Learning, Differential Privacy, Bayesian Inference, Bioinformatics, #UnivHelsinkiCS

Mario Fritz
Faculty CISPA Helmholtz Center for Information Security; Professor Saarland University
Computer Vision, Machine Learning, Trustworthy AI, Security, Privacy

Dimosthenis Karatzas
Computer Vision Center, Universitat Autónoma de Barcelona
computer vision, document analysis, vision and language, reading systems