Biologically-Informed Hybrid Membership Inference Attacks on Generative Genomic Models

📅 2025-11-10

📈 Citations: 0

✨ Influential: 0

career value

230K/year

🤖 AI Summary

Genomic data sensitivity poses significant privacy challenges. This paper proposes a differential privacy (DP)-enhanced language modeling framework for generating high-fidelity synthetic genomic mutation profiles. It employs a GPT-like Transformer architecture to model variant sequences and injects calibrated DP noise during training to provably protect individual-level privacy. To rigorously assess privacy leakage, we introduce a bioinformatics-guided hybrid membership inference attack—integrating black-box querying with genome-specific biological metrics (e.g., allele frequency, functional annotations)—thereby substantially improving attack efficacy against generative genomic models. Experiments on small-scale genomic datasets demonstrate that our framework achieves a favorable privacy–utility trade-off: it preserves statistical and biological fidelity of synthetic variants while providing formal DP guarantees. Our attack method outperforms conventional baselines by an average of 23.6% in membership inference success rate. This work establishes a new benchmark and practical evaluation toolkit for privacy-preserving synthetic genomics.

Technology Category

Application Category

📝 Abstract

The increased availability of genetic data has transformed genomics research, but raised many privacy concerns regarding its handling due to its sensitive nature. This work explores the use of language models (LMs) for the generation of synthetic genetic mutation profiles, leveraging differential privacy (DP) for the protection of sensitive genetic data. We empirically evaluate the privacy guarantees of our DP modes by introducing a novel Biologically-Informed Hybrid Membership Inference Attack (biHMIA), which combines traditional black box MIA with contextual genomics metrics for enhanced attack power. Our experiments show that both small and large transformer GPT-like models are viable synthetic variant generators for small-scale genomics, and that our hybrid attack leads, on average, to higher adversarial success compared to traditional metric-based MIAs.

Problem

Research questions and friction points this paper is trying to address.

Developing privacy-preserving synthetic genetic mutation profiles using language models

Evaluating differential privacy guarantees against novel biologically-informed hybrid attacks

Assessing membership inference risks for transformer-based genomic data generators

Innovation

Methods, ideas, or system contributions that make the work stand out.

Leveraging differential privacy for genetic data protection

Introducing hybrid membership inference attack with genomics metrics

Using transformer models as synthetic genetic variant generators

🔎 Similar Papers

Is My Data in Your Retrieval Database? Membership Inference Attacks Against Retrieval Augmented Generation