Conformal Prediction and MLLM aided Uncertainty Quantification in Scene Graph Generation

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
Scene graph generation (SGG) suffers from limited reliability due to long-tailed class distributions and unstable predictions. This paper introduces conformal prediction (CP) to SGG for the first time, establishing a plug-and-play uncertainty quantification framework that outputs scene graph prediction sets with a statistical coverage guarantee (the true scene graph is contained with probability at least 1−α). It further proposes a semantic-visual joint refinement mechanism driven by a multimodal large language model (MLLM), which improves Top-1 accuracy by 3.2% while maintaining ≥90% empirical coverage. The method works with mainstream SGG models as a post-hoc calibration step, requires no retraining, and enables semantic consistency evaluation. Experiments on Visual Genome (VG) and OpenImages demonstrate significant improvements in reliability, diversity, and interpretability.
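To make the 1−α coverage idea concrete, here is a minimal sketch of standard split conformal prediction applied to a single classification head, such as a predicate classifier. It is not taken from the paper: the function names, the score (one minus the probability of the true class), and the toy label set are all assumptions chosen for illustration. The paper's actual nonconformity scores and its construction of full scene graph sets may differ.

```python
import math


def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Compute a score threshold from held-out calibration data.

    cal_probs:  list of per-class probability lists from a frozen model
    cal_labels: list of true class indices, one per calibration example
    alpha:      target miscoverage rate (coverage target is 1 - alpha)
    """
    n = len(cal_labels)
    # Nonconformity score: 1 - probability assigned to the true class.
    scores = sorted(1.0 - probs[label]
                    for probs, label in zip(cal_probs, cal_labels))
    # Finite-sample corrected rank of the conformal quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every class.
        return math.inf
    return scores[k - 1]


def prediction_set(probs, threshold):
    """Return indices of all classes whose score is within the threshold."""
    return [c for c, p in enumerate(probs) if 1.0 - p <= threshold]


if __name__ == "__main__":
    predicates = ["on", "near", "holding"]  # hypothetical label set
    cal_probs = [
        [0.90, 0.05, 0.05],
        [0.10, 0.80, 0.10],
        [0.20, 0.20, 0.60],
        [0.40, 0.35, 0.25],
    ]
    cal_labels = [0, 1, 2, 0]
    threshold = calibrate_threshold(cal_probs, cal_labels, alpha=0.25)
    chosen = prediction_set([0.50, 0.45, 0.05], threshold)
    print([predicates[c] for c in chosen])
```

Because calibration only reads the model's output probabilities, the underlying SGG model stays frozen, which is what makes this kind of procedure post-hoc. The coverage guarantee is marginal and relies on calibration and test examples being exchangeable.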

📝 Abstract
Scene Graph Generation (SGG) aims to represent visual scenes by identifying objects and their pairwise relationships, providing a structured understanding of image content. However, inherent challenges like long-tailed class distributions and prediction variability necessitate uncertainty quantification in SGG for its practical viability. In this paper, we introduce a novel Conformal Prediction (CP) based framework, adaptable to any existing SGG method, that quantifies predictive uncertainty by constructing well-calibrated prediction sets over the generated scene graphs. These scene graph prediction sets are designed to achieve statistically rigorous coverage guarantees. Additionally, to ensure these prediction sets contain the most practically interpretable scene graphs, we design an effective MLLM-based post-processing strategy for selecting the most visually and semantically plausible scene graphs within these prediction sets. We show that our proposed approach can produce diverse possible scene graphs from an image, assess the reliability of SGG methods, and improve overall SGG performance.

Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in Scene Graph Generation.
Address long-tailed class distributions and prediction variability.
Improve reliability and performance of SGG methods.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Conformal Prediction for uncertainty quantification
Adaptive framework for existing SGG methods
MLLM-based post-processing for plausible scene graphs
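
The post-processing idea in the last item can be sketched as a reranking step over candidates drawn from prediction sets. This is an illustrative sketch under stated assumptions, not the paper's procedure: forming candidates as the Cartesian product of subject, predicate, and object sets is an assumption, and `toy_plausibility` is a hypothetical hand-written stand-in for an MLLM judgment (no real MLLM API is called).

```python
from itertools import product


def rank_candidate_triplets(subject_set, predicate_set, object_set,
                            plausibility, top_k=3):
    """Enumerate candidate triplets and keep the most plausible ones.

    plausibility: any callable mapping a (subject, predicate, object)
                  tuple to a score; higher means more plausible.
    """
    candidates = list(product(subject_set, predicate_set, object_set))
    # sorted() is stable, so ties keep their enumeration order.
    ranked = sorted(candidates, key=plausibility, reverse=True)
    return ranked[:top_k]


def toy_plausibility(triplet):
    """Hypothetical stand-in for an MLLM score (illustration only)."""
    table = {
        ("person", "riding", "horse"): 0.9,
        ("person", "near", "horse"): 0.6,
    }
    return table.get(triplet, 0.1)


if __name__ == "__main__":
    best = rank_candidate_triplets(
        ["person"], ["riding", "near", "wearing"], ["horse"],
        toy_plausibility, top_k=2,
    )
    print(best)
```

Passing the scorer as a callable keeps the selection logic independent of whichever model produces the plausibility scores.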