Semantically Stable Image Composition Analysis via Saliency and Gradient Vector Flow Fusion

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the challenge of assessing photographic composition with features that capture spatial layout while remaining robust to semantic content, and proposes VFCNet. The method fuses saliency and edge information into a Gradient Vector Flow (GVF) field: it computes dual-stream GVF representations, integrates them via attention-based fusion, and extracts multi-scale flow features with a self-supervised DINOv3 backbone, yielding a low-level composition representation that is robust to semantic variation. VFCNet achieves state-of-the-art performance on the PICD benchmark with CDA-1 and CDA-2 scores of 0.683 and 0.629, respectively, improvements of 33.1% and 36.1% over the previous best method. The authors also show that a simple classifier using only DINOv3 features substantially outperforms more sophisticated, composition-specialized models.

📝 Abstract

The reliable computational assessment of photographic composition requires features that are discriminative of spatial layout yet robust to semantic content. This paper proposes a low-level representation grounded in the assumption that composition can be understood as the flow of visual attention across geometric structure. We introduce VFCNet, which fuses saliency and edge information into a gradient vector flow (GVF) field. The model computes dual-stream GVF representations, integrates them via attention, and extracts multi-scale flow features with a DINOv3 backbone. VFCNet achieves state-of-the-art performance on the PICD benchmark (CDA-1: 0.683, CDA-2: 0.629), improving by 33.1% and 36.1% over the previous best method. We also show that a simple classifier on self-supervised DINOv3 features substantially outperforms more sophisticated, composition-specialized models. Code is available at https://github.com/ADadras/VFCNet
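
To make the idea of a GVF field concrete, the following is a minimal sketch of the classical gradient vector flow iteration (diffusing the gradient of a scalar map so that it extends into flat regions), applied separately to a saliency map and an edge map. It is not taken from the VFCNet repository: the function names gradient_vector_flow and dual_stream_gvf, the parameters mu, iters, and dt, and the toy Gaussian inputs are all assumptions chosen for illustration, and the attention fusion and DINOv3 feature extraction described in the abstract are not shown.

```python
import numpy as np


def laplacian(a):
    """5-point Laplacian with replicated (Neumann-like) borders."""
    p = np.pad(a, 1, mode="edge")
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * a


def gradient_vector_flow(f, mu=0.2, iters=200, dt=0.5):
    """Classical GVF iteration on a 2-D scalar map f (hypothetical helper).

    Returns (u, v): the x- and y-components of the diffused gradient field.
    mu, iters and dt are illustrative defaults, not values from the paper.
    """
    f = np.asarray(f, dtype=np.float64)
    span = f.max() - f.min()
    if span > 0:
        f = (f - f.min()) / span          # normalise to [0, 1] for stability
    fy, fx = np.gradient(f)               # np.gradient returns (d/drow, d/dcol)
    mag = fx ** 2 + fy ** 2               # data-term weight |grad f|^2
    u, v = fx.copy(), fy.copy()           # initialise with the raw gradient
    for _ in range(iters):
        # Smoothness term (mu * Laplacian) spreads the field into flat areas;
        # the data term keeps it close to grad f where the gradient is strong.
        u = u + dt * (mu * laplacian(u) - mag * (u - fx))
        v = v + dt * (mu * laplacian(v) - mag * (v - fy))
    return u, v


def dual_stream_gvf(saliency, edges, **kwargs):
    """Stack one GVF field per stream: shape (2 streams, 2 components, H, W)."""
    return np.stack([
        np.stack(gradient_vector_flow(saliency, **kwargs)),
        np.stack(gradient_vector_flow(edges, **kwargs)),
    ])


# Toy inputs (assumed): a Gaussian "saliency" blob and a ring-shaped "edge" map.
yy, xx = np.mgrid[0:65, 0:65]
r2 = (xx - 32) ** 2 + (yy - 32) ** 2
saliency = np.exp(-r2 / (2 * 6.0 ** 2))
edges = np.exp(-(np.sqrt(r2) - 12.0) ** 2 / (2 * 1.5 ** 2))

fields = dual_stream_gvf(saliency, edges)
print(fields.shape)  # (2, 2, 65, 65)
```

In this sketch the saliency-stream vectors point toward the blob centre even far from it, which is the property that makes GVF attractive as a layout descriptor: the field depends on where structure lies, not on what the structure depicts. How VFCNet actually weights, fuses, and encodes the two streams is described in the paper and its released code.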

Problem

Research questions and friction points this paper is trying to address.

image composition

semantic robustness

saliency

gradient vector flow

computational aesthetics

Innovation

Methods, ideas, or system contributions that make the work stand out.

Gradient Vector Flow

Saliency Fusion

Visual Composition