HALO: Human-Aligned End-to-end Image Retargeting with Layered Transformations

📅 2025-04-03

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Image retargeting aims to preserve content integrity and structural consistency during aspect-ratio adaptation, yet existing methods often introduce visual artifacts or distort semantic structure. To address this, we propose an end-to-end trainable framework featuring a novel hierarchical deformation field grounded in human visual sensitivity: rigid constraints are enforced on salient regions, while non-salient regions undergo adaptive, content-aware warping. We further introduce a perception-driven structural similarity loss (P-SSIM) that jointly optimizes content fidelity, geometric alignment, and perceptual consistency. Our approach integrates hierarchical image decomposition, differentiable deformation modeling, and multi-objective optimization. Evaluated on the RetargetMe benchmark with both quantitative metrics and a user study, our method achieves state-of-the-art performance, with 18.4% higher user preference than the baselines on average.

📝 Abstract

Image retargeting aims to change the aspect ratio of an image while maintaining its content and structure with fewer visual artifacts. Existing methods still generate many artifacts or fail to maintain the original content or structure. To address this, we introduce HALO, an end-to-end trainable solution for image retargeting. Since humans are more sensitive to distortions in salient areas than in non-salient areas of an image, HALO decomposes the input image into salient and non-salient layers and applies different warping fields to different layers. To further minimize structural distortion in the output images, we propose a perceptual structure similarity loss, which measures the structural similarity between input and output images and aligns with human perception. Both quantitative results and a user study on the RetargetMe dataset show that HALO achieves state-of-the-art performance. In particular, our method achieves an 18.4% higher user preference than the baselines on average.
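
To make the layered idea concrete, here is a minimal sketch of decomposing an image into salient and non-salient layers, warping each with its own field, and compositing the result. It is not HALO's implementation: HALO learns its fields end to end, whereas this sketch hard-codes two hand-picked fields (a pure horizontal shift for the salient layer, uniform horizontal scaling for the rest) purely as an illustrative assumption. The names `sample`, `warp_layer`, and `retarget` are hypothetical, the saliency mask is assumed to be given as 0/1 values, and holes left behind the salient layer are not inpainted.

```python
import math


def sample(row, src_x, clamp):
    """Nearest-neighbour lookup of a fractional column in one image row."""
    i = math.floor(src_x + 0.5)
    if clamp:
        i = min(max(i, 0), len(row) - 1)   # replicate the border pixel
    elif i < 0 or i >= len(row):
        return 0                            # outside the layer: transparent
    return row[i]


def warp_layer(layer, field, clamp):
    """Backward warp: output column x reads source column field[x]."""
    return [[sample(row, sx, clamp) for sx in field] for row in layer]


def retarget(image, mask, out_w):
    """Shrink `image` to width `out_w` using one warp field per layer.

    image: list of rows of grayscale values.
    mask:  same shape, 1 for salient pixels and 0 otherwise (assumed given).
    """
    in_w = len(image[0])
    scale = in_w / out_w

    # 1) Layer decomposition driven by the saliency mask.
    salient = [[p * m for p, m in zip(r, mr)] for r, mr in zip(image, mask)]
    background = [[p * (1 - m) for p, m in zip(r, mr)]
                  for r, mr in zip(image, mask)]

    # 2a) Non-salient field: uniform horizontal scaling (illustrative choice).
    bg_field = [(x + 0.5) * scale - 0.5 for x in range(out_w)]

    # 2b) Salient field: a pure shift, so the object keeps its shape. The
    #     shift moves the mask's mean column to its scaled position.
    cols = [x for mr in mask for x, m in enumerate(mr) if m]
    if cols:
        centre_in = sum(cols) / len(cols)
        centre_out = (centre_in + 0.5) / scale - 0.5
        shift = centre_in - centre_out
    else:
        shift = 0.0
    sal_field = [x + shift for x in range(out_w)]

    # 3) Warp each layer with its own field, then alpha-composite.
    w_sal = warp_layer(salient, sal_field, clamp=False)
    w_alpha = warp_layer(mask, sal_field, clamp=False)
    w_bg = warp_layer(background, bg_field, clamp=True)
    return [[a * s + (1 - a) * b for s, a, b in zip(rs, ra, rb)]
            for rs, ra, rb in zip(w_sal, w_alpha, w_bg)]


image = [[10, 10, 10, 90, 90, 10, 10, 10]]
mask = [[0, 0, 0, 1, 1, 0, 0, 0]]
result = retarget(image, mask, 4)   # halve the width
```

In this toy row the two bright salient pixels keep their original width while the surrounding pixels absorb all of the shrinkage, which is the behaviour the layered design is meant to encourage.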

Problem

Research questions and friction points this paper is trying to address.

Reduces visual artifacts in image retargeting

Preserves content and structure in salient areas

Aligns output with human perception metrics

Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-end trainable image retargeting solution

Layered transformation for salient and non-salient areas

Perceptual structure similarity loss that aligns with human perception
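
As a point of reference for what a structure-similarity loss can look like, the sketch below computes the standard SSIM index over two equally sized patches and turns it into a loss. This is the classic SSIM formula used as a stand-in, not HALO's perceptual structure similarity loss, whose exact definition is not given on this page. The function names, the single global window, and the flat-list patch representation are all assumptions made for illustration; the page also does not describe how HALO compares an input and an output of different sizes.

```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Standard SSIM over two equal-length patches, using one global window."""
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(y) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in y) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, y)) / n
    # Small constants that stabilise the ratio when means or variances vanish.
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))


def structure_loss(x, y):
    """Loss that is 0 for identical patches and grows as structure diverges."""
    return 1.0 - ssim_global(x, y)


patch = [10, 90, 90, 10]
shifted = [10, 10, 90, 90]          # same pixel values, different layout
loss_same = structure_loss(patch, patch)
loss_shifted = structure_loss(patch, shifted)
```

Both patches have the same mean and variance, so the large loss for `shifted` comes entirely from the covariance term, which is the part of SSIM that responds to structural change.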

🔎 Similar Papers

Cropper: Vision-Language Model for Image Cropping through In-Context Learning