ProxyImg: Towards Highly-Controllable Image Representation via Hierarchical Disentangled Proxy Embedding

📅 2026-02-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing image representation methods struggle to fully disentangle semantics, geometry, and texture while preserving high-fidelity reconstruction, thereby limiting fine-grained controllable editing. This work proposes a hierarchical proxy-embedding parameterized representation that constructs a semantic-aware hierarchical proxy geometry and embeds multi-scale implicit textures into geometry-aware proxy nodes. For the first time, this approach achieves complete disentanglement of the three components within independent parameter spaces, enabling high-quality background completion and physics-driven animation without relying on generative models. By integrating adaptive Bézier fitting, iterative region subdivision, and local feature indexing, the method attains state-of-the-art reconstruction quality on benchmarks including ImageNet, OIR-Bench, and HumanEdit with fewer parameters, while supporting intuitive interaction and real-time animation—significantly outperforming existing generative approaches.
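The summary mentions adaptive Bézier fitting as one ingredient of the hierarchical proxy geometry. The paper's exact fitting procedure is not reproduced here; as a generic illustration of the idea, a cubic Bézier can be fit to an ordered run of contour points by pinning the endpoints and solving for the two inner control points in a least-squares sense under chord-length parameterization. All function names below are illustrative, not from the paper.

```python
import numpy as np

def fit_cubic_bezier(points):
    """Least-squares cubic Bezier fit to ordered 2D points.

    Endpoints are pinned to the first and last samples; the two inner
    control points are solved in closed form under chord-length
    parameterization (a common heuristic, not the paper's method).
    """
    pts = np.asarray(points, dtype=float)
    p0, p3 = pts[0], pts[-1]

    # Chord-length parameterization of the samples in [0, 1].
    d = np.linalg.norm(np.diff(pts, axis=0), axis=1)
    t = np.concatenate([[0.0], np.cumsum(d)]) / d.sum()

    # Cubic Bernstein basis evaluated at each parameter value.
    b0 = (1 - t) ** 3
    b1 = 3 * (1 - t) ** 2 * t
    b2 = 3 * (1 - t) * t ** 2
    b3 = t ** 3

    # Move the fixed-endpoint terms to the right-hand side and solve
    # a two-column linear system for the inner control points P1, P2.
    A = np.stack([b1, b2], axis=1)                      # (n, 2)
    rhs = pts - np.outer(b0, p0) - np.outer(b3, p3)     # (n, 2)
    (p1, p2), *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return np.stack([p0, p1, p2, p3])

def eval_bezier(ctrl, t):
    """Evaluate a cubic Bezier with control points ctrl at parameters t."""
    t = np.atleast_1d(np.asarray(t, dtype=float))[:, None]
    p0, p1, p2, p3 = ctrl
    return ((1 - t) ** 3 * p0 + 3 * (1 - t) ** 2 * t * p1
            + 3 * (1 - t) * t ** 2 * p2 + t ** 3 * p3)
```

An "adaptive" scheme in the spirit of the summary would recursively split a contour segment and refit wherever the residual of such a fit exceeds a tolerance.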

📝 Abstract
Prevailing image representation methods, including explicit representations such as raster images and Gaussian primitives, as well as implicit representations such as latent images, either suffer from representation redundancy that leads to heavy manual editing effort, or lack a direct mapping from latent variables to semantic instances or parts, making fine-grained manipulation difficult. These limitations hinder efficient and controllable image and video editing. To address these issues, we propose a hierarchical proxy-based parametric image representation that disentangles semantic, geometric, and textural attributes into independent and manipulable parameter spaces. Based on a semantic-aware decomposition of the input image, our representation constructs hierarchical proxy geometries through adaptive Bézier fitting and iterative internal region subdivision and meshing. Multi-scale implicit texture parameters are embedded into the resulting geometry-aware distributed proxy nodes, enabling continuous high-fidelity reconstruction in the pixel domain and instance- or part-independent semantic editing. In addition, we introduce a locality-adaptive feature indexing mechanism to ensure spatial texture coherence, which further supports high-quality background completion without relying on generative models. Extensive experiments on image reconstruction and editing benchmarks, including ImageNet, OIR-Bench, and HumanEdit, demonstrate that our method achieves state-of-the-art rendering fidelity with significantly fewer parameters, while enabling intuitive, interactive, and physically plausible manipulation. Moreover, by integrating proxy nodes with Position-Based Dynamics, our framework supports real-time physics-driven animation using lightweight implicit rendering, achieving superior temporal consistency and visual realism compared with generative approaches.
Problem

Research questions and friction points this paper is trying to address.

image representation
controllable editing
semantic disentanglement
fine-grained manipulation
representation redundancy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical Disentanglement
Proxy Embedding
Geometry-Aware Representation
Implicit Texture Parameterization
Physics-Driven Animation
Ye Chen
Shanghai Jiao Tong University
Image Generation · Image Editing · Video Editing · 3D Vision · Graphics
Yupeng Zhu
Shanghai Jiao Tong University, Shanghai 200240, China
Xiongzhen Zhang
Shanghai Jiao Tong University, Shanghai 200240, China
Zhewen Wan
Shanghai Jiao Tong University, Shanghai 200240, China
Yingzhe Li
Samsung Research America
Wireless Communications · Stochastic Geometry · 5G · LTE · Wi-Fi
Wenjun Zhang
City University of Hong Kong
Thin Film Technology · Nanomaterials and Nanodevices
Bingbing Ni
Shanghai Jiao Tong University, Shanghai 200240, China