🤖 AI Summary
This work addresses the challenge of accurate geometry and texture reconstruction in 3D vision tasks posed by soft boundaries (such as fine hair), where foreground and background pixels are intricately mixed. To tackle this issue, the authors propose HairGuard, a novel framework that leverages image matting data to construct the first dedicated soft-boundary training set. The method introduces a plug-and-play depth fixer built on a gated residual module, and pairs depth-based forward warping with a generative scene painter and an adaptive color fuser to jointly support monocular depth estimation, stereo conversion, and novel view synthesis. Extensive experiments demonstrate that HairGuard achieves state-of-the-art performance across these 3D tasks, significantly improving depth accuracy and detail fidelity in soft-boundary regions.
📝 Abstract
Soft boundaries, like thin hairs, are commonly observed in natural and computer-generated imagery, but they remain challenging for 3D vision due to the ambiguous mixing of foreground and background cues. This paper introduces Guardians of the Hair (HairGuard), a framework designed to recover fine-grained soft-boundary details in 3D vision tasks. Specifically, we first propose a novel data curation pipeline that leverages image matting datasets for training, and then design a depth fixer network to automatically identify soft-boundary regions. With a gated residual module, the depth fixer refines depth precisely around soft boundaries while maintaining global depth quality, allowing plug-and-play integration with state-of-the-art depth models. For view synthesis, we perform depth-based forward warping to retain high-fidelity textures, followed by a generative scene painter that fills disoccluded regions and eliminates redundant background artifacts within soft boundaries. Finally, a color fuser adaptively combines warped and inpainted results to produce novel views with consistent geometry and fine-grained details. Extensive experiments demonstrate that HairGuard achieves state-of-the-art performance across monocular depth estimation, stereo image/video conversion, and novel view synthesis, with significant improvements in soft-boundary regions.
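The two blending operations the abstract describes, gated residual depth refinement and adaptive fusing of warped and inpainted colors, can be sketched numerically as follows. This is a minimal NumPy illustration of the general idea only; the function names (`gated_residual_refine`, `fuse_colors`), tensor shapes, and the form of the gate and fusing weight are our assumptions, not the paper's actual modules.

```python
import numpy as np

def gated_residual_refine(depth, residual, gate):
    """Add a predicted depth correction only where the gate fires
    (hypothetically, near soft boundaries); where the gate is ~0,
    the base depth passes through, preserving global depth quality."""
    return depth + gate * residual

def fuse_colors(warped, inpainted, alpha):
    """Per-pixel blend of a forward-warped view (sharp, high-fidelity
    textures) with a generatively inpainted view (filled disocclusions),
    weighted by an adaptive alpha map in [0, 1]."""
    return alpha * warped + (1.0 - alpha) * inpainted

# Toy 4x4 example: refinement touches only the gated right half.
depth = np.ones((4, 4))
gate = np.zeros((4, 4))
gate[:, 2:] = 1.0  # pretend the right half is a soft-boundary region
refined = gated_residual_refine(depth, np.full((4, 4), 0.5), gate)
```

In this toy run, `refined` stays at 1.0 on the ungated left half and becomes 1.5 on the gated right half, mirroring the claim that refinement is confined to soft-boundary regions.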