One Patch is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of surface material reconstruction and classification under extremely sparse visual cues, this paper proposes SMARC, a unified framework that simultaneously performs RGB material reconstruction and material category recognition from a single contiguous patch covering only 10% of the image. The method couples a partial-convolution U-Net backbone with a lightweight classification head, enabling end-to-end joint optimization of spatial inpainting and semantic understanding. Evaluated on the real-world Touch and Go texture dataset, SMARC achieves a PSNR of 17.55 dB and a classification accuracy of 85.10%, outperforming five baseline models, including ViT, MAE, and Swin Transformer. This work presents an efficient, single-stage solution for material understanding under severely occluded or restricted-view conditions, with direct implications for robotic perception and simulation-based interactive systems.
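The joint optimization described above can be sketched as a two-term objective combining pixel reconstruction and classification losses. This is a minimal illustration only; the function name `joint_loss` and the balancing weight `lam` are assumptions, as the paper's exact loss formulation and weighting are not given here.

```python
import numpy as np

def joint_loss(pred_img, true_img, logits, label, lam=1.0):
    """Hypothetical joint objective: pixel-wise reconstruction (MSE)
    plus material classification (cross-entropy). `lam` balances the
    two terms; the paper's actual weighting is not stated here."""
    rec = np.mean((pred_img - true_img) ** 2)   # reconstruction term
    z = logits - logits.max()                   # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    cls = -log_probs[label]                     # cross-entropy term
    return rec + lam * cls
```

Training end-to-end on a single loss of this shape lets the shared encoder features serve both the inpainting decoder and the classifier, which is the usual motivation for single-stage joint designs.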

📝 Abstract
Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation, and material perception. However, most existing methods rely on dense or full-scene observations, limiting their effectiveness in constrained or partial-view environments. To address this challenge, we introduce SMARC, a unified model for Surface MAterial Reconstruction and Classification from minimal visual input. Given only a single contiguous patch covering 10% of the image, SMARC reconstructs the full RGB surface while simultaneously classifying the material category. Our architecture combines a Partial Convolutional U-Net with a classification head, enabling both spatial inpainting and semantic understanding under extreme observation sparsity. We compare SMARC against five models, including convolutional autoencoders [17], Vision Transformer (ViT) [13], Masked Autoencoder (MAE) [5], Swin Transformer [9], and DETR [2], on the Touch and Go dataset [16] of real-world surface textures. SMARC achieves state-of-the-art results with a PSNR of 17.55 dB and a material classification accuracy of 85.10%. Our findings highlight the advantages of partial convolution in spatial reasoning under missing data and establish a strong foundation for minimal-vision surface understanding.
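The "single 10% contiguous patch" input condition can be sketched as a binary visibility mask with one square observed region covering roughly a tenth of the pixels. The helper name `make_patch_mask` and the square patch shape are assumptions for illustration; the paper may sample patches differently.

```python
import numpy as np

def make_patch_mask(h, w, coverage=0.10, rng=None):
    """Binary mask with a single contiguous square patch observed
    (1 = visible pixel, 0 = missing), covering ~`coverage` of the image."""
    rng = rng or np.random.default_rng(0)
    side = int(round((coverage * h * w) ** 0.5))  # side of a square with ~coverage area
    top = int(rng.integers(0, h - side + 1))      # random patch placement
    left = int(rng.integers(0, w - side + 1))
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:top + side, left:left + side] = 1.0
    return mask

mask = make_patch_mask(224, 224)  # mask.mean() is roughly 0.10
```

The masked image `image * mask[..., None]`, together with the mask itself, would then form the model's input under this observation regime.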
Problem

Research questions and friction points this paper is trying to address.

Reconstructing full material surfaces from minimal visual patches
Classifying material categories under extreme observation sparsity
Overcoming limitations of dense-scene methods in constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses partial convolutional U-Net for reconstruction
Combines classification head with spatial inpainting
Processes single 10% image patch for full output
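The partial-convolution building block named above can be illustrated with a minimal single-channel sketch in the style of Liu et al. (2018): the convolution is computed only over observed pixels, renormalized by the local mask coverage, and the mask itself is propagated forward. The function `partial_conv2d` is a simplified stand-in, not the paper's implementation (which would use multi-channel layers inside a U-Net).

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution sketch: convolve only over
    observed pixels, rescale by mask coverage, and return the updated
    mask (1 wherever the window saw at least one observed pixel)."""
    k = weight.shape[0]
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1          # 'valid' output size
    out = np.zeros((oh, ow), dtype=np.float32)
    new_mask = np.zeros((oh, ow), dtype=np.float32)
    ones = float(k * k)
    for i in range(oh):
        for j in range(ow):
            m = mask[i:i + k, j:j + k]
            s = m.sum()
            if s > 0:
                # renormalize by the fraction of observed pixels in the window
                out[i, j] = (weight * x[i:i + k, j:j + k] * m).sum() * ones / s + bias
                new_mask[i, j] = 1.0
    return out, new_mask
```

Stacking such layers lets valid information spread outward from the observed patch; a classification head would then, presumably, pool the deepest encoder features and apply a linear layer over the material categories.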
Sindhuja Penchala
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Gavin Money
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Gabriel Marques
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Samuel Wood
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Jessica Kirschman
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Travis Atkison
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Shahram Rahimi
Department Head and Professor, The University of Alabama
Computational Intelligence · AI · Agentic AI · Knowledge-Based Systems
Noorbakhsh Amiri Golilarz
Assistant Professor at The University of Alabama
AI/Deep Learning · Cognitive Neuroscience · Computer Vision · Image Processing · Hyperspectral Imaging