One Patch is All You Need: Joint Surface Material Reconstruction and Classification from Minimal Visual Cues

📅 2025-11-25
📈 Citations: 0
Influential: 0
🤖 AI Summary
Addressing the challenge of surface material reconstruction and classification under extremely sparse visual cues, this paper proposes SMARC, a unified framework that simultaneously performs RGB material reconstruction and material category recognition from a single contiguous patch covering only 10% of the image. The method couples a partial-convolution U-Net backbone with a lightweight classification head, enabling end-to-end joint optimization of spatial inpainting and semantic understanding. Evaluated on the real-world Touch and Go texture dataset, SMARC achieves a PSNR of 17.55 dB and a classification accuracy of 85.10%, outperforming five baseline models, including ViT, MAE, and Swin Transformer. This work presents an efficient, single-stage solution for material understanding under severely occluded or restricted-view conditions, with direct implications for robotic perception and simulation-based interactive systems.
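The joint optimization described above can be sketched as a two-term objective combining pixel reconstruction and classification losses. This is a minimal illustration only; the function name `joint_loss` and the balancing weight `lam` are assumptions, as the paper's exact loss formulation and weighting are not given here.

```python
import numpy as np

def joint_loss(pred_img, true_img, logits, label, lam=1.0):
    """Hypothetical joint objective: pixel-wise reconstruction (MSE)
    plus material classification (cross-entropy). `lam` balances the
    two terms; the paper's actual weighting is not stated here."""
    rec = np.mean((pred_img - true_img) ** 2)   # reconstruction term
    z = logits - logits.max()                   # numerically stable softmax
    log_probs = z - np.log(np.exp(z).sum())
    cls = -log_probs[label]                     # cross-entropy term
    return rec + lam * cls
```

Training end-to-end on a single loss of this shape lets the shared encoder features serve both the inpainting decoder and the classifier, which is the usual motivation for single-stage joint designs.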

📝 Abstract
Understanding material surfaces from sparse visual cues is critical for applications in robotics, simulation, and material perception. However, most existing methods rely on dense or full-scene observations, limiting their effectiveness in constrained or partial-view environments. To address this challenge, we introduce SMARC, a unified model for Surface MAterial Reconstruction and Classification from minimal visual input. Given only a single contiguous patch covering 10% of the image, SMARC reconstructs the full RGB surface while simultaneously classifying the material category. Our architecture combines a Partial Convolutional U-Net with a classification head, enabling both spatial inpainting and semantic understanding under extreme observation sparsity. We compare SMARC against five models, including convolutional autoencoders [17], Vision Transformer (ViT) [13], Masked Autoencoder (MAE) [5], Swin Transformer [9], and DETR [2], on the Touch and Go dataset [16] of real-world surface textures. SMARC achieves state-of-the-art results with a PSNR of 17.55 dB and a material classification accuracy of 85.10%. Our findings highlight the advantages of partial convolution in spatial reasoning under missing data and establish a strong foundation for minimal-vision surface understanding.
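The "single 10% contiguous patch" input condition can be sketched as a binary visibility mask with one square observed region covering roughly a tenth of the pixels. The helper name `make_patch_mask` and the square patch shape are assumptions for illustration; the paper may sample patches differently.

```python
import numpy as np

def make_patch_mask(h, w, coverage=0.10, rng=None):
    """Binary mask with a single contiguous square patch observed
    (1 = visible pixel, 0 = missing), covering ~`coverage` of the image."""
    rng = rng or np.random.default_rng(0)
    side = int(round((coverage * h * w) ** 0.5))  # side of a square with ~coverage area
    top = int(rng.integers(0, h - side + 1))      # random patch placement
    left = int(rng.integers(0, w - side + 1))
    mask = np.zeros((h, w), dtype=np.float32)
    mask[top:top + side, left:left + side] = 1.0
    return mask

mask = make_patch_mask(224, 224)  # mask.mean() is roughly 0.10
```

The masked image `image * mask[..., None]`, together with the mask itself, would then form the model's input under this observation regime.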
Problem

Research questions and friction points this paper is trying to address.

Reconstructing full material surfaces from minimal visual patches
Classifying material categories under extreme observation sparsity
Overcoming limitations of dense-scene methods in constrained environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses partial convolutional U-Net for reconstruction
Combines classification head with spatial inpainting
Processes single 10% image patch for full output
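The partial-convolution building block named above can be illustrated with a minimal single-channel sketch in the style of Liu et al. (2018): the convolution is computed only over observed pixels, renormalized by the local mask coverage, and the mask itself is propagated forward. The function `partial_conv2d` is a simplified stand-in, not the paper's implementation (which would use multi-channel layers inside a U-Net).

```python
import numpy as np

def partial_conv2d(x, mask, weight, bias=0.0):
    """Single-channel partial convolution sketch: convolve only over
    observed pixels, rescale by mask coverage, and return the updated
    mask (1 wherever the window saw at least one observed pixel)."""
    k = weight.shape[0]
    h, w = x.shape
    oh, ow = h - k + 1, w - k + 1          # 'valid' output size
    out = np.zeros((oh, ow), dtype=np.float32)
    new_mask = np.zeros((oh, ow), dtype=np.float32)
    ones = float(k * k)
    for i in range(oh):
        for j in range(ow):
            m = mask[i:i + k, j:j + k]
            s = m.sum()
            if s > 0:
                # renormalize by the fraction of observed pixels in the window
                out[i, j] = (weight * x[i:i + k, j:j + k] * m).sum() * ones / s + bias
                new_mask[i, j] = 1.0
    return out, new_mask
```

Stacking such layers lets valid information spread outward from the observed patch; a classification head would then, presumably, pool the deepest encoder features and apply a linear layer over the material categories.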
Sindhuja Penchala
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Gavin Money
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Gabriel Marques
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Samuel Wood
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Jessica Kirschman
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Travis Atkison
Department of Computer Science, The University of Alabama, Tuscaloosa, AL, USA
Shahram Rahimi
Department Head and Professor, The University of Alabama
Computational Intelligence · AI · Agentic AI · Knowledge-Based Systems
Noorbakhsh Amiri Golilarz
Assistant Professor at The University of Alabama
AI/Deep Learning · Cognitive Neuroscience · Computer Vision · Image Processing · Hyperspectral Imaging