CLIDD: Cross-Layer Independent Deformable Description for Efficient and Discriminative Local Feature Representation

📅 2026-01-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work proposes Cross-Layer Independent Deformable Description (CLIDD), a descriptor mechanism that produces efficient, highly discriminative local feature representations for real-time spatial intelligence tasks. It learns adaptive offsets and samples fine-grained structure directly from independent multi-scale feature layers, which avoids the cost of building a unified dense representation. A hardware-aware kernel fusion strategy and lightweight architectures keep inference fast, and training with metric learning and knowledge distillation yields a scalable family of models. The ultra-compact variant matches SuperPoint's precision with only 0.004M parameters, a 99.7% reduction in model size; the high-performance variant outperforms current state-of-the-art methods, including DINOv2-based frameworks, while running at over 200 FPS on edge devices.
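As a quick sanity check on the model-size figures, the short calculation below works backwards from the two numbers quoted in the summary (0.004M parameters and a 99.7% reduction) to the baseline size they imply. The function name is hypothetical, and because both inputs are rounded, the result is only approximate.

```python
def implied_baseline_params(compact_params_m: float, reduction: float) -> float:
    """Return the baseline size (millions of parameters) implied by a stated reduction.

    reduction = 1 - compact / baseline  =>  baseline = compact / (1 - reduction)
    """
    return compact_params_m / (1.0 - reduction)


# Figures quoted in the summary: 0.004M parameters, 99.7% smaller than SuperPoint.
baseline_m = implied_baseline_params(0.004, 0.997)
print(f"implied baseline: {baseline_m:.2f}M parameters")  # prints: implied baseline: 1.33M parameters
```

The implied baseline of roughly 1.3M parameters is in line with the size commonly quoted for SuperPoint, although the source itself does not state that number.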

📝 Abstract

Robust local feature representations are essential for spatial intelligence tasks such as robot navigation and augmented reality. Establishing reliable correspondences requires descriptors that provide both high discriminative power and computational efficiency. To address this, we introduce Cross-Layer Independent Deformable Description (CLIDD), a method that achieves superior distinctiveness by sampling directly from independent feature hierarchies. This approach utilizes learnable offsets to capture fine-grained structural details across scales while bypassing the computational burden of unified dense representations. To ensure real-time performance, we implement a hardware-aware kernel fusion strategy that maximizes inference throughput. Furthermore, we develop a scalable framework that integrates lightweight architectures with a training protocol leveraging both metric learning and knowledge distillation. This scheme generates a wide spectrum of model variants optimized for diverse deployment constraints. Extensive evaluations demonstrate that our approach achieves superior matching accuracy and exceptional computational efficiency simultaneously. Specifically, the ultra-compact variant matches the precision of SuperPoint while utilizing only 0.004M parameters, achieving a 99.7% reduction in model size. Furthermore, our high-performance configuration outperforms all current state-of-the-art methods, including high-capacity DINOv2-based frameworks, while exceeding 200 FPS on edge devices. These results demonstrate that CLIDD delivers high-precision local feature matching with minimal computational overhead, providing a robust and scalable solution for real-time spatial intelligence tasks.
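The core sampling idea in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the nested-list feature maps, the stride-based coordinate mapping, the concatenation of per-layer samples, and the final L2 normalization are all assumptions made for illustration. In CLIDD the offsets are learned; here they are passed in as fixed values.

```python
import math


def bilinear_sample(fmap, x, y):
    """Sample a C x H x W feature map (nested lists) at a continuous (x, y), clamped to the border."""
    h, w = len(fmap[0]), len(fmap[0][0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0
    out = []
    for channel in fmap:
        top = (1 - ax) * channel[y0][x0] + ax * channel[y0][x1]
        bottom = (1 - ax) * channel[y1][x0] + ax * channel[y1][x1]
        out.append((1 - ay) * top + ay * bottom)
    return out


def describe_keypoint(layers, offsets, kx, ky):
    """Build a descriptor for the keypoint (kx, ky), given in image pixels.

    layers:  list of (fmap, stride) pairs, one per feature layer.
    offsets: one list of (dx, dy) pixel offsets per layer (assumed already predicted).
    Each layer is sampled on its own grid, so no unified dense map is ever built.
    """
    desc = []
    for (fmap, stride), layer_offsets in zip(layers, offsets):
        for dx, dy in layer_offsets:
            # Simplified mapping from image pixels to this layer's grid (no half-pixel shift).
            desc.extend(bilinear_sample(fmap, (kx + dx) / stride, (ky + dy) / stride))
    norm = math.sqrt(sum(v * v for v in desc)) or 1.0
    return [v / norm for v in desc]


# Toy example: a fine layer (stride 1) and a coarse layer (stride 2), one channel each.
fine = [[[float(x + y) for x in range(8)] for y in range(8)]]
coarse = [[[float(x * y) for x in range(4)] for y in range(4)]]
desc = describe_keypoint(
    [(fine, 1), (coarse, 2)],
    [[(0.5, 0.0), (-1.0, 1.0)], [(0.0, 0.0)]],
    kx=3.0,
    ky=2.0,
)
print(len(desc))  # prints: 3 (two samples from the fine layer, one from the coarse layer)
```

Only the sampled locations are touched, so the cost scales with the number of keypoints and offsets rather than with the resolution of a fused feature map, which is the efficiency argument the abstract makes.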

Problem

Research questions and friction points this paper is trying to address.

local feature representation

computational efficiency

discriminative power

real-time performance

spatial intelligence

Innovation

Methods, ideas, or system contributions that make the work stand out.

Cross-Layer Independent Deformable Description

learnable offsets

hardware-aware kernel fusion