Distilling Stereo Networks for Performant and Efficient Leaner Networks

📅 2023-06-18

🏛️ IEEE International Joint Conference on Neural Networks

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the high redundancy and slow inference of stereo matching models, this paper proposes a systematic knowledge distillation framework tailored for stereo vision, an area where distillation has seen little prior work. Methodologically, it applies end-to-end knowledge distillation to stereo networks, incorporating backbone alignment, multi-scale feature point selection, and a customized loss function. Because a typical stereo network combines 2D and 3D convolutional modules, the framework distills jointly at both the feature map and cost volume levels. It further integrates an adaptive-weight loss and a progressive teacher–student co-training strategy. On SceneFlow, the distilled student network achieves better accuracy than PSMNet, CFNet, and LEAStereo while running 8×, 5×, and 8× faster, respectively. It also outperforms all tested speed-oriented methods with inference times under 100 ms, and it generalizes better to the unseen ETH3D and Middlebury datasets.

📝 Abstract

Knowledge distillation has been quite popular in vision for tasks like classification and segmentation; however, not much work has been done on distilling state-of-the-art stereo matching methods, despite their range of applications. One reason for its lack of use in stereo matching is the inherent complexity of these networks, where a typical network is composed of multiple two- and three-dimensional modules. In this work, we systematically combine the insights from state-of-the-art stereo methods with general knowledge-distillation techniques to develop a joint framework for stereo network distillation with competitive results and faster inference. Moreover, we show, via a detailed empirical analysis, that distilling knowledge from a stereo network requires careful design of the complete distillation pipeline, from the backbone to the right selection of distillation points and corresponding loss functions. This results in student networks that are not only leaner and faster but also give excellent performance. For instance, our student network, while performing better than performance-oriented methods like PSMNet [1], CFNet [2], and LEAStereo [3] on the benchmark SceneFlow dataset, is 8×, 5×, and 8× faster, respectively. Furthermore, compared to speed-oriented methods with inference times under 100 ms, our student networks perform better than all the tested methods. In addition, our student network also shows better generalization capabilities when tested on unseen datasets like ETH3D and Middlebury. Code: https://github.com/cogsys-tuebingen/Distilling-Stereo-Networks
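As a rough illustration of what distilling at several points of a stereo network can look like, the sketch below combines a task loss on the predicted disparity with two imitation terms: one on a 2D feature map and one on a cost volume. It is not the paper's loss. The function names, the dictionary keys, the weights `w_feat` and `w_cost`, and the choice of mean squared error are all assumptions made for this example, and it assumes student and teacher tensors already have matching shapes (in practice a projection layer may be needed).

```python
import numpy as np


def mse(student, teacher):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((student - teacher) ** 2))


def distillation_loss(student_out, teacher_out, disparity_gt,
                      w_feat=1.0, w_cost=1.0):
    """Hypothetical combined loss: task term plus two imitation terms.

    student_out / teacher_out are dicts with (assumed) keys:
      "features"    -> array of shape (C, H, W), output of a 2D module
      "cost_volume" -> array of shape (C, D, H, W), output of a 3D module
      "disparity"   -> array of shape (H, W), final prediction
    """
    # Task loss: mean absolute disparity error against ground truth.
    task = float(np.mean(np.abs(student_out["disparity"] - disparity_gt)))
    # Imitation terms: pull student activations toward the teacher's.
    feat = mse(student_out["features"], teacher_out["features"])
    cost = mse(student_out["cost_volume"], teacher_out["cost_volume"])
    return task + w_feat * feat + w_cost * cost
```

The two weights make the trade-off between ground-truth supervision and teacher imitation explicit; choosing which intermediate outputs to match is the "selection of distillation points" that the abstract refers to.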

Problem

Research questions and friction points this paper is trying to address.

Distilling complex stereo networks into leaner, faster models

Designing effective distillation pipelines for stereo matching tasks

Improving performance and speed in stereo network distillation

Innovation

Methods, ideas, or system contributions that make the work stand out.

Combines stereo methods with knowledge-distillation techniques

Careful design of distillation pipeline and loss functions

Produces leaner, faster, and high-performance student networks
