FSKD: Monocular Forest Structure Inference via LiDAR-to-RGBI Knowledge Distillation

📅 2026-04-02
🤖 AI Summary
Airborne LiDAR is costly and sparsely sampled, limiting its utility for large-scale, individual-tree-level forest structure monitoring. To address this, the paper proposes FSKD, the first cross-modal knowledge distillation framework from LiDAR to RGBI imagery. A multimodal teacher model, which fuses RGBI and LiDAR-derived features via cross-attention, guides a lightweight SegFormer student to jointly predict the canopy height model (CHM), plant area index (PAI), and foliage height diversity (FHD) from RGBI inputs alone. The method overcomes the limitation of existing monocular approaches, which estimate only CHM, and it does not require strict temporal synchronization between RGBI and LiDAR acquisitions. Evaluated across a 384 km² region in Saxony, Germany, the approach achieves state-of-the-art zero-shot CHM prediction (MedAE: 4.17 m, R²: 0.51, IoU: 0.87), improves MAE by 29–46% over baselines, and generalizes robustly across seasonal (winter–summer) data shifts.
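The teacher-student setup described above can be sketched as a weighted objective in which the RGBI-only student is pulled toward both the LiDAR-derived target and the multimodal teacher's prediction. This is a minimal illustration, not the paper's actual loss: the function name, MSE terms, and the `alpha` weighting are all assumptions.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, lidar_target, alpha=0.5):
    """Hypothetical per-pixel distillation objective (not from the paper).

    The student is supervised by the LiDAR-derived target and, in addition,
    distills from the multimodal teacher's output. `alpha` balances the two
    terms and is an assumed hyperparameter.
    """
    supervised = np.mean((student_pred - lidar_target) ** 2)
    distill = np.mean((student_pred - teacher_pred) ** 2)
    return alpha * supervised + (1.0 - alpha) * distill

# Toy 4x4 canopy-height tiles in metres: the teacher sits close to the
# LiDAR target, the student starts out noisier.
rng = np.random.default_rng(0)
target = rng.uniform(0.0, 30.0, (4, 4))
teacher = target + rng.normal(0.0, 1.0, (4, 4))
student = target + rng.normal(0.0, 3.0, (4, 4))
loss = distillation_loss(student, teacher, target)
```

With `alpha=1.0` the objective reduces to plain supervised regression against the LiDAR target; intermediate values let the teacher's smoother predictions regularize the student.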
📝 Abstract
Very High Resolution (VHR) forest structure data at the individual-tree scale is essential for carbon, biodiversity, and ecosystem monitoring. Yet airborne LiDAR, the reference source for forest structure metrics such as the Canopy Height Model (CHM), Plant Area Index (PAI), and Foliage Height Diversity (FHD), remains costly and infrequently acquired. We propose FSKD: a LiDAR-to-RGB-Infrared (RGBI) knowledge distillation (KD) framework in which a multi-modal teacher fuses RGBI imagery with LiDAR-derived planar metrics and vertical profiles via cross-attention, and an RGBI-only SegFormer student learns to reproduce its outputs. Trained on 384 km² of forests in Saxony, Germany (20 cm ground sampling distance (GSD)) and evaluated on eight geographically distinct test tiles, the student achieves state-of-the-art (SOTA) zero-shot CHM performance (MedAE 4.17 m, R² = 0.51, IoU 0.87), outperforming HRCHM/DAC baselines by 29–46% in MAE (5.81 m vs. 8.14–10.84 m) with stronger correlation coefficients (0.713 vs. 0.166–0.652). Ablations show that multi-modal fusion improves performance by 10–26% over RGBI-only training, and that asymmetric distillation with appropriate model capacity is critical. The method jointly predicts CHM, PAI, and FHD, a multi-metric capability not provided by current monocular CHM estimators, although PAI/FHD transfer remains region-dependent and benefits from local calibration. The framework also remains effective under temporal mismatch (winter LiDAR, summer RGBI), removing strict co-acquisition constraints and enabling scalable 20 cm operational monitoring for workflows such as Digital Twin Germany and national Digital Orthophoto programs.
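The three CHM evaluation scores quoted in the abstract (MedAE, R², IoU) can be reproduced with standard definitions. The sketch below is an assumption about how such metrics are typically computed on height rasters; in particular, the 2 m canopy threshold used to binarize the IoU mask is a common convention, not a value stated in the paper.

```python
import numpy as np

def med_ae(pred, truth):
    # Median absolute error in metres (robust to outlier pixels).
    return np.median(np.abs(pred - truth))

def r2(pred, truth):
    # Coefficient of determination: 1 - residual SS / total SS.
    ss_res = np.sum((truth - pred) ** 2)
    ss_tot = np.sum((truth - truth.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def canopy_iou(pred, truth, height_thresh=2.0):
    # IoU of the binary canopy masks; the 2 m threshold is an assumption.
    p, t = pred > height_thresh, truth > height_thresh
    inter = np.logical_and(p, t).sum()
    union = np.logical_or(p, t).sum()
    return inter / union if union else 1.0
```

A perfect prediction yields MedAE 0 m, R² of 1, and IoU of 1; the reported zero-shot scores (MedAE 4.17 m, R² 0.51, IoU 0.87) sit between that ideal and the weaker baselines.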
Problem

Research questions and friction points this paper is trying to address.

Forest Structure Inference
Monocular RGBI
LiDAR-to-RGBI Knowledge Distillation
Canopy Height Model
Very High Resolution
Innovation

Methods, ideas, or system contributions that make the work stand out.

knowledge distillation
monocular forest structure inference
multi-modal fusion
RGBI imagery
zero-shot prediction
Taimur Khan
Helmholtz Centre for Environmental Research – UFZ, Halle (Saale), Germany
Hannes Feilhauer
Leipzig University, Remote Sensing Centre for Earth System Research
Remote sensing of vegetation
Muhammad Jazib Zafar
Georg-August University of Göttingen, Göttingen, Germany