🤖 AI Summary
This work addresses the challenge of accurately estimating radio-frequency (RF) material parameters from limited measurements, where conventional gradient-based inverse ray tracing is highly sensitive to initialization and computationally expensive. To overcome these limitations, the study introduces, for the first time, a vision-language model (VLM) into electromagnetic parameter estimation. Combined with differentiable ray tracing (DRT), the method uses semantic information from scene images to generate informative material priors and to optimize transmitter/receiver placement, which together guide the physical simulation and the gradient-based optimization. In indoor scenarios, the approach converges 2–4× faster and reduces final parameter errors by 10–100× relative to uniform or random initialization and random-placement baselines, reaching a mean relative error below 0.1% with only a few receivers.
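The prior-generation step turns a VLM's material label into numbers. The sketch below assumes the ITU-R material table follows the ITU-R P.2040 form, in which relative permittivity and conductivity are power laws in frequency, ε_r = a·f^b and σ = c·f^d (f in GHz, σ in S/m). The coefficients are as listed in that recommendation for a few common building materials and should be checked against the edition in use; each fit is only valid over the frequency range the recommendation states. The label strings, the `conductivity_prior` helper, and the 0.01 S/m fallback for unrecognized labels are hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ItuMaterial:
    """Power-law coefficients in the ITU-R P.2040 form:
    eps_r = a * f**b and sigma = c * f**d, with f in GHz and sigma in S/m."""
    a: float
    b: float
    c: float
    d: float

    def properties(self, f_ghz: float) -> Tuple[float, float]:
        """Return (relative permittivity, conductivity in S/m) at f_ghz."""
        return self.a * f_ghz ** self.b, self.c * f_ghz ** self.d


# Coefficients for a few materials as given in ITU-R P.2040 (check the edition in use).
ITU_TABLE: Dict[str, ItuMaterial] = {
    "concrete": ItuMaterial(5.24, 0.0, 0.0462, 0.7822),
    "brick": ItuMaterial(3.91, 0.0, 0.0238, 0.16),
    "wood": ItuMaterial(1.99, 0.0, 0.0047, 1.0718),
    "glass": ItuMaterial(6.31, 0.0, 0.0036, 1.3394),
    "metal": ItuMaterial(1.0, 0.0, 1e7, 0.0),
}


def conductivity_prior(vlm_label: str, f_ghz: float, fallback: float = 0.01) -> float:
    """Hypothetical helper: map a VLM material label to an initial conductivity (S/m).
    Unrecognized labels fall back to an uninformed default value."""
    material = ITU_TABLE.get(vlm_label.strip().lower())
    return fallback if material is None else material.properties(f_ghz)[1]


# Hypothetical VLM output for an indoor scene: object -> material category.
vlm_labels = {"wall": "Concrete", "window": "glass", "door": "wood", "rack": "steel"}
priors = {obj: conductivity_prior(label, f_ghz=3.5) for obj, label in vlm_labels.items()}
print(priors)  # "steel" is not in the table, so "rack" gets the fallback value
```

Keeping unrecognized labels on an uninformed fallback means a VLM miss degrades to the uniform-initialization baseline rather than to a confidently wrong prior.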
📝 Abstract
Accurate radio-frequency (RF) material parameters are essential for electromagnetic digital twins in 6G systems, yet gradient-based inverse ray tracing (RT) remains sensitive to initialization and costly under limited measurements. This paper proposes a vision-language model (VLM)-guided framework that accelerates and stabilizes multi-material parameter estimation in a differentiable RT (DRT) engine. A VLM parses scene images to infer material categories and maps them to quantitative priors via an ITU-R material table, yielding informed conductivity initializations. The VLM further selects informative transmitter/receiver placements that promote diverse, material-discriminative propagation paths. Starting from these priors, the DRT performs gradient-based refinement using measured received signal strengths (RSS). Experiments in NVIDIA Sionna on indoor scenes show 2–4× faster convergence and 10–100× lower final parameter error compared with uniform or random initialization and random-placement baselines, achieving sub-0.1% mean relative error with only a few receivers. Complexity analyses indicate that per-iteration time scales near-linearly with the number of materials and measurement setups, while VLM-guided placement reduces the number of measurements required for accurate recovery. Ablations over RT depth and ray count show further accuracy gains without significant per-iteration overhead. The results demonstrate that semantic priors from VLMs effectively guide physics-based optimization for fast and reliable RF material estimation.
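The refinement step can be illustrated with a deliberately simplified stand-in for the DRT engine. The sketch below assumes a hypothetical linear sensitivity model in which each receiver's RSS in dB is an affine function of the log-conductivities θ_m = log10 σ_m, namely RSS_k = b_k + Σ_m S_km θ_m, and runs plain gradient descent on the squared RSS mismatch. A real DRT engine would replace `predict_rss` with ray tracing and obtain the gradient by automatic differentiation. The matrix `S`, the offsets `b`, and the "ground-truth" conductivities are made-up values; the prior values are rounded outputs of the ITU-R P.2040 power-law model for concrete and glass at 3.5 GHz.

```python
import math
from typing import List, Sequence, Tuple


def predict_rss(theta: Sequence[float], sens: Sequence[Sequence[float]],
                offset: Sequence[float]) -> List[float]:
    """Hypothetical linear forward model: RSS_k = offset_k + sum_m sens[k][m] * theta_m,
    with theta_m = log10(sigma_m). A DRT engine would ray-trace here instead."""
    return [o + sum(s * t for s, t in zip(row, theta)) for row, o in zip(sens, offset)]


def refine(theta0: Sequence[float], sens: Sequence[Sequence[float]],
           offset: Sequence[float], measured: Sequence[float],
           max_iter: int = 500, tol: float = 1e-12) -> Tuple[List[float], int]:
    """Gradient descent on 0.5 * sum_k (pred_k - meas_k)^2.

    Returns the refined log-conductivities and the number of updates taken."""
    # lambda_max(S^T S) <= ||S||_F^2, so this step size guarantees monotone descent.
    step = 1.0 / sum(s * s for row in sens for s in row)
    theta = list(theta0)
    for it in range(max_iter):
        residual = [p - m for p, m in zip(predict_rss(theta, sens, offset), measured)]
        if 0.5 * sum(r * r for r in residual) < tol:
            return theta, it
        # Gradient of the loss with respect to theta is S^T r.
        grad = [sum(sens[k][m] * residual[k] for k in range(len(sens)))
                for m in range(len(theta))]
        theta = [t - step * g for t, g in zip(theta, grad)]
    return theta, max_iter


# Illustrative 3-receiver, 2-material setup (concrete, glass); all numbers are made up.
S = [[1.0, 0.2], [0.3, 1.0], [0.5, 0.5]]   # dB of RSS per decade of conductivity
b = [-60.0, -65.0, -70.0]                  # RSS offsets in dB
true_sigma = [0.15, 0.02]                  # toy "ground truth" conductivities (S/m)
measured = predict_rss([math.log10(s) for s in true_sigma], S, b)

prior = [math.log10(0.123), math.log10(0.0193)]  # ITU-R P.2040 model, 3.5 GHz (rounded)
uniform = [math.log10(0.01)] * 2                 # uninformed uniform initialization

for name, init in [("prior", prior), ("uniform", uniform)]:
    theta, iters = refine(init, S, b, measured)
    print(f"{name}: {iters} iterations, sigma = {[round(10 ** t, 4) for t in theta]}")
```

Because this toy problem is convex, it shows only the speed-up from a better starting point, not the robustness to poor local minima that matters for the nonlinear RT loss. It also hints at why placement matters: with step 1/‖S‖²_F, the slowest error mode shrinks by a factor 1 − λ_min(SᵀS)/‖S‖²_F per iteration, which approaches 1 when receivers cannot tell the materials apart (nearly collinear columns of `S`).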