Weather-Robust Scene Semantics with Vision-Aligned 4D Radar

📅 2026-05-08

🤖 AI Summary

This work addresses the severe degradation of cameras and LiDAR in adverse weather such as rain, fog, and snow by proposing a weather-robust semantic perception method based on 4D millimeter-wave radar. The approach aligns a lightweight radar encoder to frozen SigLIP visual embeddings and decodes structured scene captions through a frozen vision-language model (VLM), with only approximately 7 million trainable parameters in total. The authors identify a mismatch between the norms of radar-derived tokens and the tokens the frozen VLM expects as the dominant failure mode when bridging the two, and show that applying LayerNorm to the projector output resolves it. On the K-RADAR dataset, with fog, light snow, and heavy snow sequences held out, every radar configuration outperforms a camera baseline whose hallucination rate exceeds 90%. Further analysis of encoder complexity, caption format, and pooling strategy exposes tradeoffs relevant to future radar-VLM pipeline design.
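The summary names two steps that interact: pooling radar features into a fixed number of tokens, and projecting those tokens into the VLM's embedding space with a LayerNorm on the projector output so their norms match what the frozen VLM expects. The NumPy sketch below illustrates only the norm effect. The feature shapes, the mean-pooling rule, the VLM width of 1024, and the names `pool_tokens` and `Projector` are hypothetical assumptions, not details taken from the paper.

```python
import numpy as np


def pool_tokens(feats: np.ndarray, num_tokens: int) -> np.ndarray:
    """Mean-pool an (N, C) array of radar features into (num_tokens, C).

    Hypothetical pooling choice: contiguous groups of rows are averaged.
    """
    groups = np.array_split(feats, num_tokens, axis=0)
    return np.stack([g.mean(axis=0) for g in groups])


class Projector:
    """Linear map into the (hypothetical) VLM width, optionally followed by LayerNorm."""

    def __init__(self, in_dim: int, out_dim: int, use_layernorm: bool = True, seed: int = 0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(0.0, 1.0 / np.sqrt(in_dim), size=(in_dim, out_dim))
        self.b = np.zeros(out_dim)
        self.use_layernorm = use_layernorm
        # LayerNorm gain and bias: learned during training, identity here.
        self.gamma = np.ones(out_dim)
        self.beta = np.zeros(out_dim)

    def __call__(self, x: np.ndarray, eps: float = 1e-5) -> np.ndarray:
        y = x @ self.w + self.b
        if self.use_layernorm:
            # Normalize each token across its feature dimension.
            mu = y.mean(axis=-1, keepdims=True)
            var = y.var(axis=-1, keepdims=True)
            y = (y - mu) / np.sqrt(var + eps) * self.gamma + self.beta
        return y


rng = np.random.default_rng(1)
radar_feats = 50.0 * rng.normal(size=(256, 64))  # hypothetical large-magnitude radar features
tokens = pool_tokens(radar_feats, num_tokens=16)  # shape (16, 64)

# Both projectors use seed=0, so they share weights and differ only in LayerNorm.
raw = Projector(64, 1024, use_layernorm=False)(tokens)
normed = Projector(64, 1024, use_layernorm=True)(tokens)

raw_norms = np.linalg.norm(raw, axis=-1)
normed_norms = np.linalg.norm(normed, axis=-1)
print(f"mean token norm without LayerNorm: {raw_norms.mean():.1f}")
print(f"mean token norm with LayerNorm:    {normed_norms.mean():.1f}")  # about sqrt(1024) = 32
```

Without normalization, token norms simply inherit the scale of the raw radar features (a few hundred in this toy setup). With LayerNorm and identity gain, every token sits at about sqrt(D). During training, the learnable `gamma` and `beta` would let the projector settle on whatever scale the VLM's own input embeddings have, which is one plausible reading of why a projector-output LayerNorm removes the mismatch.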

📝 Abstract

Cameras and LiDAR degrade in rain, fog, and snow, while millimeter-wave radar remains largely unaffected. We align a radar encoder to frozen SigLIP vision embeddings and decode structured scene captions through a frozen vision-language model (VLM) with approximately 7M trainable parameters. On K-RADAR with held-out fog, light snow, and heavy snow sequences, all radar configurations outperform a camera baseline that collapses to over 90% hallucination. We identify a token-norm mismatch as the dominant failure mode when bridging radar to a frozen VLM and show that projector-output LayerNorm resolves it. Analysis of encoder complexity, caption format, and pooling strategy reveals tradeoffs that inform future radar-VLM pipeline design.
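The abstract does not state which objective aligns the radar encoder to the frozen SigLIP embeddings. One natural assumption, sketched here, is SigLIP's own pairwise sigmoid loss, with radar embeddings as one side and frozen SigLIP image embeddings of the same frame as the other; a cosine or MSE regression loss would be an equally plausible alternative. The temperature `t=10` and bias `b=-10` are illustrative values matching SigLIP's published initialization, and `sigmoid_alignment_loss` is a hypothetical name.

```python
import numpy as np


def l2_normalize(x: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def sigmoid_alignment_loss(radar_emb: np.ndarray, vision_emb: np.ndarray,
                           t: float = 10.0, b: float = -10.0) -> float:
    """SigLIP-style pairwise sigmoid loss (assumed objective, not the paper's).

    Row i of radar_emb and row i of vision_emb are assumed to come from the same frame.
    vision_emb plays the role of frozen SigLIP targets.
    """
    x = l2_normalize(radar_emb)
    y = l2_normalize(vision_emb)
    logits = t * (x @ y.T) + b
    n = logits.shape[0]
    labels = 2.0 * np.eye(n) - 1.0  # +1 for matched pairs, -1 for all other pairs
    # -log sigmoid(z * logit) == log(1 + exp(-z * logit)), computed stably.
    return float(np.logaddexp(0.0, -labels * logits).sum() / n)


rng = np.random.default_rng(0)
vision = rng.normal(size=(8, 768))                     # stand-in for frozen SigLIP embeddings
aligned = vision + 0.05 * rng.normal(size=(8, 768))    # radar embeddings close to their targets
shuffled = aligned[::-1]                                # radar embeddings paired with the wrong frames

loss_aligned = sigmoid_alignment_loss(aligned, vision)
loss_shuffled = sigmoid_alignment_loss(shuffled, vision)
print(f"aligned: {loss_aligned:.3f}  shuffled: {loss_shuffled:.3f}")
```

Correctly paired embeddings give a loss near log 2, since a matched logit sits at about 0 under this initialization, while mispaired ones are penalized heavily. Because the vision side is frozen, only the radar encoder (and any projector) would receive gradients in training, which is consistent with the small trainable-parameter budget the abstract reports.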

Problem

Research questions and friction points this paper is trying to address.

weather-robust perception

4D radar

scene semantics

vision-language model

sensor degradation

Innovation

Methods, ideas, or system contributions that make the work stand out.

4D radar

vision-language model

weather robustness