Adapting Point Cloud Analysis via Multimodal Bayesian Distribution Learning

📅 2026-03-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses the performance degradation of multimodal 3D vision-language models under distribution shift. Existing cache-based test-time adaptation methods retain only limited historical features and fuse prediction logits heuristically, which makes adaptation unstable. The authors propose BayesMM, which they describe as the first framework to integrate Bayesian distribution modeling into multimodal test-time adaptation. BayesMM represents the textual prior and the streaming visual features of each class as Gaussian distributions: textual parameters are derived from semantic prompts, while visual parameters are updated online as samples arrive. The two modalities are fused through Bayesian model averaging, enabling training-free, continual adaptation. The method reports an average gain of over 4% across multiple point cloud benchmarks under domain shift.

📝 Abstract

Multimodal 3D vision-language models show strong generalization across diverse 3D tasks, but their performance still degrades notably under domain shifts. This has motivated recent studies on test-time adaptation (TTA), which enables models to adapt online using test-time data. Among existing TTA methods, cache-based mechanisms are widely adopted for leveraging previously observed samples in online prediction refinement. However, they store only limited historical information, leading to progressive information loss as the test stream evolves. In addition, their prediction logits are fused heuristically, making adaptation unstable. To address these limitations, we propose BayesMM, a Multimodal Bayesian Distribution Learning framework for test-time point cloud analysis. BayesMM models textual priors and streaming visual features of each class as Gaussian distributions: textual parameters are derived from semantic prompts, while visual parameters are updated online with arriving samples. The two modalities are fused via Bayesian model averaging, which automatically adjusts their contributions based on posterior evidence, yielding a unified prediction that adapts continually to evolving test-time data without training. Extensive experiments on multiple point cloud benchmarks demonstrate that BayesMM maintains robustness under distributional shifts, yielding over 4% average improvement.
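
The mechanism described in the abstract (per-class Gaussians for each modality, online updates of the visual parameters, and fusion by Bayesian model averaging) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes diagonal covariances, a uniform class prior, equal prior weight on the two modalities, and that visual statistics are updated with the model's own predicted label. The names `GaussianBMA`, `text_means`, `text_var`, and `init_var` are hypothetical and chosen only for illustration.

```python
import numpy as np

def log_gauss(x, mean, var):
    """Log-density of x (D,) under C diagonal Gaussians; mean and var are (C, D)."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=1)

def logsumexp(a):
    """Numerically stable log(sum(exp(a)))."""
    m = np.max(a)
    return m + np.log(np.sum(np.exp(a - m)))

class GaussianBMA:
    """Illustrative sketch only: two per-class Gaussian models fused by
    Bayesian model averaging. All modelling choices here are assumptions."""

    def __init__(self, text_means, text_var=1.0, init_var=1.0):
        # Text modality: fixed class means (assumed to come from prompt embeddings).
        self.text_means = np.asarray(text_means, dtype=float)  # (C, D)
        self.text_var = np.full_like(self.text_means, text_var)
        # Visual modality: starts at the text prior, counted as one pseudo-sample.
        self.vis_means = self.text_means.copy()
        self.vis_m2 = np.full_like(self.text_means, init_var)  # sum of squared deviations
        self.counts = np.ones(len(self.text_means))

    def predict(self, x):
        num_classes = len(self.text_means)
        vis_var = self.vis_m2 / self.counts[:, None]
        log_liks = [
            log_gauss(x, self.text_means, self.text_var),
            log_gauss(x, self.vis_means, vis_var),
        ]
        # Evidence of each modality: p(x | m) = sum_c p(x | c, m) p(c), uniform p(c).
        log_ev = np.array([logsumexp(ll) - np.log(num_classes) for ll in log_liks])
        # Posterior model weights under an equal model prior.
        weights = np.exp(log_ev - logsumexp(log_ev))
        # Class posterior within each modality.
        posteriors = [np.exp(ll - logsumexp(ll)) for ll in log_liks]
        probs = weights[0] * posteriors[0] + weights[1] * posteriors[1]
        return probs, weights

    def update(self, x, c):
        """Welford-style running update of the visual mean and variance of class c."""
        self.counts[c] += 1
        delta = x - self.vis_means[c]
        self.vis_means[c] += delta / self.counts[c]
        self.vis_m2[c] += delta * (x - self.vis_means[c])

# Usage: adapt on a small synthetic stream with pseudo-labels, no gradient steps.
model = GaussianBMA(text_means=[[1.0, 0.0], [0.0, 1.0]])
rng = np.random.default_rng(0)
for _ in range(20):
    feature = np.array([0.8, 0.3]) + 0.05 * rng.standard_normal(2)
    probs, weights = model.predict(feature)
    model.update(feature, int(np.argmax(probs)))
```

Because the visual statistics are running sums, the sketch summarizes every sample seen so far instead of a fixed-size cache, and the modality weights follow from each model's evidence for the current feature instead of a hand-tuned mixing coefficient. Whether BayesMM uses these exact update and weighting rules is not stated in the abstract.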

Problem

Research questions and friction points this paper is trying to address.

domain shift

test-time adaptation

point cloud analysis

multimodal learning

information loss

Innovation

Methods, ideas, or system contributions that make the work stand out.

Bayesian model averaging

test-time adaptation

multimodal learning