🤖 AI Summary
This work addresses a limitation of existing learning-based MIMO detection methods, which typically rely on fixed-depth architectures and do not explicitly model the progressive refinement of symbol estimates. The authors revisit MIMO detection from a flow matching perspective, recasting it as a noise-level-conditioned denoising process, and propose the Soft Graph Diffusion Transformer (SGDiT). An adaptive layer normalization (AdaLN)-conditioned soft graph transformer parameterizes the denoising dynamics, enabling stage-aware information integration between the observation and symbol domains as a Gaussian initialization is progressively transformed toward the posterior conditioned on channel observations. Training uses a cross-entropy objective that directly models bit-wise posterior probabilities, which matches the discrete nature of symbol detection better than regression-based formulations. Across various MIMO system configurations, SGDiT achieves competitive bit error rate (BER) performance compared with representative baselines and generalizes well across different channel conditions.
📝 Abstract
Learning-based MIMO detection has shown strong empirical performance, yet existing methods typically rely on fixed-depth architectures without explicitly modeling the progressive refinement of symbol estimates. In this paper, we revisit MIMO detection from a flow matching perspective and propose the Soft Graph Diffusion Transformer (SGDiT), which reformulates detection as a noise-level-conditioned denoising process that progressively transforms a Gaussian initialization toward the posterior conditioned on channel observations. An adaptive layer normalization (AdaLN)-conditioned soft graph transformer is employed to parameterize the denoising dynamics, enabling stage-aware information integration between observation and symbol domains. To better align with the discrete nature of symbol detection, we further adopt a cross-entropy-based training objective that directly models bit-wise posterior probabilities, providing a more suitable inductive bias than conventional regression-based formulations. Experimental results across various MIMO system configurations demonstrate that SGDiT achieves competitive bit error rate (BER) performance compared with representative baselines. Furthermore, the proposed model exhibits good generalization capability across different channel conditions. Overall, the SGDiT framework provides an effective and practical approach for neural MIMO detection.
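The refinement loop and training objective described in the abstract can be illustrated with a minimal NumPy sketch. Everything here is an assumption made for illustration: a real-valued BPSK system, a linear flow-matching path between Gaussian noise and the transmitted symbols, Euler integration, and a hypothetical zero-forcing stand-in (`zf_bit_posterior`) in place of the learned AdaLN-conditioned soft graph transformer. None of the function names come from the paper, and the paper's actual path, sampler, and modulation may differ.

```python
import numpy as np

def zf_bit_posterior(x_t, t, y, H, sigma2):
    """Toy stand-in for the learned denoiser (NOT the SGDiT network).

    Returns P(bit = 1) per BPSK stream from a zero-forcing estimate.
    It ignores x_t and t; a trained model would condition on both.
    """
    G = np.linalg.inv(H.T @ H)
    z = G @ H.T @ y                         # zero-forcing symbol estimate
    llr = 2.0 * z / (sigma2 * np.diag(G))   # per-stream log-likelihood ratio
    return 1.0 / (1.0 + np.exp(-np.clip(llr, -30.0, 30.0)))

def bit_cross_entropy(p, bits, eps=1e-12):
    """Binary cross-entropy between predicted bit posteriors and true bits."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(bits * np.log(p) + (1 - bits) * np.log(1.0 - p)))

def interpolate(x0, x1, t):
    """Assumed linear path: pure noise x0 at t = 0, clean symbols x1 at t = 1."""
    return (1.0 - t) * x0 + t * x1

def refine(y, H, sigma2, denoiser, num_steps=8, rng=None):
    """Euler integration from a Gaussian start toward the soft symbol estimate."""
    rng = np.random.default_rng() if rng is None else rng
    x = rng.standard_normal(H.shape[1])     # Gaussian initialization
    p = np.full(H.shape[1], 0.5)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        p = denoiser(x, t, y, H, sigma2)    # bit posteriors at this noise level
        x1_hat = 2.0 * p - 1.0              # soft BPSK symbol (bit 1 -> +1, bit 0 -> -1)
        v = (x1_hat - x) / (1.0 - t)        # velocity implied by the linear path
        x = x + dt * v
    return x, p

# Small demo: 4 transmit streams, 8 receive antennas, low noise.
rng = np.random.default_rng(0)
bits = np.array([1, 0, 1, 1])
x_true = 2.0 * bits - 1.0
H = np.vstack([np.eye(4), np.eye(4)])       # hypothetical, well-conditioned channel
sigma = 0.05
y = H @ x_true + sigma * rng.standard_normal(8)

x_hat, p_hat = refine(y, H, sigma**2, zf_bit_posterior, num_steps=8, rng=rng)
loss = bit_cross_entropy(p_hat, bits)       # objective a trained denoiser would minimize
```

In training, `interpolate` would be used to build the noisy input for a randomly drawn `t`, and `bit_cross_entropy` would score the network's bit posteriors against the transmitted bits. Passing the denoiser as a callable keeps the sampling loop unchanged when the toy stand-in is swapped for a learned model.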