E2E Learning Massive MIMO for Multimodal Semantic Non-Orthogonal Transmission and Fusion

📅 2025-09-09

🏛️ arXiv.org

📈 Citations: 0

✨ Influential: 0

career value

206K/year

🤖 AI Summary

To address the challenges of high-dimensional channel state information (CSI) acquisition, excessive feedback overhead, and limited precoding performance in massive MIMO systems, this paper proposes an end-to-end uplink–downlink CSI fusion precoding framework. The method introduces a multimodal semantic fusion architecture that jointly models uplink sounding reference signals (SRS) and downlink CSI-reference signals (CSI-RS); designs a MAXIM-based projection network to drive cross-domain CSI-RS optimization; integrates a quantized feedback encoder with dual-path SRS/feedback-driven precoding branches; and proposes a learnable fusion network along with a three-stage spectral efficiency-oriented training strategy. Simulation results demonstrate that, under constrained feedback budgets, the proposed scheme significantly improves spectral efficiency by effectively synergizing uplink channel priors and user equipment feedback, consistently outperforming conventional precoding approaches across all evaluated metrics.

Technology Category

Application Category

📝 Abstract

Massive multiple-input multiple-output (MIMO) promises high spectral efficiency but also leads to high-dimensional downlink channel state information (CSI), which complicates real-time channel acquisition and precoding. To address this, we propose an end-to-end (E2E) uplink-downlink CSI fusion precoding network that jointly models downlink CSI reference signal (CSI-RS) design, CSI feedback, and base-station (BS) precoding within a single E2E neural architecture. Concretely, a projection network built on the MAXIM architecture takes uplink sounding reference signals (SRS) as input and outputs frequency-, beam-, and port-domain projection matrices for designing downlink CSI-RS. User equipment (UE) then compresses/quantizes the resulting CSI-RS observations and feeds back a compact representation. At the base station (BS), two complementary branches produce candidate precoders: one is a feedback-only precoding network driven by quantized downlink observations, and the other is an SRS-only precoding network driven by uplink SRS. These candidate precoders are subsequently combined by a fusion precoding network to yield the final transmit precoder. All the modules are trained with a spectral-efficiency-oriented loss under a three-stage schedule. Simulation results show that the proposed approach effectively harnesses both SRS-derived information and UE feedback, achieving markedly better performance than conventional baselines.

Problem

Research questions and friction points this paper is trying to address.

Optimizes multimodal semantic non-orthogonal transmission and fusion in massive MIMO systems.

End-to-end trains cross-modal networks for joint physical and application-layer semantic processing.

Enhances spectral efficiency and semantic task performance over traditional separated designs.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Transformer-based cross-modal semantic-aware network for joint optimization

End-to-end training integrates physical and application layer tasks

Five subnetworks enable multimodal semantic fusion and transmission

🔎 Similar Papers

No similar papers found.