Less is More in Semantic Space: Intrinsic Decoupling via Clifford-M for Fundus Image Classification

📅 2026-03-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses multi-label fundus image diagnosis, which requires modeling fine-grained lesions and large-scale retinal structures at the same time. Many multi-scale approaches rely on explicit frequency-domain decomposition. The authors' ablations show that, in this setting, such decomposition yields limited accuracy gains at a higher computational cost. Instead of handcrafted frequency engineering, the authors propose Clifford-M, a lightweight backbone that replaces feed-forward expansion and frequency-splitting modules with a Clifford-style rolling product. This product jointly models alignment and structural variation with linear complexity. Embedded in a compact dual-resolution architecture, it enables intrinsically decoupled cross-scale feature interaction. With only 0.85 million parameters and no pre-training, Clifford-M achieves a mean AUC-ROC of 0.8142 and a mean macro-F1 (optimal threshold) of 0.5481 on ODIR-5K, outperforming substantially larger mid-scale CNN baselines trained under the same protocol. Evaluated on RFMiD without fine-tuning, it reaches a macro AUC of 0.7425 and a micro AUC of 0.7610.
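The summary reports several multi-label metrics, but the exact evaluation protocol is not given here. The sketch below shows one common way to compute them, using only NumPy. Macro AUC averages per-label AUCs, while micro AUC pools every (image, label) pair into a single ranking. For "macro-F1 (optimal threshold)" it assumes that each label gets the threshold that maximizes its own F1. That is only one possible reading of the protocol. The function names are hypothetical, and this is not the authors' evaluation code.

```python
import numpy as np


def binary_auc(y: np.ndarray, s: np.ndarray) -> float:
    """ROC AUC via the Mann-Whitney U statistic (ties count as 1/2)."""
    pos, neg = s[y == 1], s[y == 0]
    if len(pos) == 0 or len(neg) == 0:
        return float("nan")  # AUC is undefined without both classes
    # O(P * N) pairwise comparison: fine for a sketch, slow for large datasets.
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))


def macro_micro_auc(Y: np.ndarray, S: np.ndarray):
    """Y, S: (num_images, num_labels) binary labels and predicted scores."""
    per_label = [binary_auc(Y[:, k], S[:, k]) for k in range(Y.shape[1])]
    macro = float(np.nanmean(per_label))      # mean of per-label AUCs
    micro = binary_auc(Y.ravel(), S.ravel())  # one AUC over all pooled pairs
    return macro, micro


def macro_f1_optimal_threshold(Y: np.ndarray, S: np.ndarray) -> float:
    """Macro-F1 where each label uses its own F1-maximizing threshold (assumed protocol)."""
    f1s = []
    for k in range(Y.shape[1]):
        y, s = Y[:, k], S[:, k]
        best = 0.0
        for t in np.unique(s):  # every observed score is a candidate threshold
            pred = s >= t
            tp = np.sum(pred & (y == 1))
            fp = np.sum(pred & (y == 0))
            fn = np.sum(~pred & (y == 1))
            denom = 2 * tp + fp + fn
            best = max(best, 2 * tp / denom if denom else 0.0)
        f1s.append(best)
    return float(np.mean(f1s))


# Toy example: 4 images, 2 labels.
Y = np.array([[1, 0], [0, 1], [1, 1], [0, 0]])
S = np.array([[0.9, 0.2], [0.3, 0.8], [0.6, 0.4], [0.1, 0.5]])
print(macro_micro_auc(Y, S))  # (0.875, 0.9375)
print(macro_f1_optimal_threshold(Y, S))
```

Tuning thresholds on the same data used for scoring makes the resulting F1 optimistic. Under a different protocol, such as a single global threshold or thresholds chosen on a validation split, the numbers would differ.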

📝 Abstract

Multi-label fundus diagnosis requires features that capture both fine-grained lesions and large-scale retinal structure. Many multi-scale medical vision models address this challenge through explicit frequency decomposition, but our ablation studies show that such heuristics provide limited benefit in this setting. Replacing the proposed simple dual-resolution stem with Octave Convolution increased parameters by 35% and computation 2.23-fold without improving mean accuracy, while a fixed wavelet-based variant performed substantially worse. Motivated by these findings, we propose Clifford-M, a lightweight backbone that replaces both feed-forward expansion and frequency-splitting modules with sparse geometric interaction. The model is built on a Clifford-style rolling product that jointly captures alignment and structural variation with linear complexity, enabling efficient cross-scale fusion and self-refinement in a compact dual-resolution architecture. Without pre-training, Clifford-M achieves a mean AUC-ROC of 0.8142 and a mean macro-F1 (optimal threshold) of 0.5481 on ODIR-5K using only 0.85M parameters, outperforming substantially larger mid-scale CNN baselines under the same training protocol. When evaluated on RFMiD without fine-tuning, it attains 0.7425 ± 0.0198 macro AUC and 0.7610 ± 0.0344 micro AUC, indicating reasonable robustness to cross-dataset shift. These results suggest that competitive and efficient fundus diagnosis can be achieved without explicit frequency engineering, provided that the core feature interaction is designed to capture multi-scale structure directly.
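The abstract does not define the rolling product. The sketch below is an illustrative guess at the general idea, not the authors' implementation. It treats each spatial position as a vector of C channels. The channel-wise product a*b serves as the alignment term, because its sum over channels is the dot product a·b. For a small set of channel shifts s, it also forms the wedge-like component a_i b_(i-s) - a_(i-s) b_i as the structural term. Each shift costs O(C) per position, so the operation stays linear in C rather than the O(C^2) of a full outer product. The name `rolling_product` and the default `shifts=(1, 2, 4)` are hypothetical.

```python
import numpy as np


def rolling_product(a: np.ndarray, b: np.ndarray, shifts=(1, 2, 4)) -> np.ndarray:
    """Hypothetical Clifford-style rolling product between two feature maps.

    a, b: arrays of shape (..., C), e.g. (num_positions, channels).
    Returns an array of shape (..., C * (1 + len(shifts))).
    """
    if a.shape != b.shape:
        raise ValueError("a and b must have the same shape")
    # Alignment term: channel-wise product; summing it over C gives the dot product a.b.
    terms = [a * b]
    for s in shifts:
        # np.roll along channels: a_s[..., i] == a[..., i - s] (cyclic).
        a_s = np.roll(a, s, axis=-1)
        b_s = np.roll(b, s, axis=-1)
        # Structural term: the (a ^ b) component on channel pair (i, i - s).
        # It is antisymmetric in (a, b) and vanishes when a == b.
        terms.append(a * b_s - a_s * b)
    # O(C) work per shift and position, so the total is linear in C for fixed shifts.
    return np.concatenate(terms, axis=-1)


rng = np.random.default_rng(0)
fine = rng.standard_normal((16, 8))    # e.g. 16 positions, 8 channels (high-res branch)
coarse = rng.standard_normal((16, 8))  # e.g. upsampled low-res branch (assumed pairing)
out = rolling_product(fine, coarse)
print(out.shape)  # (16, 32)
```

In a dual-resolution model, `a` might come from the high-resolution branch and `b` from an upsampled low-resolution branch. A learned projection (omitted here) would then map the concatenated terms back to C channels. Both the pairing and the projection are assumptions made for illustration.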

Problem

Research questions and friction points this paper is trying to address.

fundus image classification

multi-label diagnosis

multi-scale representation

frequency decomposition

medical vision

Innovation

Methods, ideas, or system contributions that make the work stand out.

Clifford-M

multi-scale representation

geometric interaction