🤖 AI Summary
Current vision-language models (VLMs) are constrained by Euclidean space assumptions and by backbone architectures that cannot represent anisotropic geometries, which limits their capacity to model multi-geometric astronomical phenomena such as planetary orbits (spherical geometry) and black hole spacetime (hyperbolic geometry). To address this, the authors propose Galaxy-Walker, a geometry-aware VLM. The method constructs multi-scale physical graph representations, integrates spherical and hyperbolic space embeddings, introduces a geometric prompting mechanism that generates geometry tokens through random walks across heterogeneous geometric spaces, and employs a Mixture-of-Experts (MoE)-style geometric adapter that compresses and reshapes anisotropic structure. The authors further design a geometry-aware cross-modal alignment pretraining objective for unified multimodal modeling. Evaluated on galaxy property estimation (R² up to 0.91) and morphological classification (F1 improvement of up to 0.17 on challenging features), the model significantly outperforms both domain-specific models and general-purpose VLMs. This work establishes a novel paradigm for cosmological-scale astronomical understanding grounded in differential geometry.
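For orientation, the three geometries named above differ in how distance between two embedded points is measured. The standard geodesic distances for constant-curvature spaces are listed below as background. The summary does not say which model of hyperbolic space or which curvature values Galaxy-Walker uses, so the unit sphere and the Poincaré ball with curvature $-1$ are assumptions made here for illustration.

```latex
% Euclidean space R^n (curvature 0)
d_{\mathbb{E}}(x, y) = \lVert x - y \rVert

% Unit sphere S^n (curvature +1), with \lVert x \rVert = \lVert y \rVert = 1
d_{\mathbb{S}}(x, y) = \arccos\big(\langle x, y \rangle\big)

% Poincare ball B^n (curvature -1), with \lVert x \rVert, \lVert y \rVert < 1
d_{\mathbb{B}}(x, y) = \operatorname{arcosh}\!\left(
  1 + \frac{2\,\lVert x - y \rVert^{2}}
           {\big(1 - \lVert x \rVert^{2}\big)\big(1 - \lVert y \rVert^{2}\big)}
\right)
```

Distances on the sphere are bounded by $\pi$, while distances in the Poincaré ball grow without bound as points approach the boundary, which is one reason a single Euclidean embedding fits neither well.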
📝 Abstract
Modern vision-language models (VLMs) build their patch embeddings and convolutional backbones in vector spaces, typically Euclidean ones, from the ground up. Extending VLMs to the galactic scale to understand astronomical phenomena requires spherical space for planetary orbits and hyperbolic space for black holes, which raises two formidable challenges: (a) current pre-trained models are confined to Euclidean space rather than a comprehensive geometric embedding, and (b) the predominant architectures lack backbones suited to anisotropic physical geometries. In this paper, we introduce Galaxy-Walker, a geometry-aware VLM for universe-level vision understanding tasks. We propose a geometry prompt that generates geometry tokens by random walks across diverse spaces on a multi-scale physical graph, along with a geometry adapter that compresses and reshapes the space anisotropy in a mixture-of-experts manner. Extensive experiments demonstrate the effectiveness of our approach: Galaxy-Walker achieves state-of-the-art performance in both galaxy property estimation ($R^2$ scores up to $0.91$) and morphology classification (up to $+0.17$ F1 improvement on challenging features), significantly outperforming both domain-specific models and general-purpose VLMs.
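The two mechanisms named in the abstract, random walks over a graph and a mixture-of-experts blend across geometries, can be illustrated with a minimal sketch. This is not the Galaxy-Walker implementation: the toy graph, the node embeddings, the gate weights, and every function name below are hypothetical, and the three "experts" are simple stand-ins (identity for Euclidean space, projection onto the unit sphere, and the exponential map at the origin of the Poincaré ball with curvature -1).

```python
import math
import random


def random_walk(adjacency, start, length, rng):
    """Return the node ids visited by an unbiased random walk."""
    walk = [start]
    for _ in range(length - 1):
        neighbors = adjacency[walk[-1]]
        if not neighbors:  # dead end: stop early
            break
        walk.append(rng.choice(neighbors))
    return walk


def _norm(v):
    return math.sqrt(sum(x * x for x in v))


def euclidean_expert(v):
    """Stand-in Euclidean expert: leaves the vector unchanged."""
    return list(v)


def spherical_expert(v):
    """Stand-in spherical expert: projects onto the unit sphere."""
    n = _norm(v)
    return [x / n for x in v] if n > 0 else list(v)


def hyperbolic_expert(v):
    """Stand-in hyperbolic expert: exponential map at the origin of the
    Poincare ball (curvature -1), i.e. tanh(|v|) * v / |v|."""
    n = _norm(v)
    return [math.tanh(n) * x / n for x in v] if n > 0 else list(v)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(token, experts, gate_weights):
    """Blend expert outputs using softmax gates computed from the token.

    gate_weights holds one weight row per expert; each gate score is the
    dot product of that row with the token.
    """
    scores = [sum(w * x for w, x in zip(row, token)) for row in gate_weights]
    gates = softmax(scores)
    outputs = [expert(token) for expert in experts]
    mixed = [
        sum(g * out[i] for g, out in zip(gates, outputs))
        for i in range(len(token))
    ]
    return mixed, gates


# Hypothetical multi-scale graph: nodes at different physical scales.
graph = {
    "cluster": ["galaxy"],
    "galaxy": ["cluster", "star", "black_hole"],
    "star": ["galaxy", "planet"],
    "planet": ["star"],
    "black_hole": ["galaxy"],
}
# Hypothetical 2-D node embeddings and gate weights (not learned here).
embeddings = {
    "cluster": [2.0, 0.5],
    "galaxy": [1.0, 1.0],
    "star": [0.5, 0.2],
    "planet": [0.1, 0.3],
    "black_hole": [3.0, 4.0],
}
experts = [euclidean_expert, spherical_expert, hyperbolic_expert]
gate_weights = [[0.1, 0.0], [0.0, 0.1], [0.05, 0.05]]

walk = random_walk(graph, "galaxy", 5, random.Random(0))
for node in walk:
    mixed, gates = mix_experts(embeddings[node], experts, gate_weights)
    print(node, [round(g, 3) for g in gates], [round(x, 3) for x in mixed])
```

In a trained model the gate weights and experts would be learned parameters; here they are fixed constants so the routing arithmetic is easy to follow.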