🤖 AI Summary
This work addresses the limitations of current large language models in representing biomolecular structures: existing approaches rely on modality-specific sequential encodings or fixed-length connectors, which capture geometric information poorly and suffer from over-compression and imbalanced token allocation as structural complexity increases. To overcome these challenges, the authors propose a unified all-atom framework that adaptively perceives structural complexity by constructing variable-size structural patches on molecular graphs via an instruction-conditioned gating strategy. A cross-attention mechanism then injects geometry-aware tokens into the language model, strengthening structural grounding. This approach moves beyond the constraints of conventional fixed-token representations, achieving significantly improved generalization in heterogeneous structural reasoning across multiple all-atom benchmarks and effectively mitigating structural hallucination.
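The complexity-aware allocation idea in the summary can be sketched in a few lines. The code below is an illustrative toy, not the paper's algorithm: it assumes a per-atom gate score obtained from an instruction embedding and groups atoms into variable-size patches under a score budget (`gated_patches`, `budget`, and the greedy grouping are all assumptions for illustration), so that regions the gate marks as complex end up split into more, smaller patches.

```python
import numpy as np

def gated_patches(node_feats, instr_emb, budget=1.5):
    """Toy instruction-conditioned patching (illustrative, not the paper's method).

    Each node gets a sigmoid gate score from the dot product of its features
    with an instruction embedding; nodes are then grouped greedily into
    variable-size patches whose accumulated score stays within `budget`.
    High-scoring (complex) regions therefore yield more, smaller patches,
    i.e. more query tokens are allocated to them.
    """
    scores = 1.0 / (1.0 + np.exp(-(node_feats @ instr_emb)))  # gates in (0, 1)
    patches, current, acc = [], [], 0.0
    for i, s in enumerate(scores):
        if current and acc + s > budget:  # close the patch once the budget is spent
            patches.append(current)
            current, acc = [], 0.0
        current.append(i)
        acc += s
    if current:
        patches.append(current)
    return patches

# With strongly positive gates, every atom becomes its own patch;
# with strongly negative gates, all atoms collapse into one patch.
```

In this sketch the instruction embedding only rescales the gate scores, but that is enough to show the mechanism the summary describes: the same molecular graph is tokenized into more or fewer patches depending on what the instruction asks for.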
📝 Abstract
Large language models (LLMs) increasingly support reasoning over biomolecular structures, but most existing approaches remain modality-specific and rely on either sequence-style encodings or fixed-length connector tokens for structural inputs. These designs can under-expose explicit geometric cues and impose rigid fusion bottlenecks, leading to over-compression and poor token allocation as structural complexity grows. We present a unified all-atom framework that grounds language reasoning in geometric information while adaptively scaling structural tokens. The method first constructs variable-size structural patches on molecular graphs using an instruction-conditioned gating policy, enabling complexity-aware allocation of query tokens. It then refines the resulting patch tokens via cross-attention with modality embeddings and injects geometry-informed tokens into the language model to improve structure grounding and reduce structural hallucinations. Across diverse all-atom benchmarks, the proposed approach yields consistent gains in heterogeneous structure-grounded reasoning. An anonymized implementation is provided in the supplementary material.
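The fusion step of the abstract, refining patch tokens by cross-attention with modality embeddings and then injecting the geometry-informed tokens into the language model's input, can be sketched as below. This is a minimal single-head, projection-free sketch under assumed shapes (`cross_attend` and `inject` are hypothetical names); a real implementation would use learned query/key/value projections and insert the tokens at positions chosen by the connector.

```python
import numpy as np

def cross_attend(patch_tokens, modality_emb):
    """Single-head cross-attention sketch (no learned projections).

    Patch tokens act as queries over the modality embeddings and are
    refined with a residual update.
    """
    d = patch_tokens.shape[-1]
    logits = patch_tokens @ modality_emb.T / np.sqrt(d)      # (P, M) scores
    weights = np.exp(logits - logits.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)           # row-wise softmax
    return patch_tokens + weights @ modality_emb             # residual refinement

def inject(text_tokens, geo_tokens):
    """Prepend the geometry-informed tokens to the LM's input sequence."""
    return np.concatenate([geo_tokens, text_tokens], axis=0)
```

Because the number of patch tokens varies with structural complexity (rather than being a fixed-length connector output), the injected prefix grows and shrinks with the input structure, which is the adaptive-scaling property the abstract claims.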