🤖 AI Summary
This work addresses the challenge of few-shot cross-modal adaptation, where Euclidean flow matching suffers from entangled feature trajectories due to its flat geometry, hindering effective alignment between visual and semantic distributions. To overcome this limitation, the paper introduces hyperbolic geometry for the first time in this context and proposes a centripetal hyperbolic alignment mechanism. By constructing a text-anchored hierarchical structure on the Lorentz manifold and designing class-specific geodesic corridors to constrain trajectory evolution, the method enables ordered and disentangled cross-modal transport. An adaptive diameter-based stopping strategy is further introduced to prevent over-transportation. Evaluated on 11 benchmarks, the approach achieves new state-of-the-art results, consistently outperforming its Euclidean flow matching counterparts.
📝 Abstract
Recent advances in cross-modal few-shot adaptation treat visual-semantic alignment as a continuous feature transport problem via Flow Matching (FM). However, we argue that Euclidean FM overlooks a fundamental limitation of flat geometry: its polynomial volume growth fails to accommodate diverse feature distributions, leading to severe path entanglement. To address this, we propose path-decoupled Hyperbolic Flow Matching (HFM), which leverages the Lorentz manifold's exponential expansion for trajectory decoupling. HFM structures the transport via two key designs: 1) Centripetal hyperbolic alignment: It constructs a centripetal hierarchy by anchoring textual roots, which pushes visual leaves to the boundary to initialize orderly flows. 2) Path-decoupled objective: It acts as a "semantic guardrail", rigidly confining trajectories within isolated class-specific geodesic corridors via step-wise supervision. Furthermore, we devise an adaptive diameter-based stopping criterion, derived from the intrinsic semantic scale, to prevent over-transportation into the crowded origin. Extensive ablations on 11 benchmarks show that HFM establishes a new state-of-the-art, consistently outperforming its Euclidean counterparts. Our code and models will be released.
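For readers unfamiliar with the Lorentz manifold, the following is a minimal NumPy sketch of the geometric primitives the abstract refers to: lifting a Euclidean feature onto the hyperboloid, measuring geodesic distance, and walking along the geodesic between two points. It is not the authors' implementation. It assumes unit negative curvature (-1), and the function names (`lorentz_inner`, `exp_map_origin`, `lorentz_distance`, `geodesic`) and the toy 2-D features are hypothetical, chosen only for illustration.

```python
import numpy as np

def lorentz_inner(x, y):
    """Lorentzian inner product: -x0*y0 + sum_i xi*yi (index 0 is the time axis)."""
    return -x[..., 0] * y[..., 0] + np.sum(x[..., 1:] * y[..., 1:], axis=-1)

def exp_map_origin(v, eps=1e-9):
    """Lift a Euclidean vector v (tangent at the origin) onto the hyperboloid.

    Assumes curvature -1; the result x satisfies <x, x>_L = -1 with x0 > 0.
    """
    norm = np.linalg.norm(v, axis=-1, keepdims=True)
    safe = np.maximum(norm, eps)          # avoid division by zero at the origin
    time = np.cosh(norm)
    space = np.sinh(norm) * v / safe
    return np.concatenate([time, space], axis=-1)

def lorentz_distance(x, y):
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L)."""
    return np.arccosh(np.clip(-lorentz_inner(x, y), 1.0, None))

def geodesic(x, y, t, eps=1e-9):
    """Point at fraction t in [0, 1] along the geodesic from x to y."""
    d = np.maximum(lorentz_distance(x, y)[..., None], eps)
    return (np.sinh((1.0 - t) * d) * x + np.sinh(t * d) * y) / np.sinh(d)

# Toy example (hypothetical features): a "visual" point far from the origin
# and a "text" point close to it, with the midpoint of the path between them.
origin = exp_map_origin(np.zeros(2))
visual = exp_map_origin(np.array([-1.0, 0.5]))
text = exp_map_origin(np.array([0.3, -0.2]))
mid = geodesic(visual, text, 0.5)

print("distance visual -> text:", lorentz_distance(visual, text))
print("midpoint stays on manifold:", lorentz_inner(mid, mid))
```

In this picture, a point's distance from the origin equals the norm of the Euclidean vector it was lifted from, which is one way to read the abstract's description of textual roots sitting near the origin and visual leaves near the boundary. How HFM actually parameterizes its velocity field, corridors, and stopping rule is not specified here.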