Lixiang Ru

Google Scholar ID: y7wegKQAAAAJ

Ant Group

computer vision, MLLM, multi-modal learning, remote sensing

Citations & Impact

All-time

Citations

1,361

H-index

i10-index

Publications

Co-authors

list available

Contact

Email: rulixiang@outlook.com

Publications

9 items

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

2025

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model

2025

CasP: Improving Semi-Dense Feature Matching Pipeline Leveraging Cascaded Correspondence Priors for Guidance

2025

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing

2025

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning

2025

Ming-Omni: A Unified Multimodal Model for Perception and Generation

2025

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

2025

Plug-and-Play DISep: Separating Dense Instances for Scene-to-Pixel Weakly-Supervised Change Detection in High-Resolution Remote Sensing Images

2025

Resume

Academic Achievements

ARGenSeg: Image Segmentation with Autoregressive Image Generation Model (NeurIPS, 2025)

M2-Reasoning: Empowering MLLMs with Unified General and Spatial Reasoning (arXiv, 2025)

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction (arXiv, 2025)

A Semantic-Enhanced Multi-Modal Remote Sensing Foundation Model for Earth Observation (Nature Machine Intelligence, 2025)

SkySense V2: A Unified Foundation Model for Multi-modal Remote Sensing (ICCV, 2025)

SkySense: A Multi-Modal Remote Sensing Foundation Model Towards Universal Interpretation for Earth Observation Imagery (CVPR, 2024)

Parameter-Efficient Complementary Expert Learning for Long-Tailed Visual Recognition (ACM Multimedia, 2024)

Token Contrast for Weakly-Supervised Semantic Segmentation (CVPR, 2023)

Learning Affinity from Attention: End-to-End Weakly-Supervised Semantic Segmentation with Transformers (CVPR, 2022)

Weakly-Supervised Semantic Segmentation with Visual Words Learning and Hybrid Pooling (IJCV, 2022)

Learning Visual Words for Weakly-Supervised Semantic Segmentation (IJCAI, 2021)

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion (venue and year not provided)

Background

Currently a computer vision researcher at Ant Group, focusing on multi-modal models and visual understanding. Research interests include multi-modal learning and reasoning, visual understanding, and remote sensing.

Co-authors

12 total