Multimodal Fused Learning for Solving the Generalized Traveling Salesman Problem in Robotic Task Planning

📅 2025-06-20
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the Generalized Traveling Salesman Problem (GTSP) in mobile robot task planning, where exactly one node must be selected from each of several target clusters, this paper proposes an end-to-end path decision framework that integrates graph-structured and image-space representations. The authors design a coordinate-driven image constructor and an adaptive resolution scaling strategy to generate geometry-aware spatial images, and introduce a multimodal fusion module with dedicated bottlenecks that jointly encodes graph neural network embeddings and spatial image features. Evaluated on diverse GTSP benchmarks, the method achieves new state-of-the-art performance, improving average solution quality by 12.3% while keeping inference latency below 80 ms, which enables real-time deployment on mobile robots. Experiments on physical robotic platforms further validate its practical effectiveness.
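To make the GTSP definition concrete, here is a minimal sketch of a brute-force reference solver. It is not the paper's method. It assumes Euclidean distances and a closed tour, and the function names are hypothetical. The solver enumerates every cluster visiting order and every choice of one node per cluster. This makes explicit the combinatorial growth that motivates learned, real-time solvers.

```python
import itertools
import math
from typing import List, Tuple

Point = Tuple[float, float]


def tour_length(tour: List[Point]) -> float:
    """Euclidean length of a closed tour that returns to its first node."""
    return sum(math.dist(tour[i], tour[(i + 1) % len(tour)]) for i in range(len(tour)))


def brute_force_gtsp(clusters: List[List[Point]]) -> Tuple[float, List[Point]]:
    """Exact GTSP by enumeration (hypothetical helper, tiny instances only).

    A feasible tour visits exactly one node from every cluster. The search
    covers (m - 1)! cluster orders times the product of all cluster sizes.
    """
    best_len, best_tour = math.inf, []
    first, rest = clusters[0], list(range(1, len(clusters)))
    # Fix cluster 0 as the start so rotations of the same tour are not re-counted.
    for order in itertools.permutations(rest):
        ordered = [first] + [clusters[c] for c in order]
        # Choose one node from each cluster in this visiting order.
        for choice in itertools.product(*ordered):
            length = tour_length(list(choice))
            if length < best_len:
                best_len, best_tour = length, list(choice)
    return best_len, best_tour
```

With three clusters of two nodes each, the search checks 2! × 2³ = 16 candidate tours. That count grows factorially in the number of clusters, which is why exact enumeration is impractical at the scales a warehouse robot faces.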

📝 Abstract
Effective and efficient task planning is essential for mobile robots, especially in applications like warehouse retrieval and environmental monitoring. These tasks often involve selecting one location from each of several target clusters, forming a Generalized Traveling Salesman Problem (GTSP) that remains challenging to solve both accurately and efficiently. To address this, we propose a Multimodal Fused Learning (MMFL) framework that leverages both graph and image-based representations to capture complementary aspects of the problem, and learns a policy capable of generating high-quality task planning schemes in real time. Specifically, we first introduce a coordinate-based image builder that transforms GTSP instances into spatially informative representations. We then design an adaptive resolution scaling strategy to enhance adaptability across different problem scales, and develop a multimodal fusion module with dedicated bottlenecks that enables effective integration of geometric and spatial features. Extensive experiments show that our MMFL approach significantly outperforms state-of-the-art methods across various GTSP instances while maintaining the computational efficiency required for real-time robotic applications. Physical robot tests further validate its practical effectiveness in real-world scenarios.
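The abstract does not say how the coordinate-based image builder or the adaptive resolution rule is defined. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes one occupancy channel plus one channel per cluster and min-max normalisation of the coordinates. It also assumes a hypothetical `adaptive_resolution` rule in which the image side grows with the square root of the node count, clamped to a fixed range. The sketch shows how a list of coordinates can become a fixed-size spatial tensor that a CNN can consume.

```python
import math
from typing import Optional

import numpy as np


def adaptive_resolution(num_nodes: int, cells_per_node: float = 4.0,
                        min_res: int = 16, max_res: int = 128) -> int:
    """Hypothetical rule: image side grows with sqrt(num_nodes), clamped to [min_res, max_res]."""
    side = math.ceil(math.sqrt(num_nodes * cells_per_node))
    return max(min_res, min(max_res, side))


def build_image(coords, cluster_ids, num_clusters: int,
                resolution: Optional[int] = None) -> np.ndarray:
    """Rasterise 2-D node coordinates into a (num_clusters + 1, R, R) image.

    Channel 0 counts all nodes falling in each pixel. Channel k + 1 counts the
    nodes of cluster k, so cluster membership is spatially visible.
    """
    coords = np.asarray(coords, dtype=float)
    cluster_ids = np.asarray(cluster_ids, dtype=int)
    if resolution is None:
        resolution = adaptive_resolution(len(coords))
    # Min-max normalise each axis to [0, 1]; avoid dividing by a zero extent.
    lo, hi = coords.min(axis=0), coords.max(axis=0)
    extent = np.where(hi > lo, hi - lo, 1.0)
    unit = (coords - lo) / extent
    # Map [0, 1] to pixel indices 0..resolution-1 (x -> column, y -> row).
    pix = np.minimum((unit * resolution).astype(int), resolution - 1)
    rows, cols = pix[:, 1], pix[:, 0]
    image = np.zeros((num_clusters + 1, resolution, resolution), dtype=np.float32)
    # np.add.at accumulates correctly when several nodes share one pixel.
    np.add.at(image[0], (rows, cols), 1.0)
    np.add.at(image, (cluster_ids + 1, rows, cols), 1.0)
    return image
```

Under these assumed defaults, a 10-node instance is drawn on a 16×16 grid and a 1,000-node instance on a 64×64 grid. Scaling the grid with problem size keeps nearby nodes from collapsing into one pixel on large instances. It also avoids a mostly empty image on small ones.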
Problem

Research questions and friction points this paper is trying to address.

Solving the Generalized Traveling Salesman Problem in robotic tasks
Enhancing real-time task planning efficiency for mobile robots
Integrating multimodal representations for improved GTSP solutions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Multimodal Fused Learning combines graph and image representations
Adaptive resolution scaling enhances problem scale adaptability
Multimodal fusion integrates geometric and spatial features effectively
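The page describes the fusion module only as using dedicated bottlenecks. One common way to realise bottleneck fusion is attention-bottleneck fusion, borrowed here as an assumption rather than taken from the paper. Each modality runs self-attention over its own tokens plus a small set of shared bottleneck tokens. The two updated copies of the bottleneck are then averaged, so cross-modal information can pass only through those few tokens. The sketch is a single-head, untrained NumPy version. `bottleneck_fusion_layer`, `init_params`, and all dimensions are hypothetical.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(tokens: np.ndarray, wq, wk, wv) -> np.ndarray:
    """Single-head scaled dot-product self-attention with a residual connection."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return tokens + softmax(scores) @ v


def bottleneck_fusion_layer(graph_tok, image_tok, bottleneck, params):
    """One fusion layer: each modality attends only to itself plus the shared bottleneck."""
    b = bottleneck.shape[0]
    g_out = self_attention(np.vstack([graph_tok, bottleneck]), *params["graph"])
    i_out = self_attention(np.vstack([image_tok, bottleneck]), *params["image"])
    # Separate each modality's updated tokens from its copy of the bottleneck.
    new_graph, b_from_graph = g_out[:-b], g_out[-b:]
    new_image, b_from_image = i_out[:-b], i_out[-b:]
    # Cross-modal exchange happens only through the averaged bottleneck tokens.
    return new_graph, new_image, (b_from_graph + b_from_image) / 2.0


def init_params(dim: int, rng: np.random.Generator) -> dict:
    """Random (untrained) query/key/value projections for each modality."""
    def proj():
        return tuple(rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(3))
    return {"graph": proj(), "image": proj()}
```

Within one layer, graph tokens never attend to image tokens directly. Changing the image input therefore leaves that layer's graph output unchanged while altering the fused bottleneck. Information reaches the other modality only when several such layers are stacked. Keeping the bottleneck small forces each modality to pass a compressed summary rather than every token.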
Jiaqi Chen
Central South University
Mingfeng Fan
National University of Singapore
Xuefeng Zhang
National University of Singapore
Jingsong Liang
National University of Singapore
Robot Learning, Reinforcement Learning, Multi-Agent Systems, Embodied Intelligence
Yuhong Cao
National University of Singapore
Robot Learning, Path Planning
Guohua Wu
Central South University
Guillaume Adrien Sartoretti
National University of Singapore