🤖 AI Summary
This work addresses the reliance of existing public transit routing systems on structured maps and complex routing engines by introducing TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities. Noting that no existing dataset supports training models to bypass this dependency, it presents a fully data-driven, end-to-end approach to transit route generation that operates without explicit map dependencies. After continual pre-training on this dataset, a large language model implicitly grounds arbitrary GPS coordinates to appropriate stations, with no explicit mapping step. Experiments across three complementary evaluation tasks show that the trained model generates structurally valid routes with high accuracy, indicating that transit route planning can be learned directly from origin-destination data under map-free conditions.
📝 Abstract
Public transit route planning traditionally depends on structured map infrastructure and complex routing engines, and no existing dataset supports training models to bypass this dependency. We present TransitLM, a large-scale dataset of over 13 million transit route planning records from four Chinese cities covering 120,845 stations and 13,666 lines, released as a continual pre-training corpus together with benchmark data for three evaluation tasks with complementary metrics. Experiments show that an LLM trained on TransitLM produces structurally valid routes with high accuracy and implicitly grounds arbitrary GPS coordinates to appropriate stations without any explicit mapping. These results demonstrate that transit route planning can be learned entirely from data, enabling end-to-end, map-free route generation directly from origin-destination information. The dataset and benchmark are available at https://huggingface.co/datasets/GD-ML/TransitLM, with evaluation code at https://github.com/HotTricker/TransitLM.
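For context, the explicit mapping step that the abstract says the model learns implicitly can be pictured as a nearest-station lookup. The sketch below is not the authors' method and does not use the TransitLM data format; the station names, coordinates, and the functions `haversine_km` and `nearest_station` are hypothetical, chosen only to show what "grounding a GPS coordinate to a station" means in a conventional map-based pipeline.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

def nearest_station(lat, lon, stations):
    """Return the name of the station closest to (lat, lon).

    `stations` is a list of (name, lat, lon) tuples. This is the explicit
    lookup a map-based system performs before routing.
    """
    return min(stations, key=lambda s: haversine_km(lat, lon, s[1], s[2]))[0]

# Hypothetical stations with made-up coordinates (not taken from the dataset).
STATIONS = [
    ("station_a", 39.9000, 116.4000),
    ("station_b", 39.9100, 116.4200),
    ("station_c", 39.8800, 116.3800),
]

if __name__ == "__main__":
    # An arbitrary origin point that is not itself a station.
    print(nearest_station(39.9010, 116.4010, STATIONS))
```

A map-free model in the sense of the abstract would have to produce a suitable boarding station from the raw coordinates alone, without access to a station table like `STATIONS` at inference time.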