Mochi: Aligning Pre-training and Inference for Efficient Graph Foundation Models via Meta-Learning

📅 2026-04-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a performance limitation of existing graph foundation models: their pre-training objectives are misaligned with downstream tasks. Prior models pre-train with reconstruction-based objectives such as link prediction and then rely on a separate post-hoc unification step, such as class prototypes, to adapt to downstream tasks. Mochi instead uses meta-learning to pre-train directly on few-shot task episodes that mirror the downstream evaluation protocol, which aligns the pre-training objective with the inference goal and removes the need for that step. Across 25 real-world graph datasets spanning node classification, link prediction, and graph classification, Mochi and its variant Mochi++ achieve competitive or superior performance compared with existing graph foundation models, while requiring 8 to 27 times less training time than the strongest baseline.

📝 Abstract

We propose Mochi, a Graph Foundation Model that addresses task unification and training efficiency by adopting a meta-learning-based training framework. Prior models pre-train with reconstruction-based objectives such as link prediction, and assume that the resulting representations can be aligned with downstream tasks through a separate unification step such as class prototypes. We demonstrate through synthetic and real-world experiments that this procedure, while simple and intuitive, has limitations that directly affect downstream task performance. To address these limitations, Mochi pre-trains on few-shot episodes that mirror the downstream evaluation protocol, aligning the training objective with inference rather than relying on a post-hoc unification step. We show that Mochi, along with its more powerful variant Mochi++, achieves competitive or superior performance compared to existing Graph Foundation Models across 25 real-world graph datasets spanning node classification, link prediction, and graph classification, while requiring 8 to 27 times less training time than the strongest baseline.
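
To make the idea of a "few-shot episode" concrete, the following is a minimal sketch of generic N-way K-shot episode sampling with nearest-prototype prediction, in the style of prototypical networks. It is not Mochi's actual procedure: the abstract does not specify how episodes are built or which loss is used, and the function names (`sample_episode`, `prototypes`, `predict`) and the use of fixed, precomputed embeddings are assumptions made purely for illustration. In episodic pre-training, the query-set loss would additionally be backpropagated through the graph encoder that produces the embeddings.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, n_query, rng):
    """Sample one N-way K-shot episode from a list of node labels.

    Returns (support, query): lists of (node_index, episode_class) pairs,
    where episode_class is re-indexed to 0..n_way-1 within the episode.
    """
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    # Only classes with enough nodes for both support and query are eligible.
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= k_shot + n_query]
    classes = rng.sample(sorted(eligible), n_way)
    support, query = [], []
    for episode_class, c in enumerate(classes):
        picked = rng.sample(by_class[c], k_shot + n_query)
        support += [(i, episode_class) for i in picked[:k_shot]]
        query += [(i, episode_class) for i in picked[k_shot:]]
    return support, query


def prototypes(embeddings, support, n_way):
    """Mean embedding of the support nodes of each episode class."""
    dim = len(embeddings[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for idx, c in support:
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += embeddings[idx][d]
    return [[s / counts[c] for s in sums[c]] for c in range(n_way)]


def predict(embedding, protos):
    """Return the class whose prototype is nearest in squared Euclidean distance."""
    dists = [sum((a - b) ** 2 for a, b in zip(embedding, p)) for p in protos]
    return dists.index(min(dists))


# Toy usage: 15 nodes, 3 classes, hand-made 2-d embeddings (hypothetical data).
toy_labels = [0] * 5 + [1] * 5 + [2] * 5
toy_embeddings = [[10.0 * y, float(i % 3)] for i, y in enumerate(toy_labels)]
toy_support, toy_query = sample_episode(toy_labels, 2, 2, 2, random.Random(0))
toy_protos = prototypes(toy_embeddings, toy_support, 2)
toy_preds = [predict(toy_embeddings[i], toy_protos) for i, _ in toy_query]
```

The design point this illustrates is the one the abstract argues for: when training itself consists of such episodes, the model is optimized for the same support/query protocol it faces at inference, rather than for a reconstruction objective that is mapped onto the task afterwards.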

Problem

Research questions and friction points this paper is trying to address.

Graph Foundation Models

task unification

training efficiency

pre-training

downstream task alignment

Innovation

Methods, ideas, or system contributions that make the work stand out.

meta-learning

graph foundation model

pre-training alignment