🤖 AI Summary
Venture capital (VC) startup success prediction is an “off-graph” forecasting task: the prediction target lies outside the investment network. Existing methods struggle to model graph-structured evidence jointly with the interpretable reasoning capabilities of large language models (LLMs).
Method: We propose the first information-gain-driven graph path retrieval mechanism tailored for VC prediction, paired with a multi-agent architecture that fuses heterogeneous evidence through a learnable gate. The retriever mitigates path explosion by distilling the network into compact chains suited to chain-of-thought reasoning. The approach combines RAG-enhanced prompting, multi-perspective graph path sampling, learnable gating-based fusion, and synergistic LLM–GNN reasoning.
Contribution/Results: Under strict anti-leakage evaluation, our method achieves a 5.0% improvement in F1 score and a 16.6% gain in Precision@5. By unifying structural evidence modeling with faithful, stepwise LLM reasoning, it offers an interpretable paradigm that may generalize to other off-graph prediction tasks, including recommendation and risk assessment.
📝 Abstract
Most venture capital (VC) investments fail, while a few deliver outsized returns. Accurately predicting startup success requires synthesizing complex relational evidence, including company disclosures, investor track records, and investment network structures, through explicit reasoning to form coherent, interpretable investment theses. Traditional machine learning and graph neural networks both lack this reasoning capability. Large language models (LLMs) offer strong reasoning but face a modality mismatch with graphs. Recent graph-LLM methods target in-graph tasks where answers lie within the graph, whereas VC prediction is off-graph: the target exists outside the network. The core challenge is selecting graph paths that maximize predictor performance on an external objective while enabling step-by-step reasoning. We present MIRAGE-VC, a multi-perspective retrieval-augmented generation framework that addresses two obstacles: path explosion (thousands of candidate paths overwhelm LLM context) and heterogeneous evidence fusion (different startups need different analytical emphasis). Our information-gain-driven path retriever iteratively selects high-value neighbors, distilling investment networks into compact chains for explicit reasoning. A multi-agent architecture integrates three evidence streams via a learnable gating mechanism based on company attributes. Under strict anti-leakage controls, MIRAGE-VC achieves +5.0% F1 and +16.6% Precision@5, and sheds light on other off-graph prediction tasks such as recommendation and risk assessment. Code: https://anonymous.4open.science/r/MIRAGE-VC-323F.
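To make the idea of iteratively selecting high-value neighbors concrete, here is a minimal Python sketch of greedy path expansion. It is an illustration under stated assumptions, not the MIRAGE-VC implementation: the abstract does not define how information gain is computed, so this sketch assumes it is the reduction in binary entropy of a predictor's success probability. The names `greedy_info_gain_path`, `predict_success`, `max_hops`, and `min_gain`, along with the toy graph and scores, are all hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) outcome."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def greedy_info_gain_path(
    start: str,
    neighbors: Dict[str, Sequence[str]],
    predict_success: Callable[[List[str]], float],
    max_hops: int = 3,
    min_gain: float = 1e-3,
) -> List[str]:
    """Grow one path from `start`, adding the neighbor that most reduces
    predictor uncertainty at each hop (assumed definition of gain)."""
    path = [start]
    uncertainty = binary_entropy(predict_success(path))
    for _ in range(max_hops):
        best_node, best_gain = None, min_gain
        for cand in neighbors.get(path[-1], ()):
            if cand in path:  # avoid cycles
                continue
            gain = uncertainty - binary_entropy(predict_success(path + [cand]))
            if gain > best_gain:
                best_node, best_gain = cand, gain
        if best_node is None:  # no neighbor is informative enough
            break
        path.append(best_node)
        uncertainty -= best_gain
    return path


# Hypothetical toy network and predictor, for illustration only.
toy_graph = {
    "startup": ["investor_a", "investor_b"],
    "investor_a": ["portfolio_x"],
    "investor_b": [],
    "portfolio_x": [],
}
toy_scores = {"investor_a": 0.2, "investor_b": 0.05, "portfolio_x": 0.15}


def toy_predictor(path: List[str]) -> float:
    """Stand-in for a learned model: evidence shifts P(success) away from 0.5."""
    return min(0.99, 0.5 + sum(toy_scores.get(node, 0.0) for node in path))


print(greedy_info_gain_path("startup", toy_graph, toy_predictor))
```

Because only the single most informative neighbor is kept at each hop, the retained chain grows linearly with hop count rather than with the number of candidate paths, which is the property that keeps the evidence within an LLM context window. A fuller version would presumably keep several paths per perspective instead of one.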