MedVerse: Efficient and Reliable Medical Reasoning via DAG-Structured Parallel Execution

📅 2026-02-07
🤖 AI Summary
This work addresses the inefficiency of large language models on clinical reasoning tasks, such as differential diagnosis, that are inherently parallel but are forced into a single linear path by sequential autoregressive decoding. The authors model complex medical reasoning as a directed acyclic graph (DAG) grounded in Petri net theory and introduce a parallel inference framework that combines topology-aware attention, adaptive position indices, and a custom inference engine that supports parallel execution without additional overhead. Experiments show that the approach improves strong general-purpose LLMs by up to 8.9% and achieves performance comparable to specialized medical LLMs, while reducing inference latency by 1.3× and increasing generation throughput by 1.7×.
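
To make the DAG idea concrete, here is a minimal sketch of how reasoning steps with no dependency on each other could be grouped for parallel execution. It uses a standard Kahn-style topological sort; it is not the MedVerse engine, and the step names (`findings`, `hyp_pneumonia`, and so on) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def parallel_levels(nodes, edges):
    """Group DAG nodes into levels; nodes in the same level do not
    depend on each other and could be executed in parallel."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for src, dst in edges:
        children[src].append(dst)
        indegree[dst] += 1

    # Start from steps that have no prerequisites.
    frontier = sorted(n for n in nodes if indegree[n] == 0)
    levels, seen = [], 0
    while frontier:
        levels.append(frontier)
        seen += len(frontier)
        ready = []
        for n in frontier:
            for c in children[n]:
                indegree[c] -= 1
                if indegree[c] == 0:  # all prerequisites are finished
                    ready.append(c)
        frontier = sorted(ready)

    if seen != len(nodes):
        raise ValueError("graph has a cycle; not a DAG")
    return levels


# Hypothetical differential-diagnosis graph (names are assumptions).
nodes = ["findings", "hyp_pneumonia", "hyp_pe", "hyp_chf", "conclusion"]
edges = [
    ("findings", "hyp_pneumonia"),
    ("findings", "hyp_pe"),
    ("findings", "hyp_chf"),
    ("hyp_pneumonia", "conclusion"),
    ("hyp_pe", "conclusion"),
    ("hyp_chf", "conclusion"),
]
levels = parallel_levels(nodes, edges)
for depth, level in enumerate(levels):
    print(depth, level)
```

In this toy graph the three hypotheses land in the same level, which is the kind of structure a sequential decoder would otherwise have to walk through one branch at a time.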

📝 Abstract
Large language models (LLMs) have demonstrated strong performance and rapid progress in a wide range of medical reasoning tasks. However, their sequential autoregressive decoding forces inherently parallel clinical reasoning, such as differential diagnosis, into a single linear reasoning path, limiting both efficiency and reliability for complex medical problems. To address this, we propose MedVerse, a reasoning framework for complex medical inference that reformulates medical reasoning as a parallelizable directed acyclic graph (DAG) process based on Petri net theory. The framework adopts a full-stack design across data, model architecture, and system execution. For data creation, we introduce the MedVerse Curator, an automated pipeline that synthesizes knowledge-grounded medical reasoning paths and transforms them into Petri net-structured representations. At the architectural level, we propose a topology-aware attention mechanism with adaptive position indices that supports parallel reasoning while preserving logical consistency. At the system level, we develop a customized inference engine that supports parallel execution without additional overhead. Empirical evaluations show that MedVerse improves strong general-purpose LLMs by up to 8.9%. Compared to specialized medical LLMs, MedVerse achieves comparable performance while delivering a 1.3× reduction in inference latency and a 1.7× increase in generation throughput, enabled by its parallel decoding capability. Code is available at https://github.com/aiming-lab/MedVerse.
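
The abstract does not spell out how topology-aware attention or adaptive position indices are computed, so the following is only one plausible reading, sketched under stated assumptions: a token may attend to earlier tokens in its own step and to tokens in ancestor steps, but not to sibling branches; and each step's position index starts where its longest parent ended, so parallel branches share a starting position. The function name `build_mask_and_positions` and the step names are hypothetical.

```python
def build_mask_and_positions(steps, parents, lengths):
    """steps: step names in topological order (parents before children).
    parents: step -> list of direct parent steps.
    lengths: step -> number of tokens in that step.
    Returns (mask, position) for the flattened token sequence, where
    mask[q][k] is True if query token q may attend to key token k."""
    # Transitive ancestors of every step (assumption: visibility follows the DAG).
    anc = {}
    for s in steps:
        anc[s] = set()
        for p in parents.get(s, []):
            anc[s] |= {p} | anc[p]

    owner, position, end = [], [], {}
    for s in steps:
        # Assumed rule: a step starts after its longest-running parent.
        start = max((end[p] for p in parents.get(s, [])), default=0)
        for offset in range(lengths[s]):
            owner.append(s)
            position.append(start + offset)
        end[s] = start + lengths[s]

    n = len(owner)
    mask = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(q + 1):  # never attend to later tokens
            same_step = owner[k] == owner[q]
            mask[q][k] = same_step or owner[k] in anc[owner[q]]
    return mask, position


# Hypothetical two-branch example (names and lengths are assumptions).
steps = ["findings", "hyp_a", "hyp_b", "conclusion"]
parents = {"hyp_a": ["findings"], "hyp_b": ["findings"],
           "conclusion": ["hyp_a", "hyp_b"]}
lengths = {"findings": 2, "hyp_a": 2, "hyp_b": 3, "conclusion": 1}
mask, position = build_mask_and_positions(steps, parents, lengths)
print(position)
```

Under these assumptions the two hypothesis branches cannot see each other's tokens, while the conclusion step sees both, which is one way to keep parallel branches independent and still preserve the logical order of the graph.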
Problem

Research questions and friction points this paper is trying to address.

medical reasoning
parallel execution
large language models
differential diagnosis
autoregressive decoding
Innovation

Methods, ideas, or system contributions that make the work stand out.

DAG-structured reasoning
parallel medical inference
Petri net
topology-aware attention
efficient LLM decoding