🤖 AI Summary
Public transit fuel efficiency analysis suffers from fragmented multimodal data, high manual interpretation costs, and poor scalability. To address these challenges, this paper proposes a decision-oriented multi-agent analytical framework that integrates multimodal large language models (LLMs), Gaussian mixture model (GMM) clustering, chain-of-thought prompting, and an LLM-as-a-judge evaluation mechanism, generating interpretable narrative reports from analytical artifacts derived from trip data. Key contributions include: (1) a multi-agent architecture that coordinates a data narration agent, an LLM-as-a-judge agent, and an optional human-in-the-loop evaluator; and (2) an iterative evaluation loop that targets both factual accuracy and domain-specific adaptability. Evaluated on 4,006 real-world bus trips in North Jutland, Denmark, the framework with GPT-4.1 mini and chain-of-thought prompting achieves 97.3% narrative accuracy, the best result among the five LLMs and three prompting paradigms compared.
📝 Abstract
Enhancing fuel efficiency in public transportation requires the integration of complex multimodal data into interpretable, decision-relevant insights. However, traditional analytics and visualization methods often yield fragmented outputs that demand extensive human interpretation, limiting scalability and consistency. This study presents a multi-agent framework that leverages multimodal large language models (LLMs) to automate data narration and energy insight generation. The framework coordinates three specialized agents, namely a data narration agent, an LLM-as-a-judge agent, and an optional human-in-the-loop evaluator, to iteratively transform analytical artifacts into coherent, stakeholder-oriented reports. The system is validated through a real-world case study on public bus transportation in North Jutland, Denmark, where fuel efficiency data from 4,006 trips are analyzed using Gaussian mixture model (GMM) clustering. Comparative experiments across five state-of-the-art LLMs and three prompting paradigms identify GPT-4.1 mini with chain-of-thought prompting as the optimal configuration, achieving 97.3% narrative accuracy while balancing interpretability and computational cost. The findings demonstrate that multi-agent orchestration significantly enhances factual precision, coherence, and scalability in LLM-based reporting. The proposed framework establishes a replicable and domain-adaptive methodology for AI-driven narrative generation and decision support in energy informatics.
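The iterative coordination described in the abstract (a narration agent drafts a report, a judge agent checks it, and an optional human reviewer has the final say) might be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the names `run_pipeline`, `Verdict`, `stub_narrate`, and `stub_judge` are hypothetical, the agents are plain Python stand-ins rather than LLM calls, and the cluster values are invented toy numbers, not results from the study.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class Verdict:
    """Outcome of a review step (hypothetical structure, not from the paper)."""
    accepted: bool
    feedback: str = ""


def run_pipeline(
    artifacts: Dict[str, float],
    narrate: Callable[[Dict[str, float], str], str],
    judge: Callable[[Dict[str, float], str], Verdict],
    human_review: Optional[Callable[[str], Verdict]] = None,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    """Draft, judge, and optionally human-review a report until accepted.

    Returns the last report and the number of rounds used.
    """
    feedback = ""
    report = ""
    for round_no in range(1, max_rounds + 1):
        report = narrate(artifacts, feedback)
        verdict = judge(artifacts, report)
        # The human reviewer is optional and only sees drafts the judge accepted.
        if verdict.accepted and human_review is not None:
            verdict = human_review(report)
        if verdict.accepted:
            return report, round_no
        feedback = verdict.feedback
    return report, max_rounds


def stub_narrate(artifacts: Dict[str, float], feedback: str) -> str:
    """Stand-in for the narration agent; the first draft omits one cluster."""
    names = sorted(artifacts)
    if not feedback:
        names = names[:-1]
    return "; ".join(f"{name}: mean {artifacts[name]}" for name in names)


def stub_judge(artifacts: Dict[str, float], report: str) -> Verdict:
    """Stand-in for the judge agent; checks every cluster is mentioned."""
    missing = [name for name in sorted(artifacts) if name not in report]
    if missing:
        return Verdict(False, "missing clusters: " + ", ".join(missing))
    return Verdict(True)


# Toy cluster summaries (invented values standing in for GMM cluster means).
ARTIFACTS = {"cluster_a": 1.0, "cluster_b": 2.0}

report, rounds = run_pipeline(ARTIFACTS, stub_narrate, stub_judge)
print(rounds, report)
```

In a real system the two stubs would be replaced by prompts to an LLM, and the artifacts would be the clustering outputs and plots. The loop structure, with judge feedback fed back into the next draft, is the part this sketch is meant to show.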