🤖 AI Summary
Scientific papers contain tables and figures that are structurally complex, multimodal, and highly context-dependent, which makes them difficult for current AI systems to parse accurately. To address this, the work proposes Anagent, a multi-agent framework with four collaborating agents (planning, expert retrieval, solving, and critique) for understanding and reasoning over scientific tables and figures. Key contributions include the first multi-agent collaboration mechanism tailored to scientific table and figure analysis; AnaBench, a benchmark of 63,178 samples from nine scientific domains categorized along seven complexity dimensions; and an iterative refinement pipeline driven by five-dimensional quality assessment, combined with modular training strategies (supervised fine-tuning and domain-specific reinforcement learning). Evaluated across 9 broad domains and 170 subdomains, the approach achieves improvements of up to 13.43% in the training-free setting and up to 42.12% after fine-tuning, underscoring the importance of task-oriented reasoning and context awareness.
📝 Abstract
In scientific research, analysis requires accurately interpreting complex multimodal knowledge, integrating evidence from different sources, and drawing inferences grounded in domain-specific knowledge. However, current artificial intelligence (AI) systems struggle to consistently demonstrate such capabilities. The complexity and variability of scientific tables and figures, combined with heterogeneous structures and long-context requirements, pose fundamental obstacles to scientific table & figure analysis. To quantify these challenges, we introduce AnaBench, a large-scale benchmark featuring 63,178 instances from nine scientific domains, systematically categorized along seven complexity dimensions. To tackle these challenges, we propose Anagent, a multi-agent framework for enhanced scientific table & figure analysis through four specialized agents: Planner decomposes tasks into actionable subtasks, Expert retrieves task-specific information through targeted tool execution, Solver synthesizes information to generate coherent analysis, and Critic performs iterative refinement through five-dimensional quality assessment. We further develop modular training strategies that leverage supervised finetuning and specialized reinforcement learning to optimize individual capabilities while maintaining effective collaboration. Comprehensive evaluation across 9 broad domains with 170 subdomains demonstrates that Anagent achieves substantial improvements, up to ↑13.43% in training-free settings and ↑42.12% with finetuning, while revealing that task-oriented reasoning and context-aware problem-solving are essential for high-quality scientific table & figure analysis. Our project page: https://xhguo7.github.io/Anagent/.
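The abstract describes a Planner → Expert → Solver → Critic loop. What follows is a minimal Python sketch of that control flow, assuming each agent is a plain callable. It is not the authors' implementation. The names `run_pipeline`, `Trace` and the `toy_*` stubs are hypothetical. So are the acceptance rule (every one of the five scores must reach `threshold`) and the `max_rounds` cap. The five quality dimensions appear only as unnamed scores because the abstract does not list them.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Hypothetical: the abstract does not name the five quality dimensions,
# so the Critic here just returns five generic scores in [0, 1].
NUM_QUALITY_DIMS = 5

Scores = List[float]


@dataclass
class Trace:
    """Record of one hypothetical Planner -> Expert -> Solver -> Critic run."""
    subtasks: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    drafts: List[str] = field(default_factory=list)
    scores: List[Scores] = field(default_factory=list)


def run_pipeline(
    task: str,
    plan: Callable[[str], List[str]],
    retrieve: Callable[[str], str],
    solve: Callable[[str, List[str], str], str],
    critique: Callable[[str, str], Tuple[Scores, str]],
    threshold: float = 0.8,   # assumed acceptance threshold
    max_rounds: int = 3,      # assumed cap on refinement rounds
) -> Trace:
    trace = Trace()
    trace.subtasks = plan(task)                             # Planner: split task into subtasks
    trace.evidence = [retrieve(s) for s in trace.subtasks]  # Expert: gather evidence per subtask
    feedback = ""
    for _ in range(max_rounds):
        draft = solve(task, trace.evidence, feedback)       # Solver: synthesize an analysis
        scores, feedback = critique(task, draft)            # Critic: score on five dimensions
        if len(scores) != NUM_QUALITY_DIMS:
            raise ValueError("Critic must return one score per quality dimension")
        trace.drafts.append(draft)
        trace.scores.append(scores)
        if min(scores) >= threshold:                        # accept only if every dimension passes
            break
    return trace


# Toy stand-ins for the four agents (all hypothetical, no model calls).
def toy_plan(task: str) -> List[str]:
    return [f"{task}: read axes", f"{task}: compare rows"]


def toy_retrieve(subtask: str) -> str:
    return f"evidence for '{subtask}'"


def toy_solve(task: str, evidence: List[str], feedback: str) -> str:
    draft = f"analysis using {len(evidence)} evidence items"
    # Mark the draft as revised whenever the Critic sent feedback.
    return draft + " (revised)" if feedback else draft


def toy_critique(task: str, draft: str) -> Tuple[Scores, str]:
    if "revised" in draft:
        return [0.9] * NUM_QUALITY_DIMS, ""
    return [0.9, 0.9, 0.5, 0.9, 0.9], "strengthen dimension 3"


if __name__ == "__main__":
    trace = run_pipeline("Table 2", toy_plan, toy_retrieve, toy_solve, toy_critique)
    print(len(trace.drafts), trace.drafts[-1])
    # prints: 2 analysis using 2 evidence items (revised)
```

The toy Critic rejects the first draft on one dimension, so the loop runs a second round and stops once all five scores clear the threshold. Requiring every dimension to pass, rather than averaging the scores, is one plausible design choice; the paper may aggregate its quality assessment differently.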