From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language models struggle to effectively reason over enterprise-grade spreadsheets containing thousands of rows, multiple worksheets, and embedded charts. To this end, the authors propose a multimodal retrieval-augmented generation framework that enhances complex spreadsheet understanding through fine-grained row/column/block-level embeddings, a hybrid lexical-dense retrieval strategy, and a Reciprocal Rank Fusion (RRF) mechanism to integrate multimodal information. The study introduces FRTR-Bench, the first large-scale multimodal benchmark for spreadsheet reasoning, on which their approach achieves 74% accuracy with Claude Sonnet 4.5—surpassing the previous state of the art by 50 percentage points—and attains 87% accuracy using GPT-5 on SpreadsheetLLM while reducing token consumption by approximately 50%.

📝 Abstract
Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. Prior state-of-the-art spreadsheet reasoning approaches typically rely on single-sheet compression or full-context encoding, which limits scalability and fails to reflect how real users interact with complex, multimodal workbooks. We introduce FRTR-Bench, the first large-scale benchmark for multimodal spreadsheet reasoning, comprising 30 enterprise-grade Excel workbooks spanning nearly four million cells and more than 50 embedded images. To address these challenges, we present From Rows to Reasoning (FRTR), an advanced, multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings, employs hybrid lexical-dense retrieval with Reciprocal Rank Fusion (RRF), and integrates multimodal embeddings to reason over both numerical and visual information. We tested FRTR on six LLMs, achieving 74% answer accuracy on FRTR-Bench with Claude Sonnet 4.5, a substantial improvement over prior state-of-the-art approaches that reached only 24%. On the SpreadsheetLLM benchmark, FRTR achieved 87% accuracy with GPT-5 while reducing token usage by roughly 50% compared to direct serialization methods.
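The fusion step the abstract names — combining a lexical ranking and a dense-embedding ranking with Reciprocal Rank Fusion — can be sketched with the standard RRF scoring formula, score(d) = Σ 1/(k + rank(d)). This is a minimal illustration only: the chunk IDs, rankings, and function names below are hypothetical and not taken from the paper's implementation.

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of chunk IDs via Reciprocal Rank Fusion.

    Each item's fused score is the sum of 1 / (k + rank) over every
    ranking it appears in; k=60 is the commonly used default constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings over row/column/block-level chunks of a workbook:
# one from a lexical (BM25-style) retriever, one from a dense retriever.
lexical = ["row:12", "block:A1:D20", "col:Revenue", "row:7"]
dense   = ["col:Revenue", "row:12", "row:99", "block:A1:D20"]

fused = rrf_fuse([lexical, dense])
# "row:12" ranks first: it is near the top of both lists.
```

Items ranked well by both retrievers float to the top even when neither retriever alone ranked them first, which is why RRF is a common choice for merging heterogeneous (lexical, dense, multimodal) result lists without score calibration.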
Problem

Research questions and friction points this paper is trying to address.

spreadsheet understanding
multimodal reasoning
large language models
enterprise spreadsheets
visual content
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation
multimodal reasoning
spreadsheet understanding
Reciprocal Rank Fusion
granular embedding
🔎 Similar Papers
No similar papers found.
Anmol Gulati
Researcher, Google DeepMind
Sahil Sen
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.
Waqar Sarguroh
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.
Kevin Paul
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.