From Rows to Reasoning: A Retrieval-Augmented Multimodal Framework for Spreadsheet Understanding

📅 2026-01-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge that large language models struggle to effectively reason over enterprise-grade spreadsheets containing thousands of rows, multiple worksheets, and embedded charts. To this end, the authors propose a multimodal retrieval-augmented generation framework that enhances complex spreadsheet understanding through fine-grained row/column/block-level embeddings, a hybrid lexical-dense retrieval strategy, and a Reciprocal Rank Fusion (RRF) mechanism to integrate multimodal information. The study introduces FRTR-Bench, the first large-scale multimodal benchmark for spreadsheet reasoning, on which their approach achieves 74% accuracy with Claude Sonnet 4.5—surpassing the previous state of the art by 50 percentage points—and attains 87% accuracy using GPT-5 on SpreadsheetLLM while reducing token consumption by approximately 50%.

📝 Abstract
Large Language Models (LLMs) struggle to reason over large-scale enterprise spreadsheets containing thousands of numeric rows, multiple linked sheets, and embedded visual content such as charts and receipts. Prior state-of-the-art spreadsheet reasoning approaches typically rely on single-sheet compression or full-context encoding, which limits scalability and fails to reflect how real users interact with complex, multimodal workbooks. We introduce FRTR-Bench, the first large-scale benchmark for multimodal spreadsheet reasoning, comprising 30 enterprise-grade Excel workbooks spanning nearly four million cells and more than 50 embedded images. To address these challenges, we present From Rows to Reasoning (FRTR), an advanced, multimodal retrieval-augmented generation framework that decomposes Excel workbooks into granular row, column, and block embeddings, employs hybrid lexical-dense retrieval with Reciprocal Rank Fusion (RRF), and integrates multimodal embeddings to reason over both numerical and visual information. We tested FRTR on six LLMs, achieving 74% answer accuracy on FRTR-Bench with Claude Sonnet 4.5, a substantial improvement over prior state-of-the-art approaches that reached only 24%. On the SpreadsheetLLM benchmark, FRTR achieved 87% accuracy with GPT-5 while reducing token usage by roughly 50% compared to direct serialization methods.
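The fusion step the abstract names — combining a lexical ranking and a dense-embedding ranking with Reciprocal Rank Fusion — can be sketched with the standard RRF scoring formula, score(d) = Σ 1/(k + rank(d)). This is a minimal illustration only: the chunk IDs, rankings, and function names below are hypothetical and not taken from the paper's implementation.

```python
def rrf_fuse(rankings, k=60):
    """Fuse multiple ranked lists of chunk IDs via Reciprocal Rank Fusion.

    Each item's fused score is the sum of 1 / (k + rank) over every
    ranking it appears in; k=60 is the commonly used default constant.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings over row/column/block-level chunks of a workbook:
# one from a lexical (BM25-style) retriever, one from a dense retriever.
lexical = ["row:12", "block:A1:D20", "col:Revenue", "row:7"]
dense   = ["col:Revenue", "row:12", "row:99", "block:A1:D20"]

fused = rrf_fuse([lexical, dense])
# "row:12" ranks first: it is near the top of both lists.
```

Items ranked well by both retrievers float to the top even when neither retriever alone ranked them first, which is why RRF is a common choice for merging heterogeneous (lexical, dense, multimodal) result lists without score calibration.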
Problem

Research questions and friction points this paper is trying to address.

spreadsheet understanding
multimodal reasoning
large language models
enterprise spreadsheets
visual content
Innovation

Methods, ideas, or system contributions that make the work stand out.

retrieval-augmented generation
multimodal reasoning
spreadsheet understanding
Reciprocal Rank Fusion
granular embedding
🔎 Similar Papers
No similar papers found.
Anmol Gulati
Researcher, Google DeepMind
Sahil Sen
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.
Waqar Sarguroh
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.
Kevin Paul
Commercial Technology and Innovation Office, PricewaterhouseCoopers, U.S.