SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations

📅 2025-10-21
🤖 AI Summary
Spreadsheets are widely used in business and finance, yet their operations often lack systematic documentation, which hinders reproducibility, collaboration, and knowledge transfer. To address this, we introduce Spreadsheet Operations Documentation (SOD), a task in which large language models (LLMs) generate natural-language descriptions of spreadsheet operations. We construct a benchmark, SODBench, of 111 spreadsheet manipulation code snippets, each paired with a natural-language summary. We evaluate five LLMs (GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B) using four metrics (BLEU, GLEU, ROUGE-L, and METEOR). The results suggest that LLMs can generate accurate spreadsheet documentation, which supports the feasibility of SOD, although open challenges remain. Whereas much prior work uses LLMs to generate spreadsheet manipulation code, this work addresses the less-explored reverse direction of translating that code into natural language, and it provides a benchmark for future studies.

📝 Abstract
Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for generating spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs, GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B, using BLEU, GLEU, ROUGE-L, and METEOR metrics. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing reproducibility, maintainability, and collaborative workflows in spreadsheets, although there are challenges that need to be addressed.
Problem

Research questions and friction points this paper is trying to address.

Automating documentation of spreadsheet operations using AI
Translating spreadsheet code into natural language explanations
Evaluating LLM performance on spreadsheet operation summarization
Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate natural language from spreadsheet code
Benchmark created with 111 code-summary pairs
Five LLMs evaluated using four metrics
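
As an illustration of how one of the four metrics scores a generated summary against a reference, here is a minimal sketch of ROUGE-L computed from the longest common subsequence (LCS) of tokens. It assumes lowercase whitespace tokenization and a plain F1 combination of precision and recall; the function names `lcs_length` and `rouge_l_f1` are hypothetical, and the paper's actual evaluation tooling and settings are not stated here and may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                # Extend the common subsequence ending before both tokens.
                curr.append(prev[j - 1] + 1)
            else:
                # Otherwise carry over the best result seen so far.
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L as an F1 score over LCS-based precision and recall.

    Assumption: simple lowercase whitespace tokenization, beta = 1.
    """
    ref = reference.lower().split()
    cand = candidate.lower().split()
    if not ref or not cand:
        return 0.0
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Hypothetical reference/candidate pair, not taken from the benchmark.
    reference = "sum column B and write the total to cell C1"
    candidate = "write the sum of column B to cell C1"
    print(f"ROUGE-L F1: {rouge_l_f1(reference, candidate):.3f}")
```

Because the LCS rewards tokens that appear in the same order without requiring them to be adjacent, this score tolerates inserted or dropped words better than strict n-gram overlap does, which is one reason it is commonly reported alongside BLEU.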
Amila Indika
Department of Information and Computer Sciences, University of Hawaii at Manoa
Igor Molybog
Assistant Professor, UH Manoa
Machine Learning, Optimization