SODBench: A Large Language Model Approach to Documenting Spreadsheet Operations

📅 2025-10-21

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Spreadsheets are widely used in business, accounting, and finance, yet their operations are rarely documented systematically, which hinders automation, collaboration, and knowledge transfer. To address this, we introduce Spreadsheet Operations Documentation (SOD), a task in which large language models (LLMs) generate natural-language descriptions of spreadsheet operations. We construct a benchmark, SODBench, of 111 spreadsheet manipulation code snippets, each paired with a natural-language summary. We evaluate five LLMs (GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B) with four metrics (BLEU, GLEU, ROUGE-L, and METEOR). The results suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible step toward more reproducible, maintainable, and collaborative spreadsheet workflows, although open challenges remain. Whereas much prior work uses LLMs to generate spreadsheet manipulation code, this work targets the less-explored reverse direction: translating that code into natural language.

📝 Abstract

Numerous knowledge workers utilize spreadsheets in business, accounting, and finance. However, a lack of systematic documentation methods for spreadsheets hinders automation, collaboration, and knowledge transfer, which risks the loss of crucial institutional knowledge. This paper introduces Spreadsheet Operations Documentation (SOD), an AI task that involves generating human-readable explanations from spreadsheet operations. Many previous studies have utilized Large Language Models (LLMs) for generating spreadsheet manipulation code; however, translating that code into natural language for SOD is a less-explored area. To address this, we present a benchmark of 111 spreadsheet manipulation code snippets, each paired with a corresponding natural language summary. We evaluate five LLMs, GPT-4o, GPT-4o-mini, LLaMA-3.3-70B, Mixtral-8x7B, and Gemma2-9B, using BLEU, GLEU, ROUGE-L, and METEOR metrics. Our findings suggest that LLMs can generate accurate spreadsheet documentation, making SOD a feasible prerequisite step toward enhancing reproducibility, maintainability, and collaborative workflows in spreadsheets, although there are challenges that need to be addressed.

Problem

Research questions and friction points this paper is trying to address.

Automating documentation of spreadsheet operations using AI

Translating spreadsheet code into natural language explanations

Evaluating LLM performance on spreadsheet operation summarization
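To make the task concrete, the sketch below shows what one code-summary pair and a documentation prompt might look like. Everything in it is an illustrative assumption: the `SodExample` class, the `build_prompt` helper, the prompt wording, and the openpyxl-style snippet are hypothetical and are not taken from the benchmark, whose snippet language and prompt format are not specified here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SodExample:
    """One hypothetical benchmark item: spreadsheet code plus its summary."""
    code: str
    summary: str


# Hypothetical pair, written for illustration only (not from the dataset).
EXAMPLE = SodExample(
    code=(
        'ws["C1"] = "Total"\n'
        "for row in range(2, 11):\n"
        '    ws[f"C{row}"] = f"=A{row}+B{row}"\n'
    ),
    summary=(
        "Label column C as Total and fill rows 2 to 10 "
        "with the sum of columns A and B."
    ),
)


def build_prompt(example_code: str) -> str:
    """Wrap a code snippet in an assumed documentation instruction."""
    return (
        "Describe in plain English what the following spreadsheet "
        "operation does.\n\n" + example_code
    )


if __name__ == "__main__":
    # The prompt would be sent to an LLM; the reply is then compared
    # against EXAMPLE.summary using the chosen metrics.
    print(build_prompt(EXAMPLE.code))
```

In this framing the model only sees the code; the paired summary serves as the reference against which the generated description is scored.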

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs generate natural language from spreadsheet code

Benchmark created with 111 code-summary pairs

Five LLMs evaluated using four metrics
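As an illustration of how one of the four metrics compares a generated description with its reference, here is a minimal ROUGE-L sketch based on the longest common subsequence (LCS). It assumes lowercased whitespace tokenization and a balanced F1 score; the paper's exact tokenization and metric implementation are not stated here, so the function names and these choices are assumptions.

```python
def lcs_length(a: list[str], b: list[str]) -> int:
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference: str, candidate: str) -> float:
    """ROUGE-L F1 between a reference summary and a generated one."""
    ref_tokens = reference.lower().split()
    cand_tokens = candidate.lower().split()
    if not ref_tokens or not cand_tokens:
        return 0.0
    lcs = lcs_length(ref_tokens, cand_tokens)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand_tokens)  # share of candidate tokens in the LCS
    recall = lcs / len(ref_tokens)      # share of reference tokens in the LCS
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    reference = "Fill column C with the sum of columns A and B"
    candidate = "Column C is filled with the sum of A and B"
    print(round(rouge_l_f1(reference, candidate), 3))
```

Because the LCS rewards words that appear in the same order without requiring them to be adjacent, a paraphrased description can still score well, which is one reason such overlap metrics are usually reported alongside others such as BLEU and METEOR.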

🔎 Similar Papers

SheetAgent: Towards A Generalist Agent for Spreadsheet Reasoning and Manipulation via Large Language Models