From Chaos to Automation: Enabling the Use of Unstructured Data for Robotic Process Automation

📅 2025-07-15

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address the limitation of robotic process automation (RPA) in handling unstructured data such as emails and scanned documents, this paper introduces UNDRESS, the first end-to-end framework for unstructured document information extraction and retrieval that integrates fuzzy regular expressions, lightweight NLP modules, and large language models (LLMs). UNDRESS overcomes RPA’s traditional reliance on structured inputs by combining multi-granularity text parsing, semantics-enhanced pattern matching, and context-aware information retrieval to parse complex document formats accurately and robustly. Experiments on real-world enterprise document corpora show that UNDRESS improves F1 scores by 23.6% on key-field extraction and cross-document question answering, while reducing inference latency by 41%. These advances broaden RPA’s applicability in domains that depend heavily on unstructured data, including finance and legal operations, and indicate that the framework is scalable and practical to deploy.

📝 Abstract

The growing volume of unstructured data within organizations poses significant challenges for data analysis and process automation. Unstructured data, which lacks a predefined format, takes forms such as emails, reports, and scans, and is estimated to constitute approximately 80% of enterprise data. Despite the valuable insights it can offer, extracting meaningful information from unstructured data is more complex than extracting it from structured data. Robotic Process Automation (RPA) has gained popularity for automating repetitive tasks, improving efficiency, and reducing errors. However, RPA traditionally relies on structured data, which limits its application to processes involving unstructured documents. This study addresses this limitation by developing the UNstructured Document REtrieval SyStem (UNDRESS), a system that uses fuzzy regular expressions, natural language processing techniques, and large language models to enable RPA platforms to retrieve information from unstructured documents effectively. The research involved the design and development of a prototype system and its subsequent evaluation on text extraction and information retrieval performance. The results demonstrate the effectiveness of UNDRESS in enhancing RPA capabilities for unstructured data. The findings suggest that this system could facilitate broader RPA adoption across processes traditionally hindered by unstructured data, thereby improving overall business process efficiency.

Problem

Research questions and friction points this paper is trying to address.

Enabling RPA to handle unstructured data effectively

Extracting insights from emails, reports, and scans automatically

Overcoming RPA's reliance on structured data formats

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses fuzzy regular expressions for data retrieval

Integrates NLP techniques for unstructured documents

Leverages large language models for RPA enhancement
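
To make the first point concrete, here is a minimal sketch of tolerant field extraction from noisy OCR text. It is not the UNDRESS implementation, which this summary does not describe in detail. As an assumption for illustration, it uses the standard library's difflib similarity score as a stand-in for true fuzzy regular expressions, and the names fuzzy_find_label and extract_field, the 0.8 threshold, and the sample invoice text are all hypothetical.

```python
import re
from difflib import SequenceMatcher


def fuzzy_find_label(text, label, threshold=0.8):
    """Return (start, end) of the window in text most similar to label.

    Similarity is difflib's ratio, used here as a stand-in for fuzzy
    regex matching (an assumption, not the paper's method).
    Returns None when no window reaches the threshold.
    """
    lowered, target = text.lower(), label.lower()
    size = len(target)
    best_score, best_span = 0.0, None
    # Slide a label-sized window over the text and keep the best score.
    for start in range(max(1, len(lowered) - size + 1)):
        window = lowered[start:start + size]
        score = SequenceMatcher(None, window, target).ratio()
        if score > best_score:
            best_score, best_span = score, (start, start + size)
    return best_span if best_score >= threshold else None


def extract_field(text, label, value_pattern=r"[A-Z0-9-]+"):
    """Find a possibly misspelled label, then read the value after it."""
    span = fuzzy_find_label(text, label)
    if span is None:
        return None
    # Skip separators such as ": " and capture the value with a plain regex.
    match = re.compile(r"\W*(" + value_pattern + ")").match(text, span[1])
    return match.group(1) if match else None


# Hypothetical OCR output: "I" was read as "l" and "e" as "c".
ocr_text = "lnvoice Numbcr: INV-2025-0042\nTotal: 100"
invoice_number = extract_field(ocr_text, "Invoice Number")
```

An RPA workflow could pass invoice_number on as a structured field. Lowering the threshold tolerates more OCR noise at the cost of more false matches, so the value would need tuning per document type.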

🔎 Similar Papers

Unveiling Latent Topics in Robotic Process Automation - an Approach based on Latent Dirichlet Allocation