Docling: An Efficient Open-Source Toolkit for AI-driven Document Conversion

📅 2025-01-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the difficulty of preserving layout and table structure when parsing documents in multiple formats, and the poor fit of parsed output for downstream use, this paper introduces Docling, an easy-to-use, self-contained, MIT-licensed open-source toolkit for document conversion. Docling has a modular architecture built on specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and it runs efficiently on commodity hardware within a small resource budget. It is released as a Python package usable through a Python API or a CLI, and it is already integrated into open-source frameworks such as LangChain, LlamaIndex, and spaCy. Key contributions include: (1) a unified, richly structured output representation covering several popular document formats; and (2) a modular design and efficient document representation that make extensions, new models, and customizations easy to implement. Docling gathered 10k stars on GitHub in less than a month and was reported as the No. 1 trending repository on GitHub worldwide in November 2024.

📝 Abstract
We introduce Docling, an easy-to-use, self-contained, MIT-licensed, open-source toolkit for document conversion that can parse several popular document formats into a unified, richly structured representation. It is powered by state-of-the-art specialized AI models for layout analysis (DocLayNet) and table structure recognition (TableFormer), and runs efficiently on commodity hardware within a small resource budget. Docling is released as a Python package and can be used as a Python API or as a CLI tool. Docling's modular architecture and efficient document representation make it easy to implement extensions, new features, models, and customizations. Docling has already been integrated into other popular open-source frameworks (e.g., LangChain, LlamaIndex, spaCy), making it a natural fit for document processing and the development of high-end applications. The open-source community has fully engaged in using, promoting, and developing for Docling, which gathered 10k stars on GitHub in less than a month and was reported as the No. 1 trending repository on GitHub worldwide in November 2024.
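
As a rough illustration of the Python API route mentioned in the abstract, the sketch below wraps a conversion call in a small helper. The helper name `convert_to_markdown` and its `converter` parameter are hypothetical and introduced here only for illustration; the `DocumentConverter` import and the `convert(...)` / `export_to_markdown()` calls follow the usage shown in the Docling README and assume the `docling` package is installed.

```python
from typing import Any, Optional


def convert_to_markdown(source: str, converter: Optional[Any] = None) -> str:
    """Convert a document (local path or URL) to Markdown text.

    `converter` is a hypothetical injection point: any object with a
    Docling-style `convert(source)` method whose result exposes
    `.document.export_to_markdown()`. When it is omitted, a default
    Docling `DocumentConverter` is created (requires `pip install docling`).
    """
    if converter is None:
        # Imported lazily so the helper can be loaded without Docling installed.
        from docling.document_converter import DocumentConverter

        converter = DocumentConverter()

    # Run the conversion pipeline and serialize the resulting document.
    result = converter.convert(source)
    return result.document.export_to_markdown()
```

A call such as `convert_to_markdown("report.pdf")` (the file name is a placeholder) would return the document as a Markdown string. The optional `converter` argument is a design convenience of this sketch, not part of Docling: it lets a preconfigured converter be reused across calls. For the CLI route, the package installs a `docling` command that accepts a path or URL.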
Problem

Research questions and friction points this paper is trying to address.

Document Formatting
Layout Preservation
Table Structure Conversion
Innovation

Methods, ideas, or system contributions that make the work stand out.

Artificial Intelligence
Document Processing
Open-source Community
Authors

Nikolaos Livathinos (IBM Research): Computer Vision, AI, Software Architecture
Christoph Auer (IBM Research)
Maksym Lysak (IBM): artificial intelligence, computer vision, 3d graphics, arts, history
Ahmed Nassar (IBM Research, Rüschlikon, Switzerland)
Michele Dolfi (IBM Research): Knowledge ingestion, Cloud computing, Computational physics, Tensor networks, High performance computing
Panagiotis Vagenas (IBM Research, Rüschlikon, Switzerland)
Cesar Berrospi Ramis (IBM Research, Rüschlikon, Switzerland)
Matteo Omenetti (IBM Research, Rüschlikon, Switzerland)
Kasper Dinkla (IBM Research)
Yusik Kim (IBM Research, Rüschlikon, Switzerland)
Shubham Gupta (IBM Research, Rüschlikon, Switzerland)
Rafael Teixeira de Lima (IBM Research, Rüschlikon, Switzerland)
Valery Weber (IBM Research, Rüschlikon, Switzerland)
Lucas Morin (IBM Research, Rüschlikon, Switzerland)
Ingmar Meijer (IBM Research, Rüschlikon, Switzerland)
Viktor Kuropiatnyk (IBM Research, Rüschlikon, Switzerland)
Peter W. J. Staar (IBM Research, Rüschlikon, Switzerland)