Automated Extraction of Pharmacokinetic Parameters from Structured XML Scientific Articles: Enhancing Data Accessibility at Scale

📅 2026-04-22
📈 Citations: 0
Influential: 0

🤖 AI Summary
This study addresses the challenge of fragmented pharmacokinetic (PK) data scattered across structurally heterogeneous tables in scientific literature, which hinders efficient and accurate manual extraction. To overcome this, the authors propose a human-centric table understanding approach that integrates natural language processing, computer vision, and structured parsing techniques to develop a specialized AI model. This model automatically interprets complex table layouts in XML-formatted publications, accurately aligns semantic meanings of row and column headers, and extracts PK parameters with high precision. The method significantly enhances both extraction accuracy and scalability, enabling large-scale, automated acquisition of PK data and laying the foundation for a dynamic, continuously updated PK knowledge base.

📝 Abstract
Pharmacology lacks centralized, comprehensive, and up-to-date repositories of pharmacokinetic (PK) data. This poses a significant challenge for R&D, because collecting all the required quantitative PK parameters from diverse scientific publications is time-consuming. This quantitative PK information is predominantly presented in tables, mostly available as XML, HTML, or PDF files within online repositories and scientific publications, including supplementary materials. Tables are therefore one of the crucial information elements of scientific and regulatory documents. Extracting data from tables manually is labor-intensive, and automated machine learning models may struggle to accurately detect and extract the relevant data because tabular layouts are complex and diverse. The difficulty of information extraction and reading-order detection depends largely on the structural complexity of the tables. Efforts to understand tables should prioritize capturing the content of table cells in a manner that aligns with how a human reader naturally comprehends the information. The Food Animal Residue Avoidance and Databank Program (FARAD) has been manually extracting tabular data and other information from the literature and regulatory agencies for over 40 years. However, there is now an urgent need to automate this process due to the large volume of publications released daily. Maintaining accuracy has become increasingly difficult, as manual extraction is tedious and prone to errors, especially given the staffing shortages we are currently facing. This necessitates the development of AI algorithms for table detection and extraction that can precisely handle cells organized according to the table structure, as indicated by column and/or row header information.
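As a rough illustration of what "handling cells according to the table structure" can mean for XML input, the sketch below pairs each data cell with its row and column header. It is not the authors' method: the table layout, the function name `extract_cells`, and the drug names and values are all hypothetical, and the sketch assumes a simple table with one header row and no spanning cells.

```python
import xml.etree.ElementTree as ET

# Hypothetical table in a JATS-like layout. The drug names and values are
# made up for illustration and are not taken from the paper.
SAMPLE_XML = """
<table>
  <thead>
    <tr><th>Drug</th><th>Cmax (ug/mL)</th><th>T1/2 (h)</th></tr>
  </thead>
  <tbody>
    <tr><th>DrugA</th><td>1.2</td><td>4.5</td></tr>
    <tr><th>DrugB</th><td>0.8</td><td>6.0</td></tr>
  </tbody>
</table>
"""


def extract_cells(xml_text):
    """Return (row_header, column_header, value) triples for each data cell.

    Assumes a single header row and no rowspan/colspan attributes.
    """
    table = ET.fromstring(xml_text)
    # Column headers come from the single header row; index 0 labels the
    # row-header column and is skipped when pairing values.
    header_row = table.find("./thead/tr")
    columns = [(cell.text or "").strip() for cell in header_row]

    records = []
    for row in table.findall("./tbody/tr"):
        cells = list(row)
        row_header = (cells[0].text or "").strip()
        for col_index, cell in enumerate(cells[1:], start=1):
            value = (cell.text or "").strip()
            records.append((row_header, columns[col_index], value))
    return records


records = extract_cells(SAMPLE_XML)
for record in records:
    print(record)
```

Real tables in publications often contain merged cells, nested headers, and footnote markers, which is exactly where a positional pairing like this breaks down and where the structural complexity discussed above becomes the main difficulty.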
Problem

Research questions and friction points this paper is trying to address.

pharmacokinetic parameters
table extraction
structured XML
data accessibility
scientific publications
Innovation

Methods, ideas, or system contributions that make the work stand out.

automated table extraction
pharmacokinetic parameters
structured XML
AI for scientific tables
data accessibility
Remya Ampadi Ramachandran
1DATA Consortium, www.1DATA.life, Kansas State University Olathe, Olathe, KS, USA; Food Animal Residue Avoidance and Databank Program (FARAD), Kansas State University Olathe, Olathe, KS, USA; Department of Mathematics, Kansas State University, Manhattan, KS, United States
Lisa A. Tell
FARAD, Department of Medicine and Epidemiology, School of Veterinary Medicine, University of California-Davis, Davis, CA
Sidharth Rai
Research Engineer
Computer Vision, Machine Learning
Nuwan Millagaha Gedara
1DATA Consortium, www.1DATA.life, Kansas State University Olathe, Olathe, KS, USA; Food Animal Residue Avoidance and Databank Program (FARAD), Kansas State University Olathe, Olathe, KS, USA
Hossein Sholehrasa
1DATA Consortium, www.1DATA.life, Kansas State University Olathe, Olathe, KS, USA; Food Animal Residue Avoidance and Databank Program (FARAD), Kansas State University Olathe, Olathe, KS, USA; Department of Computer Science, Kansas State University, Manhattan, KS, United States
Jim E. Riviere
1DATA Consortium, www.1DATA.life, Kansas State University Olathe, Olathe, KS, USA; Food Animal Residue Avoidance and Databank Program (FARAD), Kansas State University Olathe, Olathe, KS, USA
Majid Jaberi-Douraki
Kansas State University
Mathematical Biology, Big Data, Data Science, One Health, 1DATA