The NIAID Discovery Portal: A Unified Search Engine for Infectious and Immune-Mediated Disease Datasets

📅 2025-09-16
🤖 AI Summary
Problem: Infectious and immune-mediated disease (IID) data are fragmented across disparate sources, lack standardized metadata schemas, and suffer from poor discoverability and reusability. Method: We developed the first unified metadata search platform specifically for the IID domain, harmonizing over 4 million dataset-level metadata records from 400+ specialized and general-purpose databases. Through format normalization, semantic integration, and construction of a domain-specific ontology, the platform enables natural-language search, predefined queries, faceted browsing, and programmatic API access. Contribution/Results: This work represents the first systematic, cross-source metadata aggregation and interoperability framework for IID data, substantially enhancing Findability, Accessibility, Interoperability, and Reusability (FAIRness). The platform actively supports NIH/NIAID-funded projects and researchers worldwide in hypothesis-driven analysis, cross-cohort comparison, and secondary analysis of public datasets, thereby increasing the scientific return on investment in biomedical data infrastructure.
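
The format normalization step described in the summary can be illustrated with a minimal sketch. It is not the Portal's actual pipeline: the alias table, the field names, and the `harmonize` helper are all hypothetical, chosen only to show how records from differently structured repositories might be mapped onto one shared set of fields.

```python
from typing import Any

# Hypothetical alias table: shared field name -> names a source might use.
# None of these names are taken from the Portal's schema.
FIELD_ALIASES = {
    "name": ("name", "title", "dataset_title"),
    "description": ("description", "summary", "abstract"),
    "url": ("url", "link", "landing_page"),
    "date": ("date", "datePublished", "release_date"),
}


def harmonize(record: dict[str, Any], source: str) -> dict[str, Any]:
    """Map one source-specific record onto the shared field names."""
    unified: dict[str, Any] = {"source": source}
    for target, aliases in FIELD_ALIASES.items():
        # Use the first alias that is present and non-empty; otherwise None.
        value = next(
            (record[a] for a in aliases if record.get(a) not in (None, "")),
            None,
        )
        if isinstance(value, str):
            value = value.strip()  # trim stray whitespace from the source
        unified[target] = value
    return unified


# Two records with different field names end up with the same keys.
repo_a = harmonize({"title": " Flu cohort ", "link": "https://example.org/1"}, "repo_a")
repo_b = harmonize({"name": "TB genomes", "release_date": "2024-01-01"}, "repo_b")
```

After this step both records expose the same keys (`source`, `name`, `description`, `url`, `date`), with `None` marking fields a source did not supply, which is what makes a single search index over heterogeneous sources possible.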

📝 Abstract
The NIAID Data Ecosystem Discovery Portal (https://data.niaid.nih.gov) provides a unified search interface for over 4 million datasets relevant to infectious and immune-mediated disease (IID) research. Integrating metadata from domain-specific and generalist repositories, the Portal enables researchers to identify and access datasets using user-friendly filters or advanced queries, without requiring technical expertise. The Portal supports discovery of a wide range of resources, including epidemiological, clinical, and multi-omic datasets, and is designed to accommodate exploratory browsing and precise searches. The Portal provides filters, prebuilt queries, and dataset collections to simplify the discovery process for users. The Portal additionally provides documentation and an API for programmatic access to harmonized metadata. By easing access barriers to important biomedical datasets, the NIAID Data Ecosystem Discovery Portal serves as an entry point for researchers working to understand, diagnose, or treat IID.

IMPORTANCE

Valuable datasets are often overlooked because they are difficult to locate. The NIAID Data Ecosystem Discovery Portal fills this gap by providing a centralized, searchable interface that empowers users with varying levels of technical expertise to find and reuse data. By standardizing key metadata fields and harmonizing heterogeneous formats, the Portal improves data findability, accessibility, and reusability. This resource supports hypothesis generation, comparative analysis, and secondary use of public data by the IID research community, including those funded by NIAID. The Portal supports data sharing by standardizing metadata and linking to source repositories, and maximizes the impact of public investment in research data by supporting scientific advancement via secondary use.
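
The programmatic access mentioned in the abstract might be used along the following lines. This is a sketch under stated assumptions, not the documented interface: the base URL, the `q`, `size`, and `fields` parameters, and the `hits` response key are assumptions that should be checked against the Portal's own API documentation. The code only builds a request URL and reads an already-decoded response, so it makes no network call.

```python
from urllib.parse import urlencode

# Assumed endpoint; verify against the Portal's API documentation.
ASSUMED_API_BASE = "https://api.data.niaid.nih.gov/v1/query"


def build_query_url(query, size=10, fields=None, base=ASSUMED_API_BASE):
    """Build a search URL from a query string and optional field list."""
    params = {"q": query, "size": size}  # assumed parameter names
    if fields:
        params["fields"] = ",".join(fields)
    return f"{base}?{urlencode(params)}"


def summarize_hits(response):
    """Return (name, url) pairs from an assumed {"hits": [...]} response."""
    return [(hit.get("name"), hit.get("url")) for hit in response.get("hits", [])]


url = build_query_url("tuberculosis", size=5, fields=["name", "url"])

# Hand-written stand-in for a decoded JSON response, for illustration only.
sample_response = {"hits": [{"name": "Example dataset", "url": "https://example.org/d1"}]}
pairs = summarize_hits(sample_response)
```

To fetch real results, the built URL could be passed to any HTTP client and the JSON body handed to `summarize_hits`, provided the response shape matches the assumption above.
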
Problem

Research questions and friction points this paper is trying to address.

Centralized search for infectious disease datasets
Simplifying access to biomedical data resources
Harmonizing metadata to improve data reusability
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified search interface for biomedical datasets
Integrates metadata from diverse repositories
Provides user-friendly filters and advanced queries
Authors

Ginger Tsueng
The Scripps Research Institute, La Jolla, CA

Emily Bullen
The Scripps Research Institute, La Jolla, CA

Candice Czech
The Scripps Research Institute, La Jolla, CA

Dylan Welzel
The Scripps Research Institute, La Jolla, CA

Leandro Collares
The Scripps Research Institute, La Jolla, CA

Jason Lin
The Scripps Research Institute, La Jolla, CA

Everaldo Rodolpho
The Scripps Research Institute, La Jolla, CA

Zubair Qazi
The Scripps Research Institute, La Jolla, CA

Nichollette Acosta
The Scripps Research Institute, La Jolla, CA

Lisa M. Mayer
Office of Data Science and Emerging Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD

Sudha Venkatachari
National Cancer Institute, Rockville, MD

Zorana Mitrović Vučičević
Velsera, Charlestown, MA

Poromendro N. Burman
Velsera, Charlestown, MA

Deepti Jain
Velsera, Charlestown, MA

Jack DiGiovanna
Velsera
Data Science, Data Analysis Ecosystems, Neuroprosthetics, Reinforcement Learning

Maria Giovanni
National Institute of Allergy and Infectious Diseases, Rockville, MD

Asiyah Lin
Office of Data Science and Emerging Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD

Wilbert Van Panhuis
Office of Data Science and Emerging Technologies, National Institute of Allergy and Infectious Diseases, Rockville, MD

Laura D. Hughes
The Scripps Research Institute, La Jolla, CA

Andrew I. Su
The Scripps Research Institute, La Jolla, CA

Chunlei Wu
The Scripps Research Institute, La Jolla, CA