🤖 AI Summary
Existing deep research systems lack cross-domain, multidimensional evaluation benchmarks, with notable gaps in dimensions such as objectivity and citation quality. To address this limitation, this work introduces and open-sources DRACO, a benchmark of complex research tasks spanning 10 domains and drawing on information sources from 40 countries. The tasks are derived from real user queries that are anonymized and augmented to preserve authenticity while protecting privacy. DRACO establishes the first multidimensional evaluation framework tailored to authentic deep research scenarios, grading outputs against task-specific rubrics along four key axes: accuracy, completeness, objectivity, and citation quality. The benchmark provides a standardized, reproducible tool for evaluating model capabilities on complex research-oriented tasks.
📝 Abstract
We present DRACO (Deep Research Accuracy, Completeness, and Objectivity), a benchmark of complex deep research tasks. The tasks span 10 domains, draw on information sources from 40 countries, and originate from anonymized real-world usage of a large-scale deep research system. They are sampled from a de-identified dataset of Perplexity Deep Research requests, then filtered and augmented to ensure that they are anonymized, open-ended and complex, objectively evaluable, and representative of the broad scope of real-world deep research use cases. Outputs are graded against task-specific rubrics along four dimensions: factual accuracy (accuracy), breadth and depth of analysis (including completeness), presentation quality (including objectivity), and citation quality. DRACO is publicly available at https://hf.co/datasets/perplexity-ai/draco.
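For orientation, here is a minimal sketch of what an evaluation harness over DRACO might look like: load the dataset from Hugging Face, run a system over each task, and average rubric scores along the four dimensions. The split name, the `task` and `rubric` field names, and the stub grader are assumptions for illustration, not the released evaluation code; consult the dataset card for the actual schema.

```python
# Sketch of a DRACO evaluation loop. Field/split names are assumptions.
from datasets import load_dataset

DIMENSIONS = ("accuracy", "completeness", "objectivity", "citation_quality")

def grade_output(output: str, rubric: dict) -> dict[str, float]:
    """Stub grader: replace with a human rater or LLM judge that scores
    the output against each dimension's task-specific rubric criteria."""
    return {dim: 0.0 for dim in DIMENSIONS}

def evaluate(system, tasks) -> dict[str, float]:
    """Run a deep-research system over tasks and average per-dimension scores."""
    totals = {dim: 0.0 for dim in DIMENSIONS}
    for task in tasks:
        output = system(task["task"])                  # "task" field is assumed
        scores = grade_output(output, task["rubric"])  # "rubric" field is assumed
        for dim, score in scores.items():
            totals[dim] += score
    return {dim: total / len(tasks) for dim, total in totals.items()}

if __name__ == "__main__":
    draco = load_dataset("perplexity-ai/draco", split="test")  # split name assumed
    report = evaluate(lambda prompt: "…", draco)  # plug in a real system here
    print(report)
```

Per-dimension averages, rather than a single scalar, keep the accuracy, completeness, objectivity, and citation-quality axes separately comparable across systems.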