HyPAC: Cost-Efficient LLMs-Human Hybrid Annotation with PAC Error Guarantees

📅 2026-01-30

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses cost-efficient task allocation among multiple annotation sources, such as large language models, reasoning models, and human experts, while controlling the annotation error rate on test instances. The authors propose HyPAC, a novel hybrid annotation framework that, for the first time, provides a Probably Approximately Correct (PAC) guarantee on annotation error without requiring assumptions about the data distribution or the pre-trained models. HyPAC calibrates two decision thresholds using importance sampling and upper confidence bounds. These thresholds partition inputs into three regions by uncertainty, and each region is routed to the appropriate annotation source, so that the expected annotation cost is minimized subject to the error guarantee. Experiments on common benchmarks show that HyPAC reduces annotation cost by 78.51% while keeping the annotation error tightly controlled.
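The three-region routing described above can be pictured with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names, the convention that lower uncertainty goes to the cheaper source, and the cost figures are all hypothetical.

```python
def route(uncertainty, t_low, t_high):
    """Pick an annotation source from an uncertainty score and two thresholds.

    Assumption (not from the paper): scores at or below t_low go to the fast
    LLM, scores above t_high go to a human expert, and everything in between
    goes to the slower reasoning model.
    """
    if t_low > t_high:
        raise ValueError("t_low must not exceed t_high")
    if uncertainty <= t_low:
        return "llm"
    if uncertainty <= t_high:
        return "reasoning_model"
    return "human"


def average_cost(uncertainties, t_low, t_high, costs):
    """Mean per-instance cost of routing a batch with the given thresholds."""
    total = sum(costs[route(u, t_low, t_high)] for u in uncertainties)
    return total / len(uncertainties)


# Hypothetical per-label costs, chosen only to make the example concrete.
example_costs = {"llm": 1.0, "reasoning_model": 5.0, "human": 30.0}
example_mean = average_cost([0.1, 0.5, 0.9], 0.3, 0.7, example_costs)
```

Raising either threshold sends more instances to a cheaper source, which lowers cost but can raise error; the thresholds therefore have to be chosen against an error target rather than fixed by hand.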

📝 Abstract

Data annotation often involves multiple sources with different cost-quality trade-offs, such as fast large language models (LLMs), slow reasoning models, and human experts. In this work, we study the problem of routing inputs to the most cost-efficient annotation source while controlling the labeling error on test instances. We propose HyPAC, a method that adaptively routes inputs to the most cost-efficient annotation source while providing distribution-free guarantees on annotation error. HyPAC calibrates two decision thresholds using importance sampling and upper confidence bounds, partitioning inputs into three regions based on uncertainty and routing each to the appropriate annotation source. We prove that HyPAC achieves the minimum expected cost with a probably approximately correct (PAC) guarantee on the annotation error that holds regardless of the data distribution and the pre-trained models. Experiments on common benchmarks demonstrate the effectiveness of our method, reducing the annotation cost by 78.51% while tightly controlling the annotation error.
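To make the idea of calibrating a threshold with an upper confidence bound more concrete, here is a simplified Python sketch. It is not the paper's procedure: it handles a single threshold, uses a plain Hoeffding bound with a union bound over candidates, and omits importance sampling entirely. All names are hypothetical, and it assumes a labeled calibration set in which the expensive source is treated as correct.

```python
import math


def hoeffding_ucb(losses, delta):
    """Upper confidence bound on the mean of losses in [0, 1].

    By Hoeffding's inequality the true mean exceeds this value with
    probability at most delta (for i.i.d. samples).
    """
    n = len(losses)
    return sum(losses) / n + math.sqrt(math.log(1.0 / delta) / (2.0 * n))


def calibrate_threshold(uncertainties, cheap_is_wrong, candidates, epsilon, delta):
    """Return the largest candidate threshold whose error UCB is <= epsilon.

    An instance counts as an error only if it would be sent to the cheap
    source (uncertainty <= t) and the cheap source's label is wrong.
    delta is split evenly across candidates (union bound), so the guarantee
    holds for whichever candidate is selected. Returns None if none qualifies.
    """
    delta_each = delta / len(candidates)
    best = None
    for t in sorted(candidates):
        losses = [
            1.0 if (u <= t and wrong) else 0.0
            for u, wrong in zip(uncertainties, cheap_is_wrong)
        ]
        if hoeffding_ucb(losses, delta_each) <= epsilon:
            best = t
    return best


# Synthetic calibration data: the cheap source errs only when uncertainty > 0.5.
calib_u = [i / 2000 for i in range(2000)]
calib_wrong = [u > 0.5 for u in calib_u]
chosen = calibrate_threshold(calib_u, calib_wrong, [0.25, 0.5, 0.75], 0.1, 0.1)
```

A larger threshold routes more instances to the cheap source, so picking the largest threshold that still passes the bound is the cost-minimizing choice within this simplified setting. The Hoeffding bound is loose; a tighter bound would admit larger thresholds at the same error target.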

Problem

Research questions and friction points this paper is trying to address.

cost-efficient annotation

annotation error control

hybrid annotation

PAC guarantee

labeling cost

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid Annotation

Cost-Efficiency

PAC Guarantee