Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification

📅 2026-04-14

🤖 AI Summary
This work targets zero-shot respiratory sound classification, where labeled data are scarce, expert annotation is costly, and existing methods spend the same computation on every input regardless of difficulty. It proposes TRIAGE, a tiered framework that adapts test-time computation to each sample. A confidence-based router sends each audio sample through up to three inference tiers of increasing cost: lightweight label-cosine scoring, structured matching against clinical descriptions, and retrieval-augmented large language model reasoning. Easy cases exit early, and additional computation is reserved for ambiguous ones. Requiring no task-specific training, TRIAGE achieves a mean AUROC of 0.744 across nine classification tasks, with nearly half of the samples resolved at the cheapest tier. Uncertain cases see up to a 19% relative improvement, and TRIAGE outperforms prior zero-shot methods while matching or exceeding supervised baselines on multiple tasks.
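The early-exit routing described in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the `route` function, the `(label, confidence)` tier interface, and the threshold values are all hypothetical, chosen only to show how a cheapest-first cascade with confidence gates might be wired together.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical interface: a tier maps a sample to (label, confidence in [0, 1]).
Tier = Callable[[object], Tuple[str, float]]


def route(sample: object, tiers: Sequence[Tier],
          thresholds: Sequence[float]) -> Tuple[str, float, int]:
    """Run tiers cheapest-first and exit at the first confident one.

    thresholds[i] gates tier i; the last tier has no gate and always answers.
    Returns (label, confidence, index of the tier that produced the answer).
    """
    if not tiers:
        raise ValueError("at least one tier is required")
    if len(thresholds) != len(tiers) - 1:
        raise ValueError("need one threshold per tier except the last")

    last_index = len(tiers) - 1
    for index, tier in enumerate(tiers):
        label, confidence = tier(sample)
        # Finalize early when the tier is confident enough, or when no
        # more expensive tier remains to fall back on.
        if index == last_index or confidence >= thresholds[index]:
            return label, confidence, index
    raise AssertionError("unreachable: the last tier always returns")


# Stand-in tiers with fixed outputs (assumed values, for illustration only).
def tier_l(_sample):
    return "normal", 0.55   # cheap scorer, unsure


def tier_m(_sample):
    return "wheeze", 0.80   # descriptor matching, confident


def tier_h(_sample):
    return "wheeze", 0.95   # most expensive reasoning stage


result = route("audio clip", [tier_l, tier_m, tier_h], thresholds=[0.7, 0.75])
```

With these stand-in tiers the sample fails the first gate and exits at the middle tier, so the most expensive stage is never invoked. In practice the thresholds would be what controls the share of samples that stop at the cheapest tier.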

📝 Abstract
Automated respiratory audio analysis promises scalable, non-invasive disease screening, yet progress is limited by scarce labeled data and costly expert annotation. Zero-shot inference eliminates task-specific supervision, but existing methods apply uniform computation to every input regardless of difficulty. We introduce TRIAGE, a tiered zero-shot framework that adaptively scales test-time compute by routing each audio sample through progressively richer reasoning stages: fast label-cosine scoring in a joint audio-text embedding space (Tier-L), structured matching with clinician-style descriptors (Tier-M), and retrieval-augmented large language model reasoning (Tier-H). A confidence-based router finalizes easy predictions early while allocating additional computation to ambiguous inputs, enabling nearly half of all samples to exit at the cheapest tier. Across nine respiratory classification tasks without task-specific training, TRIAGE achieves a mean AUROC of 0.744, outperforming prior zero-shot methods and matching or exceeding supervised baselines on multiple tasks. Our analysis shows that test-time scaling concentrates gains where they matter: uncertain cases see up to 19% relative improvement while confident predictions remain unchanged at minimal cost.
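As a rough illustration of what label-cosine scoring in a joint audio-text embedding space (Tier-L) could look like, the sketch below compares an audio embedding against one text embedding per label and turns the similarities into a confidence signal. Everything specific here is an assumption rather than a detail from the paper: the function names, the softmax temperature, the use of a top-two probability margin as the confidence measure, and the toy two-dimensional embeddings that stand in for real encoder outputs.

```python
import math
from typing import Dict, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def label_cosine_probs(audio_emb: Sequence[float],
                       label_embs: Dict[str, Sequence[float]],
                       temperature: float = 0.1) -> Dict[str, float]:
    """Softmax over audio-to-label cosine similarities.

    The temperature is a hypothetical hyperparameter, not a value from the paper.
    """
    sims = {label: cosine(audio_emb, emb) for label, emb in label_embs.items()}
    peak = max(sims.values())  # subtract the max for numerical stability
    exps = {label: math.exp((s - peak) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


def predict_with_margin(probs: Dict[str, float]) -> Tuple[str, float]:
    """Return the top label and its probability margin over the runner-up."""
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)
    top_label, top_prob = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0.0
    return top_label, top_prob - runner_up


# Toy embeddings (assumed values); a real system would obtain these from
# pretrained audio and text encoders that share an embedding space.
label_embs = {"wheeze": [1.0, 0.0], "normal": [0.0, 1.0]}
probs = label_cosine_probs([1.0, 0.0], label_embs)
label, margin = predict_with_margin(probs)
```

A large margin suggests the cheap tier can finalize the prediction, while a small one marks the sample as ambiguous and a candidate for a richer stage.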
Problem

Research questions and friction points this paper is trying to address.

zero-shot classification
respiratory audio analysis
test-time scaling
adaptive computation
audio classification
Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive test-time scaling
zero-shot learning
tiered inference
audio-text embedding
retrieval-augmented reasoning