Adaptive Test-Time Scaling for Zero-Shot Respiratory Audio Classification

📅 2026-04-14

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two limitations of zero-shot respiratory sound classification: existing methods spend the same computation on every input regardless of difficulty, and labeled data are scarce because expert annotation is costly. It proposes TRIAGE, a tiered framework that adapts test-time computation to each input. A confidence-based router sends each audio sample through up to three progressively more expensive inference tiers (lightweight label-cosine scoring, structured clinical description matching, and retrieval-augmented large language model reasoning), finalizing easy predictions early and reserving additional computation for ambiguous ones. Requiring no task-specific training, TRIAGE achieves a mean AUROC of 0.744 across nine classification tasks, with nearly half of the samples exiting at the cheapest tier. Uncertain cases see up to 19% relative improvement, and TRIAGE outperforms prior zero-shot methods while matching or exceeding supervised baselines on multiple tasks.
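The cheapest tier, label-cosine scoring, can be illustrated with a minimal sketch. It assumes that an audio clip and each class label have already been embedded into the same vector space by some joint audio-text encoder, which is not shown here. The function names, the toy two-dimensional embeddings, and the softmax temperature of 0.07 are illustrative assumptions, not details taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def label_cosine_scores(audio_emb, label_embs, temperature=0.07):
    """Turn audio-to-label cosine similarities into a probability per label.

    audio_emb:   embedding of one audio clip (list of floats)
    label_embs:  dict mapping label name -> text embedding of that label
    temperature: softmax temperature (hypothetical value, not from the paper)
    """
    sims = {label: cosine(audio_emb, emb) for label, emb in label_embs.items()}
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(sims.values())
    exps = {label: math.exp((s - top) / temperature) for label, s in sims.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Toy embeddings chosen by hand purely for illustration.
example_probs = label_cosine_scores(
    [1.0, 0.0],
    {"wheeze": [0.9, 0.1], "normal": [0.1, 0.9]},
)
```

In this toy case the clip embedding points almost in the same direction as the "wheeze" label, so that label receives most of the probability mass. A real system would obtain both embeddings from a pretrained encoder.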

📝 Abstract

Automated respiratory audio analysis promises scalable, non-invasive disease screening, yet progress is limited by scarce labeled data and costly expert annotation. Zero-shot inference eliminates task-specific supervision, but existing methods apply uniform computation to every input regardless of difficulty. We introduce TRIAGE, a tiered zero-shot framework that adaptively scales test-time compute by routing each audio sample through progressively richer reasoning stages: fast label-cosine scoring in a joint audio-text embedding space (Tier-L), structured matching with clinician-style descriptors (Tier-M), and retrieval-augmented large language model reasoning (Tier-H). A confidence-based router finalizes easy predictions early while allocating additional computation to ambiguous inputs, enabling nearly half of all samples to exit at the cheapest tier. Across nine respiratory classification tasks without task-specific training, TRIAGE achieves a mean AUROC of 0.744, outperforming prior zero-shot methods and matching or exceeding supervised baselines on multiple tasks. Our analysis shows that test-time scaling concentrates gains where they matter: uncertain cases see up to 19% relative improvement while confident predictions remain unchanged at minimal cost.
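The early-exit routing described in the abstract can be sketched as a simple cascade. This is only an illustration under stated assumptions: the abstract does not specify how confidence is measured or which thresholds are used, so the top-two score margin, the `route` and `top2_margin` names, and the threshold values in the usage note are all hypothetical. Each tier is represented by a placeholder callable that returns a score per label.

```python
from typing import Callable, Dict, List, Tuple

Scores = Dict[str, float]
Tier = Tuple[str, Callable[[object], Scores]]


def top2_margin(scores: Scores) -> float:
    """Confidence proxy (an assumption): gap between the two highest scores."""
    ranked = sorted(scores.values(), reverse=True)
    return ranked[0] - ranked[1] if len(ranked) > 1 else ranked[0]


def route(sample: object, tiers: List[Tier], thresholds: List[float]) -> Tuple[str, str]:
    """Run tiers from cheapest to most expensive and exit at the first confident one.

    tiers:      (name, predict) pairs ordered cheapest first
    thresholds: one confidence threshold per non-final tier
    Returns the name of the tier that produced the answer and the predicted label.
    """
    for (name, predict), tau in zip(tiers[:-1], thresholds):
        scores = predict(sample)
        if top2_margin(scores) >= tau:
            # Confident enough: finalize here and skip the costlier tiers.
            return name, max(scores, key=scores.get)
    # No earlier tier was confident, so the last tier always decides.
    name, predict = tiers[-1]
    scores = predict(sample)
    return name, max(scores, key=scores.get)
```

A call might look like `route(clip, [("Tier-L", tier_l), ("Tier-M", tier_m), ("Tier-H", tier_h)], [0.5, 0.3])`, where the three tier functions and both thresholds are stand-ins. The design keeps the expensive tiers lazy: they are invoked only when every cheaper tier falls below its threshold, which is what lets easy samples exit at minimal cost.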

Problem

Research questions and friction points this paper is trying to address.

zero-shot classification

respiratory audio analysis

test-time scaling

adaptive computation

audio classification

Innovation

Methods, ideas, or system contributions that make the work stand out.

adaptive test-time scaling

zero-shot learning

tiered inference