🤖 AI Summary
This study addresses the limited scalability of analyzing autism spectrum disorder (ASD) intervention strategies in parent–child shared reading videos recorded in home settings, where existing approaches rely heavily on expert manual annotation. To overcome this, the authors propose InterventionLens, an end-to-end multi-agent system that requires no task-specific training or fine-tuning. The work is the first to apply a collaborative multi-agent architecture to naturalistic ASD intervention analysis, leveraging multimodal cues, including visual and linguistic signals, to enable fine-grained temporal detection and segmentation of intervention strategies. Evaluated on the ASD-HI dataset, the method achieves an overall F1 score of 79.44%, a 19.72% improvement over the current baseline, substantially enhancing both scalability and practical applicability.
📝 Abstract
Home-based interventions like parent–child shared reading provide a cost-effective approach for supporting children with autism spectrum disorder (ASD). However, analyzing caregiver intervention strategies in naturalistic home interactions typically relies on expert annotation, which is costly, time-intensive, and difficult to scale. To address this challenge, we propose InterventionLens, an end-to-end multi-agent system for automatically detecting and temporally segmenting caregiver intervention strategies from shared reading videos. Without task-specific model training or fine-tuning, InterventionLens uses a collaborative multi-agent architecture to integrate multimodal interaction content and perform fine-grained strategy analysis. Experiments on the ASD-HI dataset show that InterventionLens achieves an overall F1 score of 79.44%, outperforming the baseline by 19.72%. These results suggest that InterventionLens is a promising system for analyzing caregiver intervention strategies in home-based ASD shared reading settings. Additional resources will be released on the project page.
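For readers who want a concrete sense of how a segment-level F1 score like the one reported above can be computed, the sketch below shows one common scoring protocol: a predicted strategy segment counts as a true positive when it shares a label with an unmatched ground-truth segment and their temporal intersection-over-union (IoU) meets a threshold. The `Segment` class, the greedy matching, the 0.5 IoU threshold, and the strategy names are illustrative assumptions, not the paper's exact evaluation criteria, which are not specified here.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    label: str    # intervention strategy name (hypothetical labels below)
    start: float  # start time in seconds
    end: float    # end time in seconds

def iou(a: Segment, b: Segment) -> float:
    """Temporal intersection-over-union of two segments."""
    inter = max(0.0, min(a.end, b.end) - max(a.start, b.start))
    union = (a.end - a.start) + (b.end - b.start) - inter
    return inter / union if union > 0 else 0.0

def segment_f1(preds, gts, iou_thresh=0.5) -> float:
    """Greedy one-to-one matching: a prediction is a true positive if it
    matches the label of an unmatched ground-truth segment with IoU >= threshold."""
    matched = set()
    tp = 0
    for p in preds:
        for i, g in enumerate(gts):
            if i in matched or p.label != g.label:
                continue
            if iou(p, g) >= iou_thresh:
                matched.add(i)
                tp += 1
                break
    fp = len(preds) - tp
    fn = len(gts) - tp
    precision = tp / (tp + fp) if preds else 0.0
    recall = tp / (tp + fn) if gts else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

# Toy usage: one correct detection, one missed strategy, one false alarm.
gts = [Segment("modeling", 3.0, 8.0), Segment("prompting", 12.0, 15.0)]
preds = [Segment("modeling", 3.5, 8.5), Segment("reinforcement", 20.0, 22.0)]
print(segment_f1(preds, gts))  # 0.5
```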