🤖 AI Summary
This work addresses single-source domain generalization for crowd counting, where a single labeled source domain often contains heterogeneous latent domains and test data may exhibit severe distribution shifts. Flat clustering on evolving sample-level features is sensitive to feature noise, outliers, and representation drift, which makes pseudo-domain assignments unreliable. To mitigate this, the authors propose a granular-ball-guided hierarchical latent domain discovery mechanism: samples are first grouped into compact local granular balls, and the ball centers are then clustered as representatives to obtain more stable pseudo-domains. Built on the discovered domains, a two-branch framework combines semantic codebook re-encoding with a dedicated style branch to reduce the entanglement of semantic and appearance features. Experiments on ShanghaiTech A/B, UCF_QNRF, and NWPU-Crowd under a strict no-adaptation protocol show that the proposed method consistently outperforms strong baselines, especially under large domain gaps.
📝 Abstract
Single-source domain generalization for crowd counting remains highly challenging because a single labeled source domain often contains heterogeneous latent domains, while test data may exhibit severe distribution shifts. A fundamental difficulty lies in stable latent domain discovery: directly performing flat clustering on evolving sample-level latent features is easily affected by feature noise, outliers, and representation drift, leading to unreliable pseudo-domain assignments and weakened domain-structured learning. To address this issue, we propose a granular ball guided stable latent domain discovery framework for domain-general crowd counting. Specifically, the proposed method first organizes samples into compact local granular balls and then clusters granular ball centers as representatives to obtain pseudo-domains, transforming direct sample-level clustering into a hierarchical representative-based clustering process. This design yields more stable and semantically consistent pseudo-domain assignments. Built upon the discovered latent domains, we further develop a two-branch learning framework that enhances transferable semantic representations via semantic codebook re-encoding while modeling domain-specific appearance variations through a style branch, thereby reducing semantic-style entanglement and improving generalization under domain shifts. Extensive experiments on ShanghaiTech A/B, UCF_QNRF, and NWPU-Crowd under a strict no-adaptation protocol demonstrate that the proposed method consistently outperforms strong baselines, especially under large domain gaps.
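The two-stage discovery step described in the abstract (group samples into local granular balls, then cluster the ball centers) can be illustrated with a minimal sketch. The abstract does not specify how balls are formed or which clustering algorithm is applied to the centers, so everything below is an assumption made for illustration: balls are built by recursive 2-means splitting until each holds at most `max_size` samples, centers are clustered with plain k-means, and the names `build_granular_balls` and `discover_pseudo_domains` are hypothetical rather than taken from the authors' code.

```python
import math
import random


def mean(points):
    """Coordinate-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[d] for p in points) / n for d in range(len(points[0]))]


def dist(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda j: dist(p, centers[j]))
            for p in points]


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's k-means (assumed here; not confirmed by the abstract)."""
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = assign(points, centers)
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = mean(members)
    return centers, assign(points, centers)


def build_granular_balls(features, max_size=8, seed=0):
    """Assumed rule: split with 2-means until a ball has <= max_size samples."""
    rng = random.Random(seed)
    pending = [list(range(len(features)))]
    balls = []
    while pending:
        idx = pending.pop()
        if len(idx) <= max_size:
            balls.append(idx)
            continue
        _, labels = kmeans([features[i] for i in idx], 2, rng)
        left = [i for i, l in zip(idx, labels) if l == 0]
        right = [i for i, l in zip(idx, labels) if l == 1]
        if not left or not right:  # degenerate split, e.g. duplicate points
            balls.append(idx)
        else:
            pending.extend([left, right])
    return balls


def discover_pseudo_domains(features, num_domains, max_size=8, seed=0):
    """Cluster ball centers, then give each sample its ball's label."""
    balls = build_granular_balls(features, max_size, seed)
    centers = [mean([features[i] for i in ball]) for ball in balls]
    _, ball_labels = kmeans(centers, num_domains, random.Random(seed))
    domains = [0] * len(features)
    for ball, label in zip(balls, ball_labels):
        for i in ball:
            domains[i] = label
    return domains, balls
```

Calling `discover_pseudo_domains(features, num_domains=3)` on a list of feature vectors returns one pseudo-domain label per sample together with the index sets of the balls; `num_domains` must not exceed the number of balls produced. Because the final clustering runs on ball centers rather than raw samples, a single outlier shifts a center only by its share of the ball, which is the kind of stabilizing effect the abstract attributes to representative-based clustering.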