AI Summary
This work addresses the ambiguous many-to-many mappings and subjective annotations that arise from the semantic gap between visual content and textual descriptions in object recognition datasets. To mitigate these issues, the authors propose an interactive crowdsourcing framework that integrates knowledge representation, natural language processing, and computer vision. The framework dynamically generates guiding questions based on a predefined object category hierarchy and visual property constraints, and uses annotator feedback to steer the annotation. Experiments demonstrate the effectiveness of the approach, and annotator feedback is discussed with a view to optimizing the crowdsourcing setup.
Abstract
Recent advances in data-centric artificial intelligence highlight inherent limitations in object recognition datasets. One of the primary issues is the semantic gap, which produces complex many-to-many mappings between visual data and linguistic descriptions. The resulting bias adversely affects performance in computer vision tasks. This paper proposes an image annotation methodology that integrates knowledge representation, natural language processing, and computer vision techniques, aiming to reduce annotator subjectivity by applying visual property constraints. We introduce an interactive crowdsourcing framework that dynamically asks questions based on a predefined object category hierarchy and annotator feedback, guiding image annotation by visual properties. Experiments demonstrate the effectiveness of this methodology, and annotator feedback is discussed to optimize the crowdsourcing setup.
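The question-driven annotation loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Category` class, the `annotate` function, the example hierarchy, and the question wording are all hypothetical, and the sketch assumes that each category is distinguished from its siblings by a single yes/no visual property.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Category:
    """A node in a hypothetical object category hierarchy."""
    name: str
    # Visual property assumed to distinguish this node from its siblings.
    visual_property: Optional[str] = None
    children: List["Category"] = field(default_factory=list)


def annotate(root: Category, ask: Callable[[str], bool]) -> List[str]:
    """Descend the hierarchy, asking one yes/no question per candidate child.

    `ask` stands in for the annotator: it receives a question and returns
    True or False. The result is the path of confirmed category names.
    """
    path = [root.name]
    node = root
    while node.children:
        chosen = None
        for child in node.children:
            question = f"Does the object have {child.visual_property}?"
            if ask(question):
                chosen = child
                break
        if chosen is None:
            # No child property was confirmed: keep the most specific
            # category reached so far instead of forcing a guess.
            break
        path.append(chosen.name)
        node = chosen
    return path


def build_example_hierarchy() -> Category:
    """Toy hierarchy used only for illustration (an assumption)."""
    guitar = Category("guitar", "frets on the neck")
    violin = Category("violin", "a chin rest")
    strings = Category("string instrument", "strings", [guitar, violin])
    wind = Category("wind instrument", "a mouthpiece")
    return Category("musical instrument", None, [strings, wind])


if __name__ == "__main__":
    # Simulated annotator who sees strings and frets in the image.
    observed = {"strings", "frets on the neck"}
    simulated = lambda question: any(p in question for p in observed)
    print(annotate(build_example_hierarchy(), simulated))
```

Because each question concerns an observable property rather than a category name, the annotator never has to choose freely among labels. In this sketch that is how visual property constraints limit subjectivity. When no property is confirmed, the loop stops at the most specific category reached so far.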