🤖 AI Summary
This paper addresses the challenge in visual question answering (VQA) where both images and questions are represented as Answer Set Programming (ASP) programs and domain-specific annotated data are scarce. To tackle this, we propose a logical-abduction-based approach for modeling domain relations that requires no external knowledge base: implicit semantic relationships among image constituents are inferred automatically from a small set of historical examples, enabling the induction of transferable logical rules. Our key contribution is the first application of logical abduction to domain knowledge acquisition in VQA, integrating ASP formalization, example-driven relation induction, and lightweight rule learning. Experiments under low-shot settings demonstrate significant improvements in VQA accuracy, validating the method's effectiveness, generalizability, and practicality in scenarios with limited or no domain annotations.
📝 Abstract
In this paper, we study the problem of visual question answering (VQA) in which the image and query are represented by Answer Set Programming (ASP) programs that lack domain data. We present an approach, orthogonal and complementary to existing knowledge-augmentation techniques, that abduces domain relationships of image constructs from past examples. After framing the abduction problem, we provide a baseline approach and an implementation that significantly improves the accuracy of query answering while requiring only a few examples.
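To make the abduction step concrete, the toy sketch below (not the paper's implementation; all predicate names and the search strategy are illustrative assumptions) shows the general idea: given scene facts extracted from an image, background rules, and an observed answer from a past example, abduction searches for the smallest set of candidate domain-relation facts whose addition makes the observation derivable.

```python
from itertools import combinations

def derive(facts, rules):
    """Forward-chain: apply rules (body -> head) until a fixpoint is reached."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in rules:
            if set(body) <= known and head not in known:
                known.add(head)
                changed = True
    return known

def abduce(facts, rules, observation, candidates, max_size=2):
    """Return a smallest set of candidate relation facts that, added to the
    scene facts, lets the rules derive the observed answer (None if none)."""
    for k in range(max_size + 1):
        for hyp in combinations(candidates, k):
            if observation in derive(facts | set(hyp), rules):
                return set(hyp)
    return None

# Hypothetical scene: two objects detected in an image, plus one domain rule
# stating when the answer to the (implicit) query is "yes".
facts = {"cube(a)", "sphere(b)"}
rules = [(("cube(a)", "sphere(b)", "left_of(a,b)"), "answer(yes)")]
candidates = ["left_of(a,b)", "left_of(b,a)"]

print(abduce(facts, rules, "answer(yes)", candidates))  # {'left_of(a,b)'}
```

The abduced fact `left_of(a,b)` plays the role of the missing domain relation; in the paper's setting such hypotheses would be expressed and checked within ASP rather than by this naive enumeration.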