HLTCOE Evaluation Team at TREC 2025: VQA Track

📅 2025-12-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

To address semantic imprecision and ranking inconsistency in answer generation for Video Question Answering (VQA), this paper proposes a listwise learning framework that combines generative candidate construction with discriminative reranking. A base multimodal model first generates a list of candidate answers. A reranker then refines that list; it is trained with a Masked Pointer Cross-Entropy Loss with Rank Weights, which combines pointer-based candidate selection, rank-dependent weighting, and masked cross-entropy under vocabulary restriction. Compared with single-answer generation, the reported experiments show consistent gains in accuracy and ranking stability, especially on questions requiring temporal reasoning and semantic disambiguation.

📝 Abstract

The HLTCOE Evaluation team participated in TREC VQA's Answer Generation (AG) task, for which we developed a listwise learning framework that aims to improve semantic precision and ranking consistency in answer generation. Given a video-question pair, a base multimodal model first generates multiple candidate answers, which are then reranked using a model trained with a novel Masked Pointer Cross-Entropy Loss with Rank Weights. This objective integrates pointer-based candidate selection, rank-dependent weighting, and masked cross-entropy under vocabulary restriction, enabling stable and interpretable listwise optimization. By bridging generative modeling with discriminative ranking, our method produces coherent, fine-grained answer lists. Experiments reveal consistent gains in accuracy and ranking stability, especially for questions requiring temporal reasoning and semantic disambiguation.
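The abstract names the three ingredients of the training objective but does not give its formula. The following is a minimal sketch of one plausible reading, not the authors' formulation. It assumes that the reranker emits one row of pointer logits over the candidate slots per output rank, that the mask restricts the softmax to valid candidates not yet selected, and that rank weights decay in a DCG-like way. The function names, the weight schedule, and the normalization are all hypothetical.

```python
import math


def masked_log_softmax(logits, allowed):
    """Log-softmax computed only over positions where allowed[i] is True."""
    kept = [x for x, ok in zip(logits, allowed) if ok]
    peak = max(kept)  # subtract the max for numerical stability
    log_z = peak + math.log(sum(math.exp(x - peak) for x in kept))
    return [x - log_z if ok else float("-inf") for x, ok in zip(logits, allowed)]


def dcg_style_weights(num_ranks):
    """Assumed schedule: weight 1/log2(rank + 1) for rank = 1, 2, ..."""
    return [1.0 / math.log2(k + 2) for k in range(num_ranks)]


def rank_weighted_pointer_ce(step_logits, target_order, valid, rank_weights):
    """Weighted cross-entropy of pointing at the gold candidate at each rank.

    step_logits:  one list of logits over candidate slots per output rank
    target_order: gold candidate index for each rank, best first
    valid:        True for real candidates, False for padding or disallowed slots
    rank_weights: one non-negative weight per rank
    """
    allowed = list(valid)
    total, weight_sum = 0.0, 0.0
    for logits, target, weight in zip(step_logits, target_order, rank_weights):
        if not allowed[target]:
            raise ValueError("target candidate is masked out or already selected")
        log_probs = masked_log_softmax(logits, allowed)
        total += -weight * log_probs[target]
        weight_sum += weight
        allowed[target] = False  # a candidate can be pointed at only once
    return total / weight_sum


# Toy example: three valid candidates, uniform logits at every rank.
toy_loss = rank_weighted_pointer_ce(
    step_logits=[[0.0, 0.0, 0.0]] * 3,
    target_order=[2, 0, 1],
    valid=[True, True, True],
    rank_weights=dcg_style_weights(3),
)
```

Under these assumptions, errors near the top of the list cost more than errors near the bottom, and masking already-selected slots guarantees that the predicted list is a permutation of the candidates.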

Problem

Research questions and friction points this paper is trying to address.

Improves semantic precision in video question answering

Enhances ranking consistency for generated answer lists

Addresses temporal reasoning and semantic disambiguation challenges

Innovation

Methods, ideas, or system contributions that make the work stand out.

Listwise learning framework for answer generation

Masked Pointer Cross-Entropy Loss with Rank Weights

Bridges generative modeling with discriminative ranking
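The generate-then-rerank flow behind these points can be sketched as below. This is an illustration only: `generate` and `score` are hypothetical callables standing in for the base multimodal model and the trained reranker, and the deduplication step is an assumption rather than something the source describes.

```python
from typing import Callable, List


def generate_then_rerank(
    video_id: str,
    question: str,
    generate: Callable[[str, str, int], List[str]],
    score: Callable[[str, str, str], float],
    num_candidates: int = 5,
) -> List[str]:
    """Return candidate answers ordered from highest to lowest reranker score."""
    raw = generate(video_id, question, num_candidates)
    # Drop empty strings and exact duplicates while keeping generation order.
    unique = list(dict.fromkeys(c.strip() for c in raw if c.strip()))
    scored = [(score(video_id, question, c), c) for c in unique]
    # sorted() is stable, so tied candidates keep their generation order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [candidate for _, candidate in ranked]


# Toy stand-ins so the sketch runs without any model.
def toy_generate(video_id: str, question: str, n: int) -> List[str]:
    return ["a red car", "a blue car", "a red car", " "][:n]


def toy_score(video_id: str, question: str, answer: str) -> float:
    return float(len(answer))  # placeholder score, not a real relevance signal


toy_ranking = generate_then_rerank("vid-0", "What passes by?", toy_generate, toy_score)
```

Keeping the generator and the scorer as separate callables mirrors the split between the generative and discriminative stages, so either one can be swapped without touching the other.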
