JFAA: Technical Report for the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026

📅 2026-05-20

🤖 AI Summary

This work addresses first-person video action anticipation on the EPIC-KITCHENS-100 dataset with an efficient approach built on V-JEPA 2.1. The method uses a frozen encoder-predictor pair to extract context representations from the observed video segment and latent features of the near future, then trains a lightweight attentive probe with separate task queries to predict verbs, nouns, and complete actions. A field-aware ensemble fuses selected epoch-level predictions separately for each output field (verb, noun, action), so that each field draws on its most reliable candidates and the final predictions are more robust. The approach took first place in the official evaluation of the EPIC-KITCHENS-100 Action Anticipation Challenge at EgoVis 2026.
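The task-query probe described above can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation: the single-head attention, the random initialisation, the toy dimensions, and the names `TaskQueryProbe` and `attentive_pool` are all assumptions made for illustration. The only idea taken from the summary is that frozen context and future tokens are pooled by one query per output field, and each pooled vector feeds its own classifier.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attentive_pool(query, tokens):
    """Single-head cross-attention: one query attends over all tokens."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, t) * scale for t in tokens])
    dim = len(tokens[0])
    return [sum(w * t[d] for w, t in zip(weights, tokens)) for d in range(dim)]


class TaskQueryProbe:
    """Hypothetical probe: one query and one linear head per output field."""

    def __init__(self, dim, num_classes, seed=0):
        rng = random.Random(seed)
        self.queries = {}
        self.heads = {}
        for field, n in num_classes.items():
            # In a real probe these would be trained; here they are random.
            self.queries[field] = [rng.gauss(0.0, 1.0) for _ in range(dim)]
            self.heads[field] = [
                [rng.gauss(0.0, 0.02) for _ in range(dim)] for _ in range(n)
            ]

    def forward(self, context_tokens, future_tokens):
        # Tokens come from the frozen encoder (observed context) and the
        # frozen predictor (near-future latents); the probe never updates them.
        tokens = context_tokens + future_tokens
        logits = {}
        for field, query in self.queries.items():
            pooled = attentive_pool(query, tokens)
            logits[field] = [dot(row, pooled) for row in self.heads[field]]
        return logits


# Toy usage with made-up sizes (not the real EK-100 class counts).
rng = random.Random(1)
context = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(6)]
future = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2)]
probe = TaskQueryProbe(dim=4, num_classes={"verb": 3, "noun": 5, "action": 7})
out = probe.forward(context, future)
```

Keeping a separate query per field lets the verb, noun, and action heads attend to different tokens while sharing the same frozen features, which is one plausible reading of "separate task queries".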

📝 Abstract

We propose JFAA, a JEPA-based Future Action Anticipation method for the EPIC-KITCHENS-100 (EK-100) Action Anticipation task. Inspired by the representation learning and future prediction ability of V-JEPA 2.1, JFAA uses a frozen encoder and predictor to extract observed context features and near-future latent tokens. A lightweight attentive probe is then trained to predict verb, noun, and action logits with separate task queries. To improve robustness, we further build a field-aware ensemble over selected epoch-level predictions, allowing each output field to benefit from its most reliable candidates. Experimental results on the official challenge server show that JFAA achieves first place in the EgoVis 2026 EK-100 Action Anticipation Challenge. Our code will be released at https://github.com/CorrineQiu/JFAA.
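One way to read the field-aware ensemble is sketched below. The abstract does not state how candidates are selected or fused, so this sketch assumes that each field keeps its top-k epoch-level predictions by a per-field validation score and averages their class scores. The function names, the score dictionary, and all numbers are hypothetical.

```python
def select_per_field(val_scores, k):
    """Pick the k best candidates for each field by validation score.

    val_scores: {candidate_name: {field: score}} (higher is better).
    Returns {field: [candidate_name, ...]} sorted best first.
    """
    fields = next(iter(val_scores.values())).keys()
    selection = {}
    for field in fields:
        ranked = sorted(val_scores, key=lambda c: val_scores[c][field], reverse=True)
        selection[field] = ranked[:k]
    return selection


def field_aware_ensemble(predictions, selection):
    """Average class scores per field over that field's selected candidates.

    predictions: {candidate_name: {field: [score per class]}}
    selection:   {field: [candidate_name, ...]}
    """
    fused = {}
    for field, names in selection.items():
        members = [predictions[name][field] for name in names]
        # Assumed fusion rule: plain mean over the selected candidates.
        fused[field] = [sum(col) / len(col) for col in zip(*members)]
    return fused


# Toy example: three epoch-level candidates, two fields, two classes each.
predictions = {
    "epoch_08": {"verb": [0.2, 0.8], "noun": [0.6, 0.4]},
    "epoch_10": {"verb": [0.4, 0.6], "noun": [0.1, 0.9]},
    "epoch_12": {"verb": [0.9, 0.1], "noun": [0.3, 0.7]},
}
val_scores = {
    "epoch_08": {"verb": 0.30, "noun": 0.20},
    "epoch_10": {"verb": 0.32, "noun": 0.41},
    "epoch_12": {"verb": 0.10, "noun": 0.40},
}
selection = select_per_field(val_scores, k=2)
fused = field_aware_ensemble(predictions, selection)
```

In this toy run the verb field keeps `epoch_10` and `epoch_08`, while the noun field keeps `epoch_10` and `epoch_12`, so a checkpoint that is weak on one field can still contribute to another.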

Problem

Research questions and friction points this paper is trying to address.

Action Anticipation

First-person Vision

EPIC-KITCHENS-100

Future Prediction

Egocentric Video

Innovation

Methods, ideas, or system contributions that make the work stand out.

JEPA

Action Anticipation

Frozen Encoder