🤖 AI Summary
This work addresses personalized gaze estimation in first-person videos. We propose FedCPF, a personalized federated learning framework for gaze prediction. To balance global knowledge sharing with client-specific gaze pattern adaptation, FedCPF introduces a novel fine-grained dynamic parameter freezing mechanism based on parameter update rates: during local client training, it identifies and freezes the most rapidly changing model parameters in real time, enabling lightweight and interpretable personalization. Built upon a Transformer architecture, FedCPF requires no additional personalized modules and no extra communication overhead. Evaluated on EGTEA Gaze+ and Ego4D, FedCPF significantly outperforms existing federated approaches, achieving superior recall, precision, and F1-score, and demonstrating strong effectiveness and generalization under low-resource, highly heterogeneous settings.
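The freezing mechanism described above can be illustrated with a minimal sketch: rank parameters by how fast they changed in the latest local update, then freeze the fastest-changing fraction so they remain client-specific. The rate definition (absolute per-step change), the `freeze_frac` threshold, and all names here are illustrative assumptions, not the paper's exact criterion.

```python
# Hypothetical sketch of update-rate-based parameter freezing (assumed
# mechanism; the paper's exact selection rule is not reproduced here).

def update_rates(prev, curr):
    """Per-parameter absolute update rate |theta_t - theta_{t-1}|."""
    return {name: abs(curr[name] - prev[name]) for name in curr}

def select_frozen(rates, freeze_frac=0.25):
    """Return the fastest-changing fraction of parameters to freeze."""
    k = max(1, int(len(rates) * freeze_frac))
    ranked = sorted(rates, key=rates.get, reverse=True)
    return set(ranked[:k])

# Toy example with scalar "parameters" before/after one local step.
prev = {"w1": 0.10, "w2": 0.50, "w3": -0.20, "w4": 1.00}
curr = {"w1": 0.12, "w2": 0.90, "w3": -0.21, "w4": 1.05}

frozen = select_frozen(update_rates(prev, curr), freeze_frac=0.25)
print(frozen)  # prints {'w2'}: the single fastest-moving parameter
```

In a real client model, the frozen set would be excluded from subsequent aggregation or gradient updates (e.g. by disabling gradients on those tensors), keeping those parameters personalized to the client.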
📝 Abstract
Egocentric video gaze estimation requires models that capture individual gaze patterns while adapting to diverse user data. We propose FedCPF, which leverages a transformer-based architecture within a personalized federated learning (PFL) framework: only the most significant parameters, those exhibiting the highest rate of change during training, are selected and frozen for personalization in client models. Through extensive experimentation on the EGTEA Gaze+ and Ego4D datasets, we demonstrate that FedCPF significantly outperforms previously reported federated learning methods, achieving superior recall, precision, and F1-score. These results confirm the effectiveness of our comprehensive parameter-freezing strategy in enhancing model personalization, making FedCPF a promising approach for tasks requiring both adaptability and accuracy in federated learning settings.