Published over 20 papers in leading conferences and journals in machine learning and computer vision, including ICASSP, ICIP, ICME, FG, BMVC, CVPR, NeurIPS, TBIOM, and JSTSP, with more than 400 citations on Google Scholar.
Received Outstanding Reviewer award at CVPR 2025.
Paper 'United we stand, Divided we fall: Handling Weak Complementarity for Audio-Visual Emotion Recognition in Valence-Arousal Space' accepted at ABAW@CVPR 2025 Workshop.
Achieved 2nd place in the valence-arousal challenge of the 8th ABAW@CVPR 2025 competition.
Received Best Poster award for work on 'Dynamic Cross Attention for Emotion Recognition' at AI and Digital Health Symposium 2024, Montreal, Canada.
Paper 'Less is Enough: Adapting Pre-trained Vision Transformers for Audio-Visual Speaker Verification' accepted at ENLSP@NeurIPS 2024 Workshop.
Paper 'Incongruity-Aware Cross-Modal Attention' accepted at IEEE Journal of Selected Topics in Signal Processing [IF:13.7].
Paper 'Recursive Joint Cross-modal attention' accepted at ABAW@CVPR 2024 Workshop.
Achieved 2nd place in the valence-arousal challenge of the 6th ABAW@CVPR 2024 competition.
One paper accepted at ICME 2024 (CORE A).
Two papers accepted at FG 2024.
Paper 'RJCA for Speaker Verification' accepted at NeurIPS 2023 3rd workshop on ENLSP.
Presented work 'Recurrent Joint Attention for Audio-Visual Fusion in Regression-based Emotion Recognition' at ICASSP 2023.
Successfully defended PhD thesis titled 'Deep Regression Models for Spatiotemporal Expression Recognition in Videos'.
Research Experience
Worked at Samsung Research India from 2014 to 2015; conducted PhD research at the LIVIA lab, ETS Montreal, from 2018 to 2023; post-doctoral researcher at CRIM since 2023.
Education
Master's degree from Indian Institute of Technology Guwahati in 2012, under the supervision of Prof. Kannan Karthik; PhD in artificial intelligence (focused on computer vision and affective computing) from the LIVIA lab, ETS Montreal, Canada, in 2023, under the supervision of Prof. Eric Granger and Prof. Patrick Cardinal.
Background
Interested in computer vision, affective computing, deep learning, and multimodal video understanding models. Most of my research revolves around video analytics, weakly supervised learning, facial behavior analysis, and multimodal (audio-visual) learning.
Miscellany
Enjoys playing rhythm instruments in free time, reading books, and occasionally blogging.