Paper 'Predicting Implicit Arguments in Procedural Video Instructions' accepted at ACL 2025 (Main Conference, May 2025)
Paper 'Efficient Pre-training for Localized Instruction Generation of Videos' accepted at ECCV 2024 (July 2024)
Paper 'Temporal Ordering in the Segmentation of Instructional Videos' accepted at BMVC 2022 (September 2022)
Paper on 'Improved Road Connectivity' accepted at CVPR 2019 (March 2019)
Paper on 'Self-Supervised Learning' accepted at BMVC 2018 (June 2018)
Poster presentation at CVPR 2019 (June 2019)
Served as a volunteer session chair at the LXAI Workshop, NeurIPS 2021 (December 2021)
Served as reviewer for CVPR 2022 (November 2021)
Served as reviewer for ICCV 2021 (June 2021)
Successfully defended Master’s thesis titled 'Road Topology Extraction from Satellite Images by Knowledge Sharing' (June 2019)
Education
Ph.D. scholar in the School of Informatics, University of Edinburgh, under the CDT-NLP program (joined September 2020)
Supervised by Prof. Frank Keller, Prof. Marcus Rohrbach, and Dr. Laura Sevilla
Completed Master of Computer Science (by research) at IIIT-Hyderabad
Master’s advisor: Prof. C.V. Jawahar, with mentorship from Facebook researchers Dr. Guan Pang and Dr. Saikat Basu
Member of the Center for Visual Information Technology (CVIT) Lab during Master’s
Background
Research interests lie at the intersection of Language and Vision
Focuses on developing models that can plan, reason, and execute goal-oriented tasks involving multiple complex events through text comprehension and video analysis
Current work involves analyzing long procedural videos and text to understand and ground the temporal structure of events
Aims to develop efficient models that accurately capture event sequences and timing to enhance real-world task performance
Also interested in geospatial data, large language models, and improving model reliability