Published papers: HD-EPIC: A Highly-Detailed Egocentric Video Dataset (CVPR 2025); ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions (CVPR 2025); Show Me the World in My Language: Establishing the First Baseline for Scene-Text to Scene-Text Translation (ICPR 2024); Composite Sketch+Text Queries for Retrieving Objects with Elusive Names and Complex Interactions (AAAI 2024); Towards Making Flowchart Images Machine Interpretable (ICDAR 2023); VisToT: Vision-Augmented Table-to-Text Generation (EMNLP 2022); COFAR: Commonsense and Factual Reasoning in Image Search (AACL-IJCNLP 2022).
Research Experience
Worked as a Research Assistant at the Indian Institute of Technology Jodhpur, advised by Prof. Anand Mishra, exploring vision-augmented table-to-text generation, cross-modal image retrieval, and other vision-language problems; interned at the Center for Neuroscience, Indian Institute of Science, working on EEG-based brain-computer interfaces under the guidance of Prof. Sridharan Devarajan.
Education
Received a B.E. in Information Science and Engineering from Dayananda Sagar College of Engineering in 2020; currently a second-year Ph.D. student at the University of Bristol, advised by Prof. Dima Damen.
Background
Research interests: AI for video understanding. Specialization: Machine Learning and Computer Vision.