Akshita Gupta
Scholar

Google Scholar ID: G01YeI0AAAAJ
TU Darmstadt
Deep Learning · Speech & Audio Processing · Computer Vision
Citations & Impact
All-time
Citations: 1,304
H-index: 6
i10-index: 5
Publications: 17
Co-authors: 0
Resume (English only)
Academic Achievements
  • 1. Paper: Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis (ArXiv 2025)
  • 2. Paper: LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization (WACV 2025 Oral)
  • 3. Paper: Open-Vocabulary Temporal Action Localization using Multimodal Guidance (BMVC 2024)
  • 4. Paper: Generative Multi-Label Zero-Shot Learning (TPAMI 2023)
  • 5. OW-DETR accepted at CVPR 2022
  • 6. BiAM accepted at ICCV 2021
  • 7. TF-VAEGAN accepted at ECCV 2020
  • 8. A Large-scale Instance Segmentation Dataset for Aerial Images (iSAID) available for download
  • 9. One paper accepted at Interspeech, CHiME Workshop 2018
  • 10. Selected as an Outreachy intern with Mozilla (May 2018)
  • 11. Conference and Journal Reviewing: CVPR (2022–2025), ECCV (2022, 2024), ICCV (2021), TPAMI (Journal)
Research Experience
  • 1. Apple, Research Intern (2024–2025), advised by Dr. Tatiana Likhomanenko
  • 2. Microsoft Research, Research Intern (2023–2024), advised by Gaurav Mittal and Mei Chen
  • 3. NextAI, Scientist-in-Residence (2024), with Prof. Graham Taylor
  • 4. Bayanat, Data Scientist (2022), focused on detection and segmentation projects
  • 5. Inception Institute of Artificial Intelligence, Research Engineer (2018–2022), worked with Dr. Sanath Narayan, Dr. Salman Khan, and Dr. Fahad Shahbaz Khan
Education
  • 1. ELLIS PhD student, TU Darmstadt (2025–Present), supervised by Prof. Marcus Rohrbach and Dr. Federico Tombari (Google Zurich)
  • 2. MASc, University of Guelph (2022–2024), advised by Prof. Graham Taylor
  • 3. Student Researcher, Vector Institute (2022–2024)
Background
  • Research Interests: Building scalable multimodal models that combine vision, language, and speech, with a focus on efficient modeling, temporal understanding, and open-world generalization.