Akshita Gupta
Scholar

Google Scholar ID: G01YeI0AAAAJ
TU Darmstadt
Deep Learning · Speech & Audio Processing · Computer Vision
Citations & Impact
All-time
Citations: 1,304
H-index: 6
i10-index: 5
Publications: 17
Co-authors: 0
Resume (English only)
Academic Achievements
  • 1. Paper: Visatronic: A Multimodal Decoder-Only Model for Speech Synthesis (ArXiv 2025)
  • 2. Paper: LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization (WACV 2025 Oral)
  • 3. Paper: Open-Vocabulary Temporal Action Localization using Multimodal Guidance (BMVC 2024)
  • 4. Paper: Generative Multi-Label Zero-Shot Learning (TPAMI 2023)
  • 5. OW-DETR accepted at CVPR 2022
  • 6. BiAM accepted at ICCV 2021
  • 7. TF-VAEGAN accepted at ECCV 2020
  • 8. A Large-scale Instance Segmentation Dataset for Aerial Images (iSAID) available for download
  • 9. One paper accepted at Interspeech, CHiME Workshop 2018
  • 10. Selected as an Outreachy intern with Mozilla (May 2018)
  • 11. Conference and Journal Reviewing: CVPR (2022–2025), ECCV (2022, 2024), ICCV (2021), TPAMI (Journal)
Research Experience
  • 1. Apple, Research Intern (2024–2025), advised by Dr. Tatiana Likhomanenko
  • 2. Microsoft Research, Research Intern (2023–2024), advised by Gaurav Mittal and Mei Chen
  • 3. NextAI, Scientist-in-Residence (2024), with Prof. Graham Taylor
  • 4. Bayanat, Data Scientist (2022), focused on detection and segmentation projects
  • 5. Inception Institute of Artificial Intelligence, Research Engineer (2018–2022), worked with Dr. Sanath Narayan, Dr. Salman Khan, and Dr. Fahad Shahbaz Khan
Education
  • 1. ELLIS PhD student, TU Darmstadt (2025–Present), supervised by Prof. Marcus Rohrbach and Dr. Federico Tombari (Google Zurich)
  • 2. MASc, University of Guelph (2022–2024), advised by Prof. Graham Taylor
  • 3. Student Researcher, Vector Institute (2022–2024)
Background
  • Research Interests: Building scalable multimodal models that combine vision, language, and speech, with a focus on efficient modeling, temporal understanding, and open-world generalization.