HoverAI: An Embodied Aerial Agent for Natural Human-Drone Interaction

📅 2026-01-20
🤖 AI Summary
This work presents HoverAI, an embodied aerial agent that addresses a communication gap: drones operating in human-occupied spaces lack effective ways to convey their intentions, which leaves people uncertain how to interact with them. The system integrates flight, infrastructure-free MEMS laser projection onto an onboard semi-rigid screen, and an adaptive multimodal conversational AI. It perceives users through vision (an RGB camera) and voice, and its pipeline combines voice activity detection (VAD), Whisper-based transcription, LLM-driven intent classification, a RAG-enhanced dialogue system, face analysis for personalization, and XTTS v2 voice synthesis, responding through lip-synced avatars whose appearance adapts to user demographics. The reported evaluation shows command recognition at F1 0.90, demographic estimation at gender F1 0.89 and age MAE 5.14 years, and speech transcription at WER 0.181.

📝 Abstract
Drones operating in human-occupied spaces suffer from insufficient communication mechanisms that create uncertainty about their intentions. We present HoverAI, an embodied aerial agent that integrates drone mobility, infrastructure-independent visual projection, and real-time conversational AI into a unified platform. Equipped with a MEMS laser projector, onboard semi-rigid screen, and RGB camera, HoverAI perceives users through vision and voice, responding via lip-synced avatars that adapt appearance to user demographics. The system employs a multimodal pipeline combining VAD, ASR (Whisper), LLM-based intent classification, RAG for dialogue, face analysis for personalization, and voice synthesis (XTTS v2). Evaluation demonstrates high accuracy in command recognition (F1: 0.90), demographic estimation (gender F1: 0.89, age MAE: 5.14 years), and speech transcription (WER: 0.181). By uniting aerial robotics with adaptive conversational AI and self-contained visual output, HoverAI introduces a new class of spatially-aware, socially responsive embodied agents for applications in guidance, assistance, and human-centered interaction.
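
The three metric families quoted above (WER, F1, MAE) have standard textbook definitions, sketched below in plain Python. This is an illustration only: the paper's own evaluation scripts are not shown here, the function names are hypothetical, and the F1 helper assumes a single positive class, whereas the reported command-recognition F1 may be averaged across classes.

```python
from typing import Hashable, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def f1_score(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
             positive: Hashable) -> float:
    """F1 for one positive class (assumption: no averaging across classes)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = average absolute difference, e.g. between true and estimated age."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)
```

For instance, `word_error_rate("fly to the kitchen", "fly to kitchen")` evaluates to 0.25, since one of the four reference words is dropped. Read the same way, a WER of 0.181 corresponds to roughly 18 word-level errors per 100 reference words.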

Problem

Research questions and friction points this paper is trying to address.

human-drone interaction
communication mechanisms
intent uncertainty
embodied aerial agent
socially responsive agents

Innovation

Methods, ideas, or system contributions that make the work stand out.

embodied aerial agent
multimodal interaction
onboard visual projection
conversational AI
demographic-aware personalization

Authors

Yuhua Jin
Chinese University of Hong Kong, Shenzhen, Guangdong, China

Nikita Kuzmin
Skolkovo Institute of Science and Technology, Moscow, Russia

Georgii R. Demianchuk
Skolkovo Institute of Science and Technology, Moscow, Russia

Mariya Lezina
Skolkovo Institute of Science and Technology, Moscow, Russia

Fawad Mehboob
Skolkovo Institute of Science and Technology, Moscow, Russia

Issatay Tokmurziyev
MSc Graduate, Skoltech
Electronics, Robotics, HRI, Gaze

Miguel Altamirano Cabrera
Research Scientist, Skolkovo Institute of Science and Technology
Haptics, Robotics, Tactile Sensation, Computer Vision

Muhammad Ahsan Mustafa
MSc Student
Aerial Robotics, Agile Drones, Model Predictive Control, Reinforcement Learning

Dzmitry Tsetserukou
Associate Professor, Skolkovo Institute of Science and Technology (Skoltech)
Robotics, Haptics, UAV Swarm, AI, VR