Hiroki Furuta

Google Scholar ID: M0OhM1UAAAAJ
Google DeepMind
Large Language Models, Reinforcement Learning, Machine Learning
Citations & Impact (all-time)
  • Citations: 2,142
  • H-index: 13
  • i10-index: 14
  • Publications: 20
  • Co-authors: 23 (list available)
Resume (English only)
Academic Achievements
  • Publications:
      - Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback (arXiv, 2024)
      - Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties (NeurIPS 2025)
      - Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search (NeurIPS 2025)
      - Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence (ICML 2025)
      - Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks (ICML 2025)
      - Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words (ICLR 2025)
      - Geometric-Averaged Preference Optimization for Soft Preference Labels (NeurIPS 2024)
      - A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (ICML 2024)
      - Open X-Embodiment: Robotic Learning Datasets and RT-X Models (ICRA 2024, Best Conference Paper Award)
      - A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR 2024, Oral, 1.2% acceptance rate)
      - Multimodal Web Navigation with Instruction-Finetuned Foundation Models (ICLR 2024)
      - A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR 2023, Notable-top-25%, 8% acceptance rate)
Research Experience
  • Research Scientist at Google DeepMind, primarily working on multimodal AI agents and alignment for diffusion models. Former Student Researcher at Google DeepMind, hosted by Heiga Zen and Izzeddin Gur.
Education
  • Ph.D.: The University of Tokyo. Advisor: Yutaka Matsuo.
  • BEng and MEng: The University of Tokyo. Advisors: Yutaka Matsuo and Shixiang Shane Gu.
Background
  • Research Interests: Multimodal AI agents, alignment for diffusion models, and mechanistic interpretability of LLMs.
  • Professional field: Artificial Intelligence, with a focus on multimodal AI agents and alignment.