
Hiroki Furuta

Google Scholar ID: M0OhM1UAAAAJ

Google DeepMind

Large Language Models, Reinforcement Learning, Machine Learning


Citations & Impact

All-time citations: 2,142


Contact

Email: hirokifuruta@google.com

Publications

15 items

Drifting Objectives for Refining Discrete Diffusion Language Models

2026


Diffusion-State Policy Optimization for Masked Diffusion Language Models

2026


Emergent Analogical Reasoning in Transformers

2026


WorldPack: Compressed Memory Improves Spatial Consistency in Video World Modeling

2025


MultiBanana: A Challenging Benchmark for Multi-Reference Text-to-Image Generation

2025


Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties

2025


Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence

2025


Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

2025


Resume (English only)

Academic Achievements

Publications:
- Improving Dynamic Object Interactions in Text-to-Video Generation with AI Feedback (arXiv, 2024)
- Topology of Reasoning: Understanding Large Reasoning Models through Reasoning Graph Properties (NeurIPS 2025)
- Inference-Time Text-to-Video Alignment with Diffusion Latent Beam Search (NeurIPS 2025)
- Beyond Induction Heads: In-Context Meta Learning Induces Multi-Phase Circuit Emergence (ICML 2025)
- Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks (ICML 2025)
- Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words (ICLR 2025)
- Geometric-Averaged Preference Optimization for Soft Preference Labels (NeurIPS 2024)
- A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts (ICML 2024)
- Open X-Embodiment: Robotic Learning Datasets and RT-X Models (ICRA 2024, Best Conference Paper Award)
- A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis (ICLR 2024, Oral, 1.2% acceptance rate)
- Multimodal Web Navigation with Instruction-Finetuned Foundation Models (ICLR 2024)
- A System for Morphology-Task Generalization via Unified Representation and Behavior Distillation (ICLR 2023, Notable-top-25%, 8% acceptance rate)

Research Experience

Research Scientist at Google DeepMind, primarily working on multimodal AI agents and alignment for diffusion models. Former Student Researcher at Google DeepMind, hosted by Heiga Zen and Izzeddin Gur.

Education

Ph.D.: The University of Tokyo, Advisor: Yutaka Matsuo; BEng and MEng: The University of Tokyo, Advisors: Yutaka Matsuo and Shixiang Shane Gu.

Background

Research Interests: Multimodal AI agents, alignment for diffusion models, and mechanistic interpretability of LLMs. Professional field: Artificial Intelligence, particularly focusing on multimodal AI agents and alignment.

Co-authors

23 total