Luca Soldaini

Google Scholar ID: 3KPvwcgAAAAJ
Allen Institute for AI
Large Language Models, Open Source AI, Information Retrieval
Citations & Impact
Citations (all-time): 4,323
H-index: 29
i10-index: 53
Publications: 20
Co-authors: 8
Resume (English only)
Academic Achievements
  • Olmo project received two Best Paper Awards at ACL 2024
  • Led the release of fully open Olmo 2 models (7B, 13B, 32B)
  • Launched Tülu 3 post-training pipeline and Molmo multimodal models
  • Developed olmOCR, a high-performance toolkit for PDF text extraction
  • Created predictive techniques and benchmarks to characterize LLM behavior during pretraining
Research Experience
  • Co-leads the data team for Ai2’s Olmo project with Kyle Lo
  • Develops adaptation recipes for LLMs, including the Tülu 3 post-training pipeline (supporting models up to 405B parameters)
  • Contributed to the Molmo family of open multimodal AI models
  • Co-developed tools for analyzing and improving LLM pipelines: AboutMe, WIMBD, WebOrganizer, and olmOCR
  • Investigated LLM-retrieval system interfaces; co-proposed FollowIR with Orion Weller, later extended to multilingual settings
  • Collaborated on OpenSciLLM, an end-to-end demo for literature-grounded scientific synthesis using LLMs
Miscellany
  • Enjoys brewing espresso and going on runs
  • Dreams about utopian mass transit systems
  • Curates a growing collection of laptop stickers
  • Spends time with his handsome cat
  • Believes raccoons are the best