David Alvarez-Melis

Google Scholar ID: XsxZrYYAAAAJ

Harvard University & Microsoft Research

Machine Learning, Optimal Transport, Natural Language Processing, Interpretability

Citations & Impact

All-time citations: 4,176

Contact

Email: dam@seas.harvard.edu

Publications

29 items

Domain-Aware Scaling Laws Uncover Data Synergy (2026)

Understanding Layer Patching in Model Size Interpolation (2026)

Adversarial Concept Search: Predicting Compositional Errors From Feature Geometry (2026)

Low-Frequency Shortcuts in Texture-Driven Visual Learning (2026)

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training (2026)

Why Larger Models Learn More: Effects of Capacity, Interference, and Rare-Task Retention (2026)

OT on the Map: Quantifying Domain Shifts in Geographic Space (2026)

Beyond Explainable AI (XAI): An Overdue Paradigm Shift and Post-XAI Research Directions (2026)

Resume (English only)

Academic Achievements

- Publications:
  - Investigating the interaction of linguistic and mathematical reasoning in language models using multilingual number puzzles, EMNLP 2025
  - Data Drives Unstable Hierarchical Generalization in LMs, EMNLP 2025
  - To Backtrack or Not to Backtrack: When Sequential Search Limits Model Reasoning, COLM 2025
  - What is the Right Notion of Distance between Predict-then-Optimize Tasks?, UAI 2025
  - DDEQs: Distributional Deep Equilibrium Models through Wasserstein Gradient Flows, AISTATS 2025
  - Mixture of Parrots: Experts improve memorization more than reasoning, ICLR 2025
  - A Label is Worth a Thousand Images in Dataset Distillation, NeurIPS 2024
- Projects: OTDD has been incorporated into the DataSimilarity R package

Research Experience

- Worked at CSAIL, MIT, on various topics in machine learning and natural language processing
- Spent one year at IBM's T.J. Watson Research Center, working in the Speech Recognition and NLP teams
- Currently an Assistant Professor at Harvard SEAS, leading the Data-Centric Machine Learning (DCML) group

Education

- PhD: Massachusetts Institute of Technology (MIT), Computer Science (field: machine learning and natural language processing)
- MS: Courant Institute (NYU), Mathematics
- BSc: ITAM, Mathematics

Background

- Research Interests: Making machine learning more broadly applicable (especially to data-poor applications) and trustworthy (e.g., robust and interpretable)
- Field: Computer Science
- Bio: Assistant Professor at Harvard SEAS, leading the Data-Centric Machine Learning (DCML) group. Also Associate Faculty at the Kempner Institute, affiliated with the Center for Research on Computation and Society and the Harvard Data Science Initiative, and a researcher at Microsoft Research New England.

Miscellany