Paper: 'MMTEB: Massive Multilingual Text Embedding Benchmark', ICLR 2025
Paper: 'Efficient In-Domain Question Answering for Resource-Constrained Environments', arXiv:2409.17648, 2024
Paper: 'Beyond Toxic: Toxicity Detection Datasets are Not Enough for Brand Safety', arXiv:2303.15110, 2023
Frequent speaker at PyCon, PyData, Berlin Buzzwords, and similar conferences, on topics including LLMs in production, RAG, and reproducibility in embedding benchmarks
Core maintainer of MTEB (Massive Text Embedding Benchmark)
Open-source contributor to haystack, transformers, llama-index, auto-sklearn
Background
Focuses on making AI systems scalable and maintainable
Currently a Staff Machine Learning Scientist at Zendesk
Tech stack includes Python, Docker, Kubernetes, PostgreSQL, and Go
Previously at Clarifai: led custom enterprise solutions for visual search and text moderation, built multi-modal retrieval systems, and conducted applied research to improve question-answering systems
Has spoken at various Python and ML conferences and meetups across Europe
Miscellany
Works remotely from Tallinn, Estonia
Contributes to open-source projects such as MTEB in spare time
Enjoys traveling and outdoor activities; formerly an active triathlete
Currently into cycling, running, and hiking
Maintains a technical blog on Generative AI/ML, with posts written as 3–5 minute reads
Has written technical blogs for Clarifai and Neptune