Hamza Harkous

Google Scholar ID: EzQ9nw0AAAAJ
Research Scientist at Google
Research interests: Synthetic Data, Natural Language Processing, Privacy
Citations & Impact (all-time)
  • Citations: 1,320
  • H-index: 13
  • i10-index: 14
  • Publications: 20
  • Co-authors: 38 (list available)
Resume (English only)
Research Experience
  • Nov 2023 – Present: Staff Research Scientist at Google
      • Founded Simula, a multi-step, multimodal, agentic synthetic-data framework used by 180+ monthly active Googlers, and serves as its Core Developer & Scientific Lead; Simula powers Gemini safety classifiers, ShieldGemma models, and Android scam detection, among others
  • Nov 2021 – Oct 2023: Senior Research Scientist at Google
      • Founded an internal data-curation platform combining diversified retrieval, active learning, and LLM assistance; grew the team to 13 engineers; continued as ML Lead & Architect for Google Checks
  • Feb 2020 – Oct 2021: Research Scientist at Google
      • Architected the initial ML pipeline for Google Checks; led the Hark project, building a large-scale privacy-feedback analytics system used by 300+ triagers daily
  • Jul 2019 – Jan 2020: Applied Scientist at Amazon Alexa
      • Developed DATATUNER, a neural data-to-text system with state-of-the-art semantic fidelity
  • Nov 2018 – May 2019: Machine-Learning & Privacy Consultant at Privately SA
      • Shipped on-device classifiers for hate speech, toxicity, and emotion detection, deployed in a BBC-branded mobile keyboard
  • Jul 2017 – Sep 2018: Post-doctoral Researcher at EPFL (LSIR Lab)
      • Lead author and developer of Polisis, an AI tool that analyzes privacy policies, used by more than 45,000 users and featured in Wired and WSJ