- 'Causal Estimation of Tokenisation Bias' accepted at ACL 2025
- 'Causal Estimation of Memorisation Profiles' won the Best Paper Award at ACL 2024 and was named Paper of the Year by Cambridge’s Department of Computer Science and Technology
- 'PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs' (first author) accepted at ICLR 2025
- 'Self-Training Large Language Models for Tool-Use Without Demonstrations' accepted at NAACL 2025 (Findings)
- 'AnchorAL: Computationally Efficient Active Learning for Large and Imbalanced Datasets' accepted at NAACL 2024 (main)
- Awards:
- Best Paper Award at ACL 2024
- Paper of the Year Award from Cambridge’s Department of Computer Science and Technology
- Funding from Translated’s Imminent Research Grant
- Other Achievements:
- Recognized as an Outstanding Reviewer at EMNLP 2024
- 'Large Language Model Memorization (L2M2)' workshop proposal accepted at ACL 2025
Research Experience
- January 2022: Wordify 2.0 released
- September 2022 to present: PhD student in Computer Science at the University of Cambridge
- September 2022: Joined Amazon AWS AI Labs, working on efficient dialogue state tracking
- June 2023: First paper, 'Diable: Efficient Dialogue State Tracking as Operations on Tables', accepted at ACL 2023 (Findings)
Education
- Degree: PhD
- University: University of Cambridge
- Advisor: Prof Andreas Vlachos
- Time: Since 2022
- Major: Computer Science
Background
- Research Interests: How training data influences a model’s behavior
- Professional Field: Causal methods, active learning, tokenization, and pre-training
- Background: PhD student in Computer Science at the University of Cambridge, with a background in economics and over three years of experience in research labs, consulting firms, and international institutions, focused on custom model training and data science solutions.