Scholar
Guilherme Penedo
Google Scholar ID: L-jmoJYAAAAJ
ML Research Engineer at 🤗 HuggingFace
Citations & Impact
All-time
Citations
2,716
H-index
9
i10-index
9
Publications
13
Co-authors
0
Contact
No contact links provided.
Publications
4 items
FineWeb2: One Pipeline to Scale Them All -- Adapting Pre-Training Data Processing to Every Language
2025
Cited
0
The Common Pile v0.1: An 8TB Dataset of Public Domain and Openly Licensed Text
2025
Cited
0
SmolLM2: When Smol Goes Big -- Data-Centric Training of a Small Language Model
2025
Cited
0
Towards Best Practices for Open Datasets for LLM Training
2025
Cited
0
Co-authors
0 total (list not available)