Parameter-Efficient Fine-Tuning of DINOv2 for Large-Scale Font Classification

📅 2026-02-14
📈 Citations: 0
✨ Influential: 0
🤖 AI Summary
This work proposes an efficient font classification method based on the DINOv2 vision transformer that recognizes 394 font families. Using Low-Rank Adaptation (LoRA), the approach trains fewer than 1% of the model's parameters, and a scalable synthetic data generation pipeline applies augmentations such as random colors, alignment variations, line breaks, and Gaussian noise to improve generalization. The method reaches a top-1 accuracy of approximately 86%, and the summary describes it as the first application of LoRA with DINOv2 to font classification. The authors have open-sourced the trained models, dataset, and the complete training and deployment pipeline.
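The augmentation step can be illustrated with a minimal Python sketch, assuming NumPy is available. It only samples per-image settings (text and background colors with a legibility check, a random alignment, random line wrapping) and adds Gaussian noise to an already-rendered image array; drawing glyphs from a Google Fonts file is omitted. `RenderConfig`, `sample_config`, `add_gaussian_noise`, the luminance threshold of 100, the wrap widths, and the noise range are all hypothetical choices, not the authors' pipeline.

```python
# Hypothetical sketch of synthetic-text augmentation; all ranges are assumptions.
import random
import textwrap
from dataclasses import dataclass

import numpy as np


@dataclass
class RenderConfig:
    """Hypothetical per-sample settings for one synthetic text image."""
    text_color: tuple[int, int, int]
    background_color: tuple[int, int, int]
    alignment: str       # "left", "center", or "right"
    lines: list[str]     # text after random line wrapping
    noise_sigma: float   # std-dev of additive Gaussian noise (0-255 scale)


def luminance(rgb: tuple[int, int, int]) -> float:
    """Approximate perceived brightness of an RGB color (ITU-R BT.601 weights)."""
    r, g, b = rgb
    return 0.299 * r + 0.587 * g + 0.114 * b


def sample_config(text: str, rng: random.Random) -> RenderConfig:
    # Random colors, re-sampled until text and background differ enough in
    # luminance to stay legible (the 100-unit threshold is an assumption).
    while True:
        fg = tuple(rng.randrange(256) for _ in range(3))
        bg = tuple(rng.randrange(256) for _ in range(3))
        if abs(luminance(fg) - luminance(bg)) >= 100:
            break
    alignment = rng.choice(["left", "center", "right"])
    # A random wrap width yields different line breaks for the same string.
    width = rng.randint(8, 40)
    lines = textwrap.wrap(text, width=width) or [text]
    sigma = rng.uniform(0.0, 12.0)
    return RenderConfig(fg, bg, alignment, lines, sigma)


def add_gaussian_noise(image: np.ndarray, sigma: float, rng: np.random.Generator) -> np.ndarray:
    """Add zero-mean Gaussian noise to a uint8 HxWxC image and clip to [0, 255]."""
    noisy = image.astype(np.float32) + rng.normal(0.0, sigma, size=image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


if __name__ == "__main__":
    cfg = sample_config("The quick brown fox jumps over the lazy dog", random.Random(0))
    # Stand-in for a rendered image: a blank canvas in the background color.
    canvas = np.full((64, 256, 3), cfg.background_color, dtype=np.uint8)
    noisy = add_gaussian_noise(canvas, cfg.noise_sigma, np.random.default_rng(0))
    print(cfg.alignment, cfg.lines, noisy.shape)
```

Re-sampling colors until their luminance differs by a margin is one simple way to keep random color pairs readable; the authors' actual color strategy is not described here.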

πŸ“ Abstract
We present a font classification system capable of identifying 394 font families from rendered text images. Our approach fine-tunes a DINOv2 Vision Transformer using Low-Rank Adaptation (LoRA), achieving approximately 86% top-1 accuracy while training fewer than 1% of the model's 87.2M parameters. We introduce a synthetic dataset generation pipeline that renders Google Fonts at scale with diverse augmentations, including randomized colors, alignment, line wrapping, and Gaussian noise, producing training images that help the model generalize to real-world typographic samples. The model incorporates built-in preprocessing to ensure consistency between training and inference, and is deployed as a HuggingFace Inference Endpoint. We release the model, dataset, and full training pipeline as open-source resources.

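The claim of training fewer than 1% of 87.2M parameters can be sanity-checked with a small NumPy sketch of a LoRA layer, which keeps a weight matrix W frozen and learns a low-rank update so the output is W x + (alpha / r) B A x. The paper's rank, scaling factor, and target modules are not stated here, so the values below (rank 8, alpha 16, LoRA on the query and value projections of a 768-wide, 12-block ViT) are assumptions, and `LoRALinear` and `lora_adapter_params` are hypothetical helpers rather than the authors' code or the PEFT library API.

```python
# Hypothetical LoRA sketch; rank, alpha, and target modules are assumptions.
import numpy as np


class LoRALinear:
    """Frozen linear layer W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight: np.ndarray, rank: int, alpha: float, rng: np.random.Generator):
        d_out, d_in = weight.shape
        self.weight = weight                               # frozen, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))  # trainable, small random init
        self.B = np.zeros((d_out, rank))                   # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out). With B = 0 at init, the layer
        # starts out identical to the frozen pretrained projection.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


def lora_adapter_params(hidden: int, blocks: int, rank: int, matrices_per_block: int) -> int:
    """Adapter parameter count when every adapted matrix is hidden x hidden."""
    return blocks * matrices_per_block * rank * (hidden + hidden)


if __name__ == "__main__":
    n = lora_adapter_params(hidden=768, blocks=12, rank=8, matrices_per_block=2)
    print(n, f"{n / 87.2e6:.2%}")  # prints: 294912 0.34%
```

Under these assumptions the adapters account for about 0.34% of 87.2M parameters; a classification head over 394 classes, if it is also trained, adds its own weights on top of that count.
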
Problem

font classification
parameter-efficient fine-tuning
large-scale
DINOv2
generalization

Innovation

Parameter-Efficient Fine-Tuning
Low-Rank Adaptation (LoRA)
DINOv2 Vision Transformer
Synthetic Font Dataset
Font Classification