Fine-Grained Cat Breed Recognition with Global Context Vision Transformer

📅 2026-02-07
🤖 AI Summary
This study addresses fine-grained image recognition for cat breed classification, where subtle variations in fur color and facial structure make breeds hard to distinguish. It applies the tiny variant of the Global Context Vision Transformer (GCViT-Tiny) to a subset of the Oxford-IIIT Pet dataset, using data augmentation (rotation, horizontal flipping, and brightness adjustment) to improve generalization. The model achieves 94.54% accuracy on the validation set and 92.00% on the test set, indicating that GCViT is effective for fine-grained visual categorization.

📝 Abstract
Accurate identification of cat breeds from images is a challenging task due to subtle differences in fur patterns, facial structure, and color. In this paper, we present a deep learning-based approach for classifying cat breeds using a subset of the Oxford-IIIT Pet Dataset, which contains high-resolution images of various domestic breeds. We employ the tiny variant of the Global Context Vision Transformer (GCViT-Tiny) for cat breed recognition. To improve model generalization, we use extensive data augmentation, including rotation, horizontal flipping, and brightness adjustment. Experimental results show that the GCViT-Tiny model achieves a test accuracy of 92.00% and a validation accuracy of 94.54%. These findings highlight the effectiveness of transformer-based architectures for fine-grained image classification tasks. Potential applications include veterinary diagnostics, animal shelter management, and mobile-based breed recognition systems. We also provide a Hugging Face demo at https://huggingface.co/spaces/bfarhad/cat-breed-classifier.
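The three augmentations named in the abstract (rotation, horizontal flipping, brightness adjustment) can be illustrated with a small dependency-free sketch. This is not the authors' pipeline: the function names, the grayscale list-of-rows image representation, and the default parameter values (`max_degrees`, `brightness_range`, `flip_prob`) are all assumptions chosen for illustration, since the abstract does not state the ranges that were used.

```python
import math
import random

# Assumption: a grayscale image stored as rows of pixel values in [0, 1].
Image = list[list[float]]


def hflip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def adjust_brightness(img: Image, factor: float) -> Image:
    """Scale every pixel by `factor`, clamping the result to [0, 1]."""
    return [[min(1.0, max(0.0, p * factor)) for p in row] for row in img]


def rotate(img: Image, degrees: float) -> Image:
    """Rotate about the image centre with nearest-neighbour sampling.

    Output pixels whose source falls outside the image are filled with 0.
    """
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2, (w - 1) / 2
    theta = math.radians(degrees)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Map each output pixel back to its source location.
            dx, dy = x - cx, y - cy
            sx = round(cx + cos_t * dx + sin_t * dy)
            sy = round(cy - sin_t * dx + cos_t * dy)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out


def augment(
    img: Image,
    rng: random.Random,
    max_degrees: float = 15.0,            # hypothetical rotation range
    brightness_range=(0.8, 1.2),          # hypothetical brightness range
    flip_prob: float = 0.5,               # hypothetical flip probability
) -> Image:
    """Apply a random rotation, an optional flip, and a brightness change."""
    out = rotate(img, rng.uniform(-max_degrees, max_degrees))
    if rng.random() < flip_prob:
        out = hflip(out)
    return adjust_brightness(out, rng.uniform(*brightness_range))
```

Calling `augment(img, random.Random(0))` on each training image yields a randomly perturbed copy of the same size. In practice an image library would handle color channels and interpolation; the sketch only shows what each transformation does to the pixel grid.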
Problem

Research questions and friction points this paper is trying to address.

fine-grained classification
cat breed recognition
image classification
visual recognition
Innovation

Methods, ideas, or system contributions that make the work stand out.

Global Context Vision Transformer
fine-grained classification
cat breed recognition
data augmentation
Vision Transformer
Mowmita Parvin Hera
Jashore University of Science and Technology, Bangladesh
Md. Shahriar Mahmud Kallol
Jashore University of Science and Technology, Bangladesh
Shohanur Rahman Nirob
Jashore University of Science and Technology, Bangladesh
Md. Badsha Bulbul
Jashore University of Science and Technology, Bangladesh
Jubayer Ahmed
Jashore University of Science and Technology, Bangladesh
M. Zhourul Islam
Jashore University of Science and Technology, Bangladesh
Hazrat Ali
University of Stirling
Artificial Intelligence, Generative AI, Medical AI, Healthcare
Mohammad Farhad Bulbul
Jashore University of Science and Technology, Bangladesh