Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar

📅 2026-03-31
🤖 AI Summary
This study addresses the challenges of data scarcity and domain shift in mine target classification from side-scan sonar imagery by proposing the first self-supervised learning framework tailored to this modality. Leveraging only 1,170 unlabeled sonar images for pretraining, the approach integrates synthetic data augmentation, a lightweight ViT-Tiny backbone, and a regularized self-supervised loss (SIGReg). It achieves F1 scores of 0.935 and 0.820 on binary and ternary classification tasks, respectively—significantly outperforming fine-tuned DINOv3 while using only one-quarter of its parameters. The findings demonstrate that a carefully designed, small-scale in-domain self-supervised method can surpass large-scale general-purpose vision models, and that applying additional in-domain self-supervision to strong pretrained models may actually degrade performance.
📝 Abstract
Side-scan sonar (SSS) mine classification is a challenging maritime vision problem characterized by extreme data scarcity and a large domain gap from natural images. While self-supervised learning (SSL) and general-purpose vision foundation models have shown strong performance in general vision and several specialized domains, their use in SSS remains largely unexplored. We present Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification, using SIGReg, a regularization-based SSL loss, to pretrain on only 1,170 unlabeled sonar images. In the binary mine vs. non-mine setting, Mine-JEPA achieves an F1 score of 0.935, outperforming fine-tuned DINOv3 (0.922), a foundation model pretrained on 1.7B images. For 3-class mine-like object classification, Mine-JEPA reaches 0.820 with synthetic data augmentation, again outperforming fine-tuned DINOv3 (0.810). We further observe that applying in-domain SSL to foundation models degrades performance by 10--13 percentage points, suggesting that stronger pretrained models do not always benefit from additional domain adaptation. In addition, Mine-JEPA with a compact ViT-Tiny backbone achieves competitive performance while using 4x fewer parameters than DINOv3. These results suggest that carefully designed in-domain self-supervised learning is a viable alternative to much larger foundation models in data-scarce maritime sonar imagery.
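The abstract's core recipe is a JEPA-style predictive objective combined with SIGReg, a regularization-based SSL loss. As a rough illustration only, the toy function below pairs a mean-squared prediction term with a simple per-dimension zero-mean, unit-variance penalty; this penalty is a hypothetical stand-in for SIGReg (which matches embeddings to an isotropic Gaussian), and the function name and the `lam` weight are assumptions, not the paper's implementation.

```python
def jepa_style_loss(ctx_emb, tgt_emb, lam=0.1):
    """Toy JEPA-style objective (illustrative sketch, not the paper's code).

    ctx_emb: predicted embeddings from the context encoder + predictor.
    tgt_emb: target embeddings the predictions should match.
    Both are lists of equal-length feature vectors.
    """
    n, d = len(ctx_emb), len(ctx_emb[0])

    # Prediction term: mean squared error between predicted and target embeddings.
    pred = sum(
        (c - t) ** 2 for cv, tv in zip(ctx_emb, tgt_emb) for c, t in zip(cv, tv)
    ) / (n * d)

    # Regularizer (stand-in for SIGReg): push each embedding dimension
    # toward zero mean and unit variance, discouraging representation collapse.
    reg = 0.0
    for j in range(d):
        col = [v[j] for v in ctx_emb]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        reg += mu ** 2 + (var - 1.0) ** 2
    reg /= d

    return pred + lam * reg
```

With already-standardized, perfectly predicted embeddings both terms vanish, while collapsed (constant) embeddings are penalized even when prediction error is low; the actual SIGReg loss tests Gaussianity of random 1D projections rather than matching moments.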
Problem

Research questions and friction points this paper is trying to address.

side-scan sonar
mine classification
data scarcity
domain gap
self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-domain self-supervised learning
side-scan sonar
mine classification
SIGReg
foundation model adaptation
Taeyoun Kwon
Seoul National University
Youngwon Choi
MAUM AI Inc.
Hyeonyu Kim
MAUM AI Inc.
Myeongkyun Cho
MAUM AI Inc.
Junhyeok Choi
MAUM AI Inc.
Moon Hwan Kim
MAUM AI Inc.