Mine-JEPA: In-Domain Self-Supervised Learning for Mine-Like Object Classification in Side-Scan Sonar

📅 2026-03-31
🤖 AI Summary
This study addresses the challenges of data scarcity and domain shift in mine target classification from side-scan sonar imagery by proposing the first self-supervised learning framework tailored to this modality. Leveraging only 1,170 unlabeled sonar images for pretraining, the approach integrates synthetic data augmentation, a lightweight ViT-Tiny backbone, and a regularized self-supervised loss (SIGReg). It achieves F1 scores of 0.935 and 0.820 on binary and ternary classification tasks, respectively—significantly outperforming fine-tuned DINOv3 while using only one-quarter of its parameters. The findings demonstrate that a carefully designed, small-scale in-domain self-supervised method can surpass large-scale general-purpose vision models, and that applying additional in-domain self-supervision to strong pretrained models may actually degrade performance.
📝 Abstract
Side-scan sonar (SSS) mine classification is a challenging maritime vision problem characterized by extreme data scarcity and a large domain gap from natural images. While self-supervised learning (SSL) and general-purpose vision foundation models have shown strong performance in general vision and several specialized domains, their use in SSS remains largely unexplored. We present Mine-JEPA, the first in-domain SSL pipeline for SSS mine classification, using SIGReg, a regularization-based SSL loss, to pretrain on only 1,170 unlabeled sonar images. In the binary mine vs. non-mine setting, Mine-JEPA achieves an F1 score of 0.935, outperforming fine-tuned DINOv3 (0.922), a foundation model pretrained on 1.7B images. For 3-class mine-like object classification, Mine-JEPA reaches 0.820 with synthetic data augmentation, again outperforming fine-tuned DINOv3 (0.810). We further observe that applying in-domain SSL to foundation models degrades performance by 10--13 percentage points, suggesting that stronger pretrained models do not always benefit from additional domain adaptation. In addition, Mine-JEPA with a compact ViT-Tiny backbone achieves competitive performance while using 4x fewer parameters than DINOv3. These results suggest that carefully designed in-domain self-supervised learning is a viable alternative to much larger foundation models in data-scarce maritime sonar imagery.
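The abstract's core recipe is a JEPA-style predictive objective combined with SIGReg, a regularization-based SSL loss. As a rough illustration only, the toy function below pairs a mean-squared prediction term with a simple per-dimension zero-mean, unit-variance penalty; this penalty is a hypothetical stand-in for SIGReg (which matches embeddings to an isotropic Gaussian), and the function name and the `lam` weight are assumptions, not the paper's implementation.

```python
def jepa_style_loss(ctx_emb, tgt_emb, lam=0.1):
    """Toy JEPA-style objective (illustrative sketch, not the paper's code).

    ctx_emb: predicted embeddings from the context encoder + predictor.
    tgt_emb: target embeddings the predictions should match.
    Both are lists of equal-length feature vectors.
    """
    n, d = len(ctx_emb), len(ctx_emb[0])

    # Prediction term: mean squared error between predicted and target embeddings.
    pred = sum(
        (c - t) ** 2 for cv, tv in zip(ctx_emb, tgt_emb) for c, t in zip(cv, tv)
    ) / (n * d)

    # Regularizer (stand-in for SIGReg): push each embedding dimension
    # toward zero mean and unit variance, discouraging representation collapse.
    reg = 0.0
    for j in range(d):
        col = [v[j] for v in ctx_emb]
        mu = sum(col) / n
        var = sum((x - mu) ** 2 for x in col) / n
        reg += mu ** 2 + (var - 1.0) ** 2
    reg /= d

    return pred + lam * reg
```

With already-standardized, perfectly predicted embeddings both terms vanish, while collapsed (constant) embeddings are penalized even when prediction error is low; the actual SIGReg loss tests Gaussianity of random 1D projections rather than matching moments.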
Problem

Research questions and friction points this paper is trying to address.

side-scan sonar
mine classification
data scarcity
domain gap
self-supervised learning
Innovation

Methods, ideas, or system contributions that make the work stand out.

in-domain self-supervised learning
side-scan sonar
mine classification
SIGReg
foundation model adaptation
Taeyoun Kwon
Seoul National University
Youngwon Choi
MAUM AI Inc.
Hyeonyu Kim
MAUM AI Inc.
Myeongkyun Cho
MAUM AI Inc.
Junhyeok Choi
MAUM AI Inc.
Moon Hwan Kim
MAUM AI Inc.