PrismSSL: One Interface, Many Modalities; A Single-Interface Library for Multimodal Self-Supervised Learning

📅 2025-11-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address fragmentation, complex configuration, and poor extensibility in multimodal self-supervised learning (SSL) frameworks, this paper introduces PrismSSL: a unified, modular PyTorch framework. PrismSSL abstracts audio, visual, graph-structured, and cross-modal SSL tasks behind a single, cohesive interface, enabling minimal YAML-based configuration and one-command training. It integrates state-of-the-art components, including Hugging Face Transformers, distributed training, LoRA fine-tuning, Optuna for hyperparameter optimization, and W&B/Flask-based visualization dashboards. The framework provides standardized data recipes, reproducible benchmarking experiments, and dynamic embedding analysis tools. PrismSSL significantly lowers the barrier to entry for multimodal SSL research while balancing usability and extensibility. It is open-sourced on PyPI and designed for rapid, plugin-style extension to support novel modalities and algorithms.
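The summary highlights minimal YAML-based configuration with one-command training. PrismSSL's actual configuration schema is not reproduced on this page, so the fragment below is only a hypothetical sketch of what such a recipe could look like; every key (`modality`, `method`, `trainer`, and so on) is illustrative, not the library's documented schema.

```yaml
# Hypothetical pretext-training recipe; all key names are illustrative only.
modality: audio
method: contrastive
backbone: transformer-base
trainer:
  epochs: 100
  batch_size: 256
  distributed: true
logging:
  wandb: true
```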

📝 Abstract
We present PrismSSL, a Python library that unifies state-of-the-art self-supervised learning (SSL) methods across audio, vision, graphs, and cross-modal settings in a single, modular codebase. The goal of the demo is to show how researchers and practitioners can: (i) install, configure, and run pretext training with a few lines of code; (ii) reproduce compact benchmarks; and (iii) extend the framework with new modalities or methods through clean trainer and dataset abstractions. PrismSSL is packaged on PyPI, released under the MIT license, integrates tightly with HuggingFace Transformers, and provides quality-of-life features such as distributed training in PyTorch, Optuna-based hyperparameter search, LoRA fine-tuning for Transformer backbones, animated embedding visualizations for sanity checks, Weights & Biases logging, and colorful, structured terminal logs for improved usability and clarity. In addition, PrismSSL offers a graphical dashboard, built with Flask and standard web technologies, that enables users to configure and launch training pipelines with minimal coding. The artifact (code and data recipes) will be publicly available and reproducible.
Problem

Research questions and friction points this paper is trying to address.

Unifying multimodal self-supervised learning methods in one library
Simplifying installation and execution of SSL training pipelines
Enabling extensibility to new modalities and methods
Innovation

Methods, ideas, or system contributions that make the work stand out.

Unified multimodal self-supervised learning library
Modular codebase supporting audio, vision, and graphs
Graphical dashboard for training configuration
Melika Shirian
Department of Computer Engineering, University of Isfahan, Isfahan, Iran
Kianoosh Vadaei
Department of Computer Engineering, University of Isfahan, Isfahan, Iran
Kian Majlessi
Department of Computer Engineering, University of Isfahan, Isfahan, Iran
Audrina Ebrahimi
Department of Computer Engineering, University of Texas at Dallas, Richardson, Texas, USA
Arshia Hemmat
Department of Computer Science, University of Oxford, Oxford, UK
Peyman Adibi
University of Isfahan
Machine Learning, Pattern Recognition, Computer Vision, Computational Intelligence, Image Processing
Hossein Karshenas
Department of Artificial Intelligence, University of Isfahan, Isfahan, Iran