Shared Neural Space: Unified Precomputed Feature Encoding for Multi-Task and Cross Domain Vision

📅 2025-09-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing AI models are typically task-specific, necessitating distinct latent space mappings for diverse imaging and vision tasks—e.g., denoising, demosaicing, depth estimation, and semantic segmentation—resulting in poor cross-task generalization and high computational redundancy. To address this, we propose a “Universal Neural Space” framework built upon a lightweight CNN encoder-decoder architecture. It precomputes and shares transformation-aware feature representations across tasks and domains, eliminating the conventional isolation of task-specific latent spaces. This unified representation enables joint multi-task inference, preserving accuracy while substantially reducing model parameters and FLOPs, enhancing cross-domain generalization, and demonstrating feasibility for deployment on resource-constrained hardware. Our core contribution is the first construction of a transferable, modular, and lightweight unified feature space for multi-task learning—enabling shared semantics without compromising task-specific performance.

📝 Abstract
Most AI models in imaging and vision are customized to perform a specific high-precision task. However, this strategy is inefficient for applications composed of a series of modular tasks, since each task requires a mapping into a disparate latent domain. To address this inefficiency, we propose a universal Neural Space (NS), in which an encoder-decoder framework precomputes features shared across vision and imaging tasks. Our encoder learns transformation-aware, generalizable representations that enable multiple downstream AI modules to share the same feature space. This architecture reduces redundancy, improves generalization under domain shift, and establishes a foundation for efficient multi-task vision pipelines. Furthermore, as opposed to larger transformer backbones, our backbone is lightweight and CNN-based, allowing for wider deployment across hardware. We further demonstrate that imaging and vision modules, such as demosaicing, denoising, depth estimation, and semantic segmentation, can be performed efficiently in the NS.
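The core idea of the abstract can be sketched as follows: a single shared encoder computes features once, and several lightweight task heads consume that same precomputed representation instead of each task re-encoding the input. The sketch below is a minimal pure-Python illustration of that dataflow pattern only; all function names and the toy "features" are hypothetical stand-ins, not the paper's CNN architecture.

```python
# Sketch of a shared neural space: one encoder pass, many task heads.
# All names and computations here are illustrative, not the paper's code.

def shared_encoder(image):
    """Stand-in for the lightweight CNN encoder: maps an input
    (here, a flat list of pixel values) into a shared feature dict."""
    # Toy "features", computed exactly once per image.
    return {"mean": sum(image) / len(image), "max": max(image)}

def denoise_head(features):
    # Each downstream module reads the same precomputed features.
    return f"denoised(mean={features['mean']:.2f})"

def depth_head(features):
    return f"depth(max={features['max']})"

TASK_HEADS = {"denoise": denoise_head, "depth": depth_head}

def run_pipeline(image, tasks):
    features = shared_encoder(image)  # encoded once, not once per task
    return {task: TASK_HEADS[task](features) for task in tasks}

results = run_pipeline([0.1, 0.5, 0.9, 0.3], ["denoise", "depth"])
print(results)
```

The point of the pattern is that adding a new task costs only a small head over the existing feature space, which is the source of the parameter and FLOP savings the paper claims.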
Problem

Research questions and friction points this paper is trying to address.

Eliminating redundant feature encoding for multiple vision tasks
Enabling cross-domain generalization in computer vision pipelines
Creating unified feature space for modular multi-task applications
Innovation

Methods, ideas, or system contributions that make the work stand out.

Precomputed encoder-decoder framework for multi-task vision
Shared neural space enabling cross-domain feature generalization
Lightweight CNN backbone for efficient hardware deployment
Jing Li
MPI Lab, Samsung Research America, 6105 Tennyson Parkway, TX, USA
Oskar Bartosz
Samsung Research Poland
Chengyu Wang
Alibaba Group
Natural Language Processing · Large Language Model · Multi-modal Learning
Michal Wnuczynski
Samsung Research Poland
Dilshan Godaliyadda
Samsung Research America
image processing · computational imaging · computer vision · machine learning · signal processing
Michael Polley
MPI Lab, Samsung Research America, 6105 Tennyson Parkway, TX, USA