AI Summary
This work proposes a novel paradigm for low-light image enhancement termed Controllable Low-light Enhancement (CLE), addressing the ill-posed nature of the task that often leads to brightness bias and reliance on post-processing heuristics such as "gt-mean." For the first time, the problem is formulated as a conditionally controllable task, supported by Light100, a newly introduced continuous multi-illumination dataset, and a CLE-RWKV framework based on state space models. The method leverages noise-disentangled supervision in the HVI color space and a Space-to-Depth architecture to restore local inductive biases while maintaining linear computational complexity. This enables precise brightness control alongside high color fidelity. Extensive evaluations across seven benchmarks demonstrate that the proposed approach significantly reduces dependence on post-processing and achieves high-fidelity, controllable enhancement results.
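For readers unfamiliar with the "gt-mean" heuristic mentioned above, the following is a minimal sketch of one common form of it: the prediction is rescaled so that its mean intensity matches that of the ground-truth label before a metric such as PSNR is computed. The function names `gt_mean_align` and `psnr`, the `[0, 1]` value range, and the use of a plain per-image mean are assumptions made for illustration (some implementations compute the means on a grayscale conversion instead); the source does not specify the exact variant.

```python
import numpy as np


def gt_mean_align(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-8) -> np.ndarray:
    """Rescale `pred` so its global mean matches the mean of `gt`.

    Assumes both images are float arrays in [0, 1] with the same shape.
    This is an illustrative variant, not necessarily the one used in the paper.
    """
    scale = gt.mean() / (pred.mean() + eps)  # ratio of mean intensities
    return np.clip(pred * scale, 0.0, 1.0)   # keep the result in the valid range


def psnr(a: np.ndarray, b: np.ndarray, peak: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB for images with the given peak value."""
    mse = np.mean((a - b) ** 2)
    return float("inf") if mse == 0 else float(10.0 * np.log10(peak ** 2 / mse))


# Toy example: a prediction that is correct up to a global brightness offset.
gt = np.full((4, 4, 3), 0.5)
pred = np.full((4, 4, 3), 0.25)
aligned = gt_mean_align(pred, gt)
print(psnr(pred, gt), psnr(aligned, gt))  # the aligned score is much higher
```

Because the rescaling uses the label itself, it hides exactly the brightness error that a controllable formulation is meant to resolve, which is why reduced reliance on it is reported as a result.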
Abstract
Low-light image enhancement (LLIE) has traditionally been formulated as a deterministic mapping. However, this paradigm often struggles to account for the ill-posed nature of the task, where unknown ambient conditions and sensor parameters create a multimodal solution space. Consequently, state-of-the-art methods frequently encounter luminance discrepancies between predictions and labels, often necessitating "gt-mean" post-processing to align output luminance for evaluation. To address this fundamental limitation, we propose a transition toward Controllable Low-light Enhancement (CLE), explicitly reformulating the task as a well-posed conditional problem. To this end, we introduce CLE-RWKV, a holistic framework supported by Light100, a new benchmark featuring continuous real-world illumination transitions. To resolve the conflict between luminance control and chromatic fidelity, a noise-decoupled supervision strategy in the HVI color space is employed, effectively separating illumination modulation from texture restoration. Architecturally, to adapt efficient State Space Models (SSMs) for dense prediction, we leverage a Space-to-Depth (S2D) strategy. By folding spatial neighborhoods into channel dimensions, this design allows the model to recover local inductive biases and effectively bridge the "scanning gap" inherent in flattened visual sequences without sacrificing linear complexity. Experiments across seven benchmarks demonstrate that our approach achieves competitive performance and robust controllability, providing a real-world multi-illumination alternative that significantly reduces the reliance on gt-mean post-processing.
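The Space-to-Depth idea described above, folding spatial neighborhoods into channel dimensions, can be illustrated with a small NumPy sketch. This is a generic rearrangement under assumed conventions (a `(C, H, W)` layout, a hypothetical `block` parameter, and a particular channel ordering), not the CLE-RWKV implementation itself. Each `block x block` patch becomes a single spatial position with `block * block` times as many channels, so when the feature map is later flattened into a sequence, every token already carries its local neighborhood.

```python
import numpy as np


def space_to_depth(x: np.ndarray, block: int = 2) -> np.ndarray:
    """Fold each block x block patch into channels: (C, H, W) -> (C*b*b, H/b, W/b)."""
    c, h, w = x.shape
    if h % block or w % block:
        raise ValueError("H and W must be divisible by block")
    x = x.reshape(c, h // block, block, w // block, block)  # split H and W into (coarse, fine)
    x = x.transpose(0, 2, 4, 1, 3)                          # move the fine offsets next to C
    return x.reshape(c * block * block, h // block, w // block)


def depth_to_space(y: np.ndarray, block: int = 2) -> np.ndarray:
    """Inverse of space_to_depth: (C*b*b, H/b, W/b) -> (C, H, W)."""
    cbb, hb, wb = y.shape
    c = cbb // (block * block)
    y = y.reshape(c, block, block, hb, wb)                  # recover the fine offsets
    y = y.transpose(0, 3, 1, 4, 2)                          # interleave coarse and fine indices
    return y.reshape(c, hb * block, wb * block)


x = np.arange(16).reshape(1, 4, 4)
y = space_to_depth(x, block=2)
print(y.shape)                                   # (4, 2, 2)
print(y[:, 0, 0])                                # the top-left 2x2 patch: [0 1 4 5]
print(np.array_equal(depth_to_space(y, 2), x))   # True
```

The rearrangement is lossless and adds no parameters; the sequence length seen by the model shrinks by a factor of `block * block`, which is consistent with the abstract's claim that linear complexity is preserved.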