SkinFlow: Efficient Information Transmission for Open Dermatological Diagnosis via Dynamic Visual Encoding and Staged RL

📅 2026-01-14
🤖 AI Summary
This work addresses the limited diagnostic accuracy of general-purpose large vision-language models (LVLMs) in dermatology. The problem stems from diffuse attention: the models struggle to distinguish subtle lesions from background noise. To overcome this, the study formulates skin disease diagnosis as an optimization of visual information transmission efficiency. It proposes a Virtual-Width Dynamic Vision Encoder (DVE) that unfolds complex pathological manifolds without increasing the physical parameter count. A two-stage reinforcement learning strategy then aligns explicit medical descriptions (Stage I) and reconstructs implicit diagnostic textures (Stage II). The method is evaluated on Fitzpatrick17k under a safety-oriented clinical protocol. There, the proposed 7B model achieves a 12.06% gain in Top-1 accuracy and a 28.57% gain in Top-6 accuracy, outperforming much larger general-purpose models such as Qwen3VL-235B and GPT-5.2.
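Top-k accuracy is the metric behind the Top-1 and Top-6 figures. It counts a case as correct when the true diagnosis appears among the model's k highest-ranked candidates. The sketch below is a minimal version in Python and assumes each prediction is already an ordered list of diagnosis strings. The function name and toy data are hypothetical illustrations, not taken from the paper's evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    ranked_predictions: Sequence[Sequence[str]],
    labels: Sequence[str],
    k: int,
) -> float:
    """Fraction of cases whose true label is among the first k ranked predictions.

    Hypothetical helper for illustration; not the paper's evaluation code.
    """
    if len(ranked_predictions) != len(labels):
        raise ValueError("ranked_predictions and labels must have the same length")
    if not labels:
        raise ValueError("at least one case is required")
    if k < 1:
        raise ValueError("k must be >= 1")
    # A case is a hit if its true label appears in the top-k slice of its ranking.
    hits = sum(
        1 for preds, label in zip(ranked_predictions, labels) if label in preds[:k]
    )
    return hits / len(labels)


# Toy example (invented data): three cases, each with a ranked candidate list.
preds = [
    ["psoriasis", "eczema", "tinea corporis"],
    ["melanoma", "nevus", "seborrheic keratosis"],
    ["acne", "rosacea", "folliculitis"],
]
labels = ["eczema", "melanoma", "lupus"]
print(top_k_accuracy(preds, labels, k=1))  # 0.3333333333333333
print(top_k_accuracy(preds, labels, k=3))  # 0.6666666666666666
```

Increasing k can only keep or raise the score, so for the same model Top-6 accuracy is never lower than Top-1 accuracy.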

📝 Abstract
General-purpose Large Vision-Language Models (LVLMs), despite their massive scale, often falter in dermatology due to "diffuse attention": the inability to disentangle subtle pathological lesions from background noise. In this paper, we challenge the assumption that parameter scaling is the only path to medical precision. We introduce SkinFlow, a framework that treats diagnosis as an optimization of visual information transmission efficiency. Our approach uses a Virtual-Width Dynamic Vision Encoder (DVE) to "unfold" complex pathological manifolds without physical parameter expansion, coupled with a two-stage Reinforcement Learning strategy. This strategy sequentially aligns explicit medical descriptions (Stage I) and reconstructs implicit diagnostic textures (Stage II) within a constrained semantic space. Furthermore, we propose a clinically grounded evaluation protocol that prioritizes diagnostic safety and hierarchical relevance over rigid label matching. Empirical results are compelling: our 7B model establishes a new state-of-the-art on the Fitzpatrick17k benchmark, achieving a +12.06% gain in Top-1 accuracy and a +28.57% boost in Top-6 accuracy over massive general-purpose models (e.g., Qwen3VL-235B and GPT-5.2). These findings demonstrate that optimizing geometric capacity and information flow yields superior diagnostic reasoning compared to raw parameter scaling.
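This is one hypothetical way to score "hierarchical relevance" instead of rigid label matching. A prediction earns credit when it names either the exact diagnosis or one of that diagnosis's ancestor categories in a disease taxonomy. The sketch assumes an acyclic child-to-parent mapping. The taxonomy, function names, and scoring rule are all illustrative assumptions, not the authors' protocol.

```python
from typing import Dict, List, Sequence

# Hypothetical child -> parent taxonomy (illustrative only; not from the paper).
# Assumed acyclic: every chain of parents eventually ends at a root.
PARENT: Dict[str, str] = {
    "plaque psoriasis": "psoriasis",
    "guttate psoriasis": "psoriasis",
    "psoriasis": "papulosquamous disorders",
    "atopic dermatitis": "eczema",
    "contact dermatitis": "eczema",
}


def ancestors(node: str, parent: Dict[str, str]) -> List[str]:
    """Return the ancestor categories of node, nearest first."""
    chain: List[str] = []
    while node in parent:
        node = parent[node]
        chain.append(node)
    return chain


def hierarchical_hit(pred: str, label: str, parent: Dict[str, str]) -> bool:
    """Credit an exact match or a coarser (ancestor) category of the true label.

    Sibling diagnoses (e.g. guttate vs. plaque psoriasis) are NOT credited.
    """
    return pred == label or pred in ancestors(label, parent)


def hierarchical_accuracy(
    preds: Sequence[str], labels: Sequence[str], parent: Dict[str, str]
) -> float:
    """Fraction of cases scored as hits under the hypothetical rule above."""
    if len(preds) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty preds and labels")
    hits = sum(hierarchical_hit(p, y, parent) for p, y in zip(preds, labels))
    return hits / len(labels)


preds = ["psoriasis", "guttate psoriasis", "eczema", "atopic dermatitis"]
labels = ["plaque psoriasis", "plaque psoriasis", "contact dermatitis", "atopic dermatitis"]
print(hierarchical_accuracy(preds, labels, PARENT))  # 0.75
```

As written, the rule also credits very coarse answers, such as "papulosquamous disorders" for plaque psoriasis. A stricter variant might limit how many levels above the true label a prediction may sit and still count.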
Problem

Research questions and friction points this paper is trying to address.

diffuse attention
dermatological diagnosis
visual information transmission
pathological lesions
medical precision
Innovation

Methods, ideas, or system contributions that make the work stand out.

Dynamic Vision Encoder
Staged Reinforcement Learning
Information Transmission Efficiency
Pathological Manifold Unfolding
Clinically Grounded Evaluation
Lijun Liu
Baichuan Inc.
Linwei Chen
Baichuan Inc.
Zhishou Zhang
Baichuan Inc.
Meng Tian
Baichuan Inc.
Hengfu Cui
Baichuan Inc.
Ruiyang Li
Baichuan Inc.
Zhaocheng Liu
Baichuan Inc.
Applied Machine Learning
Qiang Ju
Baichuan Inc.
Qianxi Li
Department of Dermatology, Peking University First Hospital; Beijing Key Laboratory of Molecular Diagnosis on Dermatoses; National Clinical Research Center for Skin and Sexually Transmitted Diseases; NMPA Key Laboratory for Quality Control and Evaluation of Cosmetics
Hong-Yu Zhou
Assistant Professor of Biomedical Engineering, Tsinghua University. Past: Harvard Medical School.
AI for Healthcare, AI for Medicine, Biomedical AI