The Aloe Family Recipe for Open and Specialized Healthcare LLMs

📅 2025-05-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing open-source medical large language models (LLMs) lack sufficient safety guarantees, domain-specific expertise, and clinical utility. Method: This paper proposes an end-to-end optimization framework built on Llama 3.1 and Qwen 2.5 base models, integrating synthetic chain-of-thought data augmentation, direct preference optimization (DPO) for alignment, and retrieval-augmented generation (RAG). A four-dimensional evaluation suite, encompassing closed- and open-ended QA, safety, human expert assessment, and jailbreak robustness, is introduced. Contribution/Results: The work establishes an open-source release paradigm that jointly prioritizes high clinical efficacy and strong jailbreak resistance, accompanied by a dedicated medical risk assessment report. The released Aloe Beta models match top proprietary models across multiple medical benchmarks, achieve higher physician preference scores, exhibit significantly reduced bias and toxicity, and demonstrate robustness against unseen jailbreak attacks, all under a permissive open-source license.

📝 Abstract
Purpose: With advancements in Large Language Models (LLMs) for healthcare, the need arises for competitive open-source models to protect the public interest. This work contributes to the field of open medical LLMs by optimizing key stages of data preprocessing and training, while showing how to improve model safety (through DPO) and efficacy (through RAG). The evaluation methodology used, which includes four different types of tests, defines a new standard for the field. The resultant models, shown to be competitive with the best private alternatives, are released with a permissive license. Methods: Building on strong base models like Llama 3.1 and Qwen 2.5, Aloe Beta uses a custom dataset to enhance public data with synthetic Chain of Thought examples. The models undergo alignment with Direct Preference Optimization, emphasizing ethical and policy-aligned performance in the presence of jailbreaking attacks. Evaluation includes closed-ended, open-ended, safety, and human assessments, to maximize the reliability of results. Results: Recommendations are made across the entire pipeline, backed by the solid performance of the Aloe Family. These models deliver competitive performance across healthcare benchmarks and medical fields, and are often preferred by healthcare professionals. On bias and toxicity, the Aloe Beta models significantly improve safety, showing resilience to unseen jailbreaking attacks. For a responsible release, a detailed risk assessment specific to healthcare is attached to the Aloe Family models. Conclusion: The Aloe Beta models, and the recipe that leads to them, are a significant contribution to the open-source medical LLM field, offering top-of-the-line performance while maintaining high ethical requirements. This work sets a new standard for developing and reporting aligned LLMs in healthcare.
Problem

Research questions and friction points this paper is trying to address.

Developing open-source healthcare LLMs to protect public interest
Enhancing model safety and efficacy via DPO and RAG
Setting new evaluation standards for medical LLM performance
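The RAG component mentioned above can be illustrated with a minimal sketch. The paper does not specify its retriever here, so a toy bag-of-words cosine-similarity ranker stands in for it; the function names (`retrieve`, `build_prompt`) and the sample corpus are illustrative assumptions, not the authors' implementation:

```python
import math
from collections import Counter

def retrieve(query, corpus, k=2):
    """Rank passages by cosine similarity of term-count vectors.
    A real RAG system would use a dense retriever over a medical corpus;
    this toy version only illustrates the retrieve-then-prompt pattern."""
    q = Counter(query.lower().split())

    def score(doc):
        d = Counter(doc.lower().split())
        dot = sum(q[t] * d[t] for t in q)
        norm = (math.sqrt(sum(v * v for v in q.values()))
                * math.sqrt(sum(v * v for v in d.values())))
        return dot / norm if norm else 0.0

    return sorted(corpus, key=score, reverse=True)[:k]

def build_prompt(question, passages):
    """Ground the model's answer in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in passages)
    return (f"Answer using only the context below.\n"
            f"Context:\n{context}\nQuestion: {question}")

corpus = [
    "Metformin is a first-line treatment for type 2 diabetes.",
    "The heart has four chambers.",
    "Aspirin inhibits platelet aggregation.",
]
top = retrieve("first-line treatment for type 2 diabetes", corpus, k=1)
prompt = build_prompt("What is first-line for type 2 diabetes?", top)
```

The point of the pattern is that factual grounding comes from the retrieved context rather than from the model's parameters alone, which is why the paper ties RAG to efficacy gains.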
Innovation

Methods, ideas, or system contributions that make the work stand out.

Optimizes data preprocessing and training stages
Enhances model safety via DPO and efficacy via RAG
Uses synthetic Chain of Thought examples in dataset
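The DPO alignment step named above optimizes the policy to prefer a chosen answer over a rejected one relative to a frozen reference model. A minimal sketch of the per-pair loss follows (pure Python on scalar log-probabilities; the function name and the `beta=0.1` default are illustrative assumptions, not the paper's hyperparameters):

```python
import math

def dpo_loss(logp_w, logp_l, ref_logp_w, ref_logp_l, beta=0.1):
    """Direct Preference Optimization loss for one preference pair.

    logp_w / logp_l         : policy log-probs of chosen / rejected answers
    ref_logp_w / ref_logp_l : same quantities under the frozen reference model
    beta                    : strength of the implicit KL-style penalty
    """
    # Implicit reward margin: how much more the policy favors the chosen
    # answer than the reference does.
    margin = beta * ((logp_w - ref_logp_w) - (logp_l - ref_logp_l))
    # Negative log-sigmoid of the margin: minimized by widening the margin.
    return -math.log(1.0 / (1.0 + math.exp(-margin)))
```

When policy and reference agree exactly, the margin is zero and the loss sits at log 2; pushing probability mass toward the chosen (e.g. safe, policy-aligned) answer drives the loss below that, which is how safety preferences are baked into the model.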
👥 Authors
Dario Garcia-Gasulla
Barcelona Supercomputing Center (BSC-CNS), Spain
Jordi Bayarri-Planas
Barcelona Supercomputing Center (BSC-CNS), Spain
Ashwin Kumar Gururajan
Barcelona Supercomputing Center (BSC-CNS), Spain
Enrique Lopez-Cuena
Barcelona Supercomputing Center (BSC-CNS), Spain
Adrián Tormos
Barcelona Supercomputing Center (BSC-CNS), Spain
Daniel Hinjos
Research Engineer, Barcelona Supercomputing Center
Pablo Bernabeu Perez
Barcelona Supercomputing Center (BSC-CNS), Spain
Anna Arias-Duart
Barcelona Supercomputing Center (BSC)
Pablo A. Martin-Torres
Barcelona Supercomputing Center (BSC-CNS), Spain
Marta Gonzalez-Mallo
Barcelona Supercomputing Center (BSC-CNS), Spain
S. Álvarez-Napagao
Universitat Politècnica de Catalunya - Barcelona Tech (UPC), Spain; Barcelona Supercomputing Center (BSC-CNS), Spain
Eduard Ayguadé-Parra
Universitat Politècnica de Catalunya - Barcelona Tech (UPC), Spain; Barcelona Supercomputing Center (BSC-CNS), Spain
Ulises Cortés
Universitat Politècnica de Catalunya - Barcelona Tech (UPC), Spain; Barcelona Supercomputing Center (BSC-CNS), Spain