🤖 AI Summary
This work addresses the absence of automated machine learning (AutoML) solutions in natural language understanding (NLU) that simultaneously support data quality diagnosis and out-of-distribution (OOD) detection without manual intervention. To this end, we propose an open-source AutoML library tailored for text classification and named entity recognition tasks, featuring a novel data-aware training mechanism that automatically optimizes model configurations without requiring user-specified hyperparameters. The framework integrates features derived from large language models and incorporates configurable modules for OOD detection and data quality analysis, all accessible via a low-code API. Experimental results demonstrate that our approach significantly enhances model robustness and generalization while maintaining high usability.
📝 Abstract
OpenAutoNLU is an open-source automated machine learning library for natural language understanding (NLU) tasks, covering both text classification and named entity recognition (NER). Unlike existing solutions, OpenAutoNLU introduces data-aware training regime selection that requires no manual configuration from the user. The library also provides integrated data quality diagnostics, configurable out-of-distribution (OOD) detection, and large language model (LLM) features, all within a minimal low-code API. The demo app is available at https://openautonlu.dev.