🤖 AI Summary
Problem: AutoML and explainable AI (XAI) tools struggle to balance efficiency and interpretability on large multi-table databases (millions of individuals, tens of thousands of variables, hundreds of millions of records in secondary tables).
Method: Khiops is an end-to-end, lightweight Bayesian framework. It estimates variable importance through discretization models for numerical variables and value clustering for categorical variables, and it handles multi-table data by propositionalization, that is, by automatically constructing aggregate features from secondary tables. The predictive model is a naive Bayes classifier that incorporates variable selection and weight learning.
Contribution/Results: The system is open source and is available both as a Python API and through a GUI. Experiments demonstrate sublinear time complexity on datasets with hundreds of millions of records, together with high predictive accuracy and strong model interpretability, which narrows the gap between scalability and transparency in multi-table AutoML.
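To make the idea of propositionalization concrete, here is a minimal sketch in plain Python. It is not the Khiops implementation or API: the table contents, the `propositionalize` helper, and the chosen aggregates (count, mean, max, distinct count) are all assumptions made for illustration. It only shows how a one-to-many relation can be flattened into one row of aggregate features per main-table individual.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy data: one row per customer in the main table,
# zero or more rows per customer in the secondary table.
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
purchases = [
    {"customer_id": 1, "amount": 20.0, "category": "books"},
    {"customer_id": 1, "amount": 35.0, "category": "games"},
    {"customer_id": 2, "amount": 5.0, "category": "books"},
]


def propositionalize(main_rows, secondary_rows, key, foreign_key):
    """Flatten a one-to-many relation into per-individual aggregates.

    Illustrative helper (not part of Khiops): the aggregates are fixed
    by hand here, whereas Khiops constructs them automatically.
    """
    # Group secondary records by the individual they belong to.
    grouped = defaultdict(list)
    for row in secondary_rows:
        grouped[row[foreign_key]].append(row)

    flat = []
    for main in main_rows:
        records = grouped.get(main[key], [])
        amounts = [r["amount"] for r in records]
        flat.append({
            key: main[key],
            "count": len(records),
            # Aggregates are undefined for individuals with no records.
            "mean_amount": mean(amounts) if amounts else None,
            "max_amount": max(amounts) if amounts else None,
            "distinct_categories": len({r["category"] for r in records}),
        })
    return flat


flat_table = propositionalize(customers, purchases, "id", "customer_id")
for row in flat_table:
    print(row)
```

The resulting flat table has one row per customer, so any single-table learner can consume it. Individuals without secondary records keep a count of zero and missing values for the other aggregates.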
📝 Abstract
Khiops is an open source machine learning tool designed for mining large multi-table databases. Khiops is based on a unique Bayesian approach that has attracted academic interest, with more than 20 publications on topics such as variable selection, classification, decision trees and co-clustering. It provides a predictive measure of variable importance using discretisation models for numerical data and value clustering for categorical data. The proposed classification/regression model is a naive Bayes classifier incorporating variable selection and weight learning. In the case of multi-table databases, it provides propositionalisation by automatically constructing aggregates. Khiops is suited to the analysis of large databases with millions of individuals, tens of thousands of variables and hundreds of millions of records in secondary tables. It is available in many environments, both as a Python library and via a user interface.
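The combination of discretisation, variable selection and weight learning in a naive Bayes model can be illustrated with a short sketch. The code below uses the generic weighted naive Bayes form, where the log-score of class c is log P(c) plus the sum over variables of w_k log P(interval_k | c), and a weight of zero removes a variable from the model. All cut points, conditional probabilities and weights are hand-set, hypothetical values. The sketch does not reproduce how Khiops learns them, and none of the function names come from the Khiops library.

```python
import math
from bisect import bisect_right


def discretize(value, cut_points):
    """Return the index of the interval containing value.

    Intervals are (-inf, c0), [c0, c1), ..., [c_last, +inf).
    """
    return bisect_right(cut_points, value)


def weighted_nb_log_scores(x, priors, cond_probs, cut_points, weights):
    """Compute log P(c) + sum_k w_k * log P(interval_k | c) per class."""
    scores = {}
    for c, prior in priors.items():
        score = math.log(prior)
        for var, value in x.items():
            w = weights.get(var, 0.0)
            if w == 0.0:
                continue  # a zero weight means the variable is not selected
            interval = discretize(value, cut_points[var])
            score += w * math.log(cond_probs[var][c][interval])
        scores[c] = score
    return scores


def predict_proba(scores):
    """Normalise log-scores into class probabilities."""
    top = max(scores.values())  # shift by the max for numerical stability
    exp = {c: math.exp(s - top) for c, s in scores.items()}
    total = sum(exp.values())
    return {c: e / total for c, e in exp.items()}


# Hypothetical, hand-set model parameters (not learned from data).
priors = {"yes": 0.4, "no": 0.6}
cut_points = {"age": [30.0, 50.0], "noise": [0.5]}
cond_probs = {
    "age": {"yes": [0.2, 0.3, 0.5], "no": [0.5, 0.3, 0.2]},
    "noise": {"yes": [0.5, 0.5], "no": [0.5, 0.5]},
}
weights = {"age": 0.8, "noise": 0.0}  # "noise" is excluded from the model

scores = weighted_nb_log_scores(
    {"age": 62.0, "noise": 0.9}, priors, cond_probs, cut_points, weights
)
proba = predict_proba(scores)
print(proba)
```

Because each variable contributes a separate, weighted term to the log-score, the contribution of every variable to a prediction can be read off directly, which is one reason this model family is considered interpretable.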