Predicting and Explaining Customer Data Sharing in the Open Banking

📅 2025-06-27
🤖 AI Summary
This study addresses the problem of predicting and interpreting customer data-sharing behavior in Open Banking. Using data on approximately 3.2 million customers of a large Brazilian financial institution, it applies a hybrid data-balancing strategy that combines ADASYN (adaptive synthetic oversampling) with NearMiss undersampling to compensate for the rarity of data sharing, then trains XGBoost models on the balanced data. An Explanatory Model Analysis phase combines SHAP (Shapley Additive Explanations) with CART (Classification and Regression Tree) to identify the features that most influence customer decisions. The most influential features are the number of transactions and purchases in mobile channels, interactions within those channels, and credit-related features, particularly credit card usage across the national banking system. The models reach 91.39% accuracy for data inflow and 91.53% for data outflow. These findings give financial institutions a quantitative basis for competitive strategy and innovation in the Open Banking environment.

📝 Abstract
The emergence of Open Banking represents a significant shift in financial data management, influencing financial institutions' market dynamics and marketing strategies. This increased competition creates opportunities and challenges, as institutions manage data inflow to improve products and services while mitigating data outflow that could aid competitors. This study introduces a framework to predict customers' propensity to share data via Open Banking and interprets this behavior through Explanatory Model Analysis (EMA). Using data from a large Brazilian financial institution with approximately 3.2 million customers, a hybrid data balancing strategy incorporating ADASYN and NEARMISS techniques was employed to address the infrequency of data sharing and enhance the training of XGBoost models. These models accurately predicted customer data sharing, achieving 91.39% accuracy for inflow and 91.53% for outflow. The EMA phase combined the Shapley Additive Explanations (SHAP) method with the Classification and Regression Tree (CART) technique, revealing the most influential features on customer decisions. Key features included the number of transactions and purchases in mobile channels, interactions within these channels, and credit-related features, particularly credit card usage across the national banking system. These results highlight the critical role of mobile engagement and credit in driving customer data-sharing behaviors, providing financial institutions with strategic insights to enhance competitiveness and innovation in the Open Banking environment.

Problem

Research questions and friction points this paper is trying to address.

Predict customer data sharing in Open Banking
Explain factors influencing data sharing decisions
Enhance financial institutions' strategic competitiveness

Innovation

Methods, ideas, or system contributions that make the work stand out.

Hybrid data balancing with ADASYN and NEARMISS
XGBoost models for accurate data sharing prediction
SHAP and CART for explaining customer decisions
João B. G. de Brito
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Rodrigo Heldt
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Cleo S. Silveira
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Matthias Bogaert
Ghent University, Ghent, Belgium
Guilherme B. Bucco
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Fernando B. Luce
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
João L. Becker
Fundação Getúlio Vargas, São Paulo, Brazil
Filipe J. Zabala
Federal University of Rio Grande do Sul, Porto Alegre, Brazil
Michel J. Anzanello
Professor of Industrial Engineering, Federal University of Rio Grande do Sul
Production planning, Data mining, Multivariate techniques