🤖 AI Summary
This study addresses the challenge of predicting and interpreting customer data-sharing behavior in Open Banking environments. Methodologically, it proposes a hybrid data-balancing strategy that integrates ADASYN (adaptive synthetic sampling) oversampling with NearMiss undersampling, applied to highly imbalanced sharing-behavior data from approximately 3.2 million customers, and trains XGBoost prediction models coupled with an interpretability framework that combines SHAP feature attribution with CART rule-based explanations. Key findings identify mobile-channel transaction activity and credit-related behavior, particularly credit card usage, as the most influential drivers of data sharing. The models reach 91.39% accuracy for data inflow and 91.53% for data outflow predictions. The approach delivers quantitatively grounded insights into the determinants of sharing behavior, supporting financial institutions in optimizing data governance, product design, and competitive strategy.
📝 Abstract
The emergence of Open Banking represents a significant shift in financial data management, influencing financial institutions' market dynamics and marketing strategies. The resulting increase in competition creates both opportunities and challenges, as institutions manage data inflow to improve products and services while mitigating data outflow that could aid competitors. This study introduces a framework to predict customers' propensity to share data via Open Banking and interprets this behavior through Explanatory Model Analysis (EMA). Using data from a large Brazilian financial institution with approximately 3.2 million customers, a hybrid data balancing strategy incorporating the ADASYN and NearMiss techniques was employed to address the infrequency of data sharing and enhance the training of XGBoost models. These models predicted customer data sharing with 91.39% accuracy for inflow and 91.53% for outflow. The EMA phase combined the SHapley Additive exPlanations (SHAP) method with the Classification and Regression Tree (CART) technique, revealing the features most influential on customer decisions. Key features included the number of transactions and purchases in mobile channels, interactions within these channels, and credit-related features, particularly credit card usage across the national banking system. These results highlight the critical role of mobile engagement and credit in driving customer data-sharing behaviors, providing financial institutions with strategic insights to enhance competitiveness and innovation in the Open Banking environment.
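To make the SHAP step more concrete, the following is a minimal sketch of the Shapley attribution idea that SHAP builds on. It is not the authors' code: the feature names, the customer and baseline values, and the `toy_score` function are all hypothetical, and it enumerates feature coalitions exactly using only the standard library, whereas SHAP tooling for tree models such as XGBoost relies on more efficient algorithms.

```python
from itertools import combinations
from math import factorial

# Hypothetical feature names and values for one customer (illustrative only).
FEATURES = ["mobile_transactions", "credit_card_usage", "branch_visits"]
customer = {"mobile_transactions": 1.0, "credit_card_usage": 1.0, "branch_visits": 1.0}
baseline = {"mobile_transactions": 0.0, "credit_card_usage": 0.0, "branch_visits": 0.0}


def toy_score(x):
    """Toy sharing-propensity score with one interaction term (not a fitted model)."""
    return (
        0.10
        + 0.30 * x["mobile_transactions"]
        + 0.20 * x["credit_card_usage"]
        + 0.10 * x["mobile_transactions"] * x["credit_card_usage"]
        - 0.05 * x["branch_visits"]
    )


def value(coalition):
    """Score when features in `coalition` take the customer's values and the rest the baseline's."""
    x = {f: (customer[f] if f in coalition else baseline[f]) for f in FEATURES}
    return toy_score(x)


def shapley_values():
    """Exact Shapley values: weighted average marginal contribution over all coalitions."""
    n = len(FEATURES)
    phi = {}
    for f in FEATURES:
        others = [g for g in FEATURES if g != f]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {f}) - value(s))
        phi[f] = total
    return phi


phi = shapley_values()
# Rank features by absolute contribution, as one would when reading a SHAP summary.
for name, contribution in sorted(phi.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:>20}: {contribution:+.3f}")
```

In this toy setup the interaction term is split evenly between the two features involved, and the attributions sum to the difference between the customer's score and the baseline score. That additivity property is what allows per-feature contributions to be compared and ranked.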