Turkish Journal of Electrical Engineering and Computer Sciences




A default loan (also called a nonperforming loan) occurs when the borrower fails to meet the bank's conditions and repayment cannot be made in accordance with the terms of a loan that has reached its maturity. In this study, we provide a predictive analysis of consumer behavior concerning a loan's first payment default (FPD) using a real dataset of approximately 600,000 consumer loan records from a bank. We use logistic regression, naive Bayes, support vector machine, and random forest on oversampled and undersampled data to build eight different models to predict FPD loans. A two-class random forest using undersampling yielded more than 86% on all performance measures: accuracy, precision, recall, and F1-score. The corresponding scores are as high as 96% for oversampling. However, when tested on the real and balanced dataset, the performance of oversampling deteriorates, as generating synthetic data for an extremely imbalanced dataset harms the training procedure of the algorithms. The study also provides an understanding of the reasons for nonperforming loans and helps to manage credit risks more consciously.
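As an illustration of the undersampling step and the four performance measures named above, the following is a minimal sketch and not the study's implementation. It assumes plain random undersampling of the majority class (the abstract does not say which undersampling scheme was used), and the function names `undersample` and `binary_metrics` are hypothetical.

```python
import random


def undersample(records, labels, seed=0):
    """Randomly drop majority-class rows until both classes are equal in size.

    Assumption: plain random undersampling; the study's scheme is not specified.
    """
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]  # e.g. FPD loans
    neg = [i for i, y in enumerate(labels) if y == 0]  # e.g. non-FPD loans
    minority, majority = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    kept = minority + rng.sample(majority, len(minority))
    rng.shuffle(kept)
    return [records[i] for i in kept], [labels[i] for i in kept]


def binary_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1-score for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy, made-up data: 5 positive rows among 100, far milder than a real portfolio.
toy_labels = [1] * 5 + [0] * 95
toy_records = list(range(100))
balanced_records, balanced_labels = undersample(toy_records, toy_labels)
```

Any classifier can then be trained on the balanced rows. Scoring it on held-out data that keeps the original class ratio shows how the model would behave on the real loan population.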


Machine learning, default loan, first payment default, imbalanced class problem, oversampling, undersampling
