
Evaluation metrics and dimensional reduction for binary classification algorithms: a case study on bankruptcy prediction

Published online by Cambridge University Press: 14 January 2022

María E. Pérez-Pons
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain
Javier Parra-Dominguez
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188. Salamanca, Spain
Guillermo Hernández
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain
Enrique Herrera-Viedma
Affiliation:
University of Granada, Colegio Máximo de Cartuja, Campus Universitario de Cartuja C.P. 18071 Granada, Spain
Juan M. Corchado
Affiliation:
BISITE Research Group, University of Salamanca. Edificio I+D+i, Calle Espejo 2, 37007, Salamanca, Spain; Air Institute, IoT Digital Innovation Hub, Carbajosa de la Sagrada, 37188. Salamanca, Spain; Pusat Komputeran dan Informatik, Universiti Malaysia Kelantan, Karung Berkunci 36, Pengkalan Chepa, 16100 Kota Bharu, Kelantan, Malaysia; Department of Electronics, Information and Communication, Faculty of Engineering, Osaka Institute of Technology, 535-8585 Osaka, Japan

Abstract

This paper presents a methodology that makes it possible to automate binary classification using the minimum possible number of attributes. In this methodology, the success of a binary prediction is judged not by the accuracy of an algorithm but by evaluation metrics that describe its goodness of fit, an important consideration when the dataset is imbalanced. The proposed methodology assesses the possible biases in identifying one algorithm as the best performer when its goodness of fit is judged through evaluation metrics. The dimensionality of the data is reduced using the cumulative explained variance. The performance of six machine learning classification models is then compared using the Matthews correlation coefficient (MCC), the area under the receiver operating characteristic curve (ROC-AUC), and the area under the precision-recall curve (AUC-PR). The results show, graphically and numerically, how the evaluation metrics influence which outcome of an algorithm is judged optimal. The algorithms with the best performance in terms of evaluation metrics were random forest and gradient boosting. On the imbalanced datasets, MCC provided better prediction results than ROC-AUC or AUC-PR. The proposed methodology is applied to the case of bankruptcy prediction.
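
As a rough illustration of two ideas named in the abstract, the sketch below shows why accuracy can mislead on imbalanced data while MCC does not, and how a number of components might be chosen from a cumulative explained variance threshold. It is a minimal sketch and not the authors' implementation: the function names, the sample counts, the variance values, and the 0.95 default threshold are all assumptions made for the example, since the abstract does not state them.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of correct predictions among all predictions."""
    return (tp + tn) / (tp + tn + fp + fn)


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts.

    Returns 0.0 when the denominator is zero (a common convention,
    assumed here), e.g. when only one class is ever predicted.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def components_for_variance(variances, threshold=0.95):
    """Smallest number of components whose cumulative explained
    variance ratio reaches the threshold (threshold is an assumption)."""
    total = sum(variances)
    cumulative = 0.0
    ordered = sorted(variances, reverse=True)
    for k, v in enumerate(ordered, start=1):
        cumulative += v
        if cumulative / total >= threshold:
            return k
    return len(ordered)


# Hypothetical imbalanced sample: 990 healthy firms, 10 bankrupt ones,
# and a trivial classifier that labels every firm as healthy.
trivial = dict(tp=0, tn=990, fp=0, fn=10)
print("accuracy:", accuracy(**trivial))
print("MCC:", matthews_corrcoef(**trivial))

# Hypothetical per-component variances, for illustration only.
print("components kept:", components_for_variance([4, 3, 2, 1], threshold=0.85))
```

In this made-up sample the trivial classifier reaches 99% accuracy without identifying a single bankrupt firm, while its MCC is zero, which is the kind of discrepancy the methodology is designed to expose.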

Type
Research Article
Copyright
© The Author(s), 2022. Published by Cambridge University Press

