Use of classification techniques in a dataset on financial inclusion: a study based on Latin American countries

Authors

  • Pâmela Rodrigues Venturini de Souza, Universidade Tecnológica Federal do Paraná - Campus Londrina
  • Bruno Gigioli Tomazi, Universidade Tecnológica Federal do Paraná - Campus Londrina
  • Bruno Samways dos Santos, Universidade Tecnológica Federal do Paraná - Campus Londrina

DOI:

https://doi.org/10.47456/bjpe.v8i1.37019

Keywords:

Data mining, Classification, Financial inclusion, Latin America

Abstract

Financial inclusion is important for reducing poverty and promoting inclusive economic growth, especially when comparing groups separated by large social inequality. This article used the World Bank Group's Global Financial Inclusion (Global Findex) survey to compare machine learning techniques for classifying men and women with respect to their use of financial services. To this end, the Decision Tree, k-Nearest Neighbors, Naïve Bayes, and Random Forest classifiers were applied and evaluated on accuracy, precision, recall (sensitivity), F1-score, and area under the Receiver Operating Characteristic (ROC) curve. All techniques except Naïve Bayes achieved accuracy close to 70%, recall close to 88%, and precision above 72% for most of the parameter settings investigated. For the area under the ROC curve, Random Forest reached 0.77, outperforming the other techniques on this measure.
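
The evaluation metrics named in the abstract can be illustrated with a minimal sketch in plain Python. This is not the authors' code: the function names are hypothetical, and the labels and scores below are made-up toy values, not Global Findex data. The area under the ROC curve is computed here through its rank interpretation (the fraction of positive/negative pairs in which the positive example receives the higher score), which is one of several equivalent ways to obtain it.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels, treating 1 as the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall (sensitivity) and F1-score from hard predictions."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise ranking; tied scores count as half a win."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (assumed values for illustration only).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.1]
# Hard predictions obtained with an assumed decision threshold of 0.5.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

metrics = classification_metrics(y_true, y_pred)
metrics["roc_auc"] = roc_auc(y_true, scores)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Accuracy, precision, recall, and F1-score depend on the chosen decision threshold, whereas the area under the ROC curve summarizes ranking quality across all thresholds, which is why a study of this kind typically reports it separately.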

Author Biographies

Pâmela Rodrigues Venturini de Souza, Universidade Tecnológica Federal do Paraná - Campus Londrina

Completed secondary education at the Federal Institute of Education, Science and Technology of Paraná (2016). Currently an intern at Sonoco do Brasil. Has experience in the field of Production Engineering. (Text automatically generated by the CVLattes application)

Bruno Gigioli Tomazi, Universidade Tecnológica Federal do Paraná - Campus Londrina

Graduate in Production Engineering

Bruno Samways dos Santos, Universidade Tecnológica Federal do Paraná - Campus Londrina

Graduated in Production Engineering with emphasis on Control and Automation from the Federal Technological University of Paraná, Campus Ponta Grossa (UTFPR-PG), holds a Master's degree in Production Engineering from the same university and a PhD in Production and Systems Engineering from the Pontifical Catholic University of Paraná (PUCPR). He is a full professor at the Department of Production Engineering at the Universidade Tecnológica Federal do Paraná, Campus Londrina (UTFPR-LD), and holds the position of Substitute Coordinator of the Production Engineering Course. He is a lead researcher at the Data Mining and Optimization Research Group (GPOMD), working in data mining research with an emphasis on classification and clustering problems in several areas. He is interested in optimization models and machine learning techniques for applications such as education, public health, services, and industry. (Text provided by the author)

Published

2022-02-14

How to Cite

Souza, P. R. V. de, Tomazi, B. G., & Santos, B. S. dos. (2022). Use of classification techniques in a dataset on financial inclusion: a study based on Latin American countries. Brazilian Journal of Production Engineering, 8(1), 73–91. https://doi.org/10.47456/bjpe.v8i1.37019