Comparative Analysis of Data Mining Models Using the C4.5, CHAID, and Random Forest Algorithms for Creditworthiness Assessment

Amrin Amrin; Omar Pahlevi; Harsih Rianto

doi:10.31294/coscience.v5i1.6208

Authors

Amrin Amrin Universitas Bina Sarana Informatika
Omar Pahlevi Universitas Bina Sarana Informatika
Harsih Rianto Universitas Bina Sarana Informatika

Keywords:

classification, C4.5 algorithm, Chi-Squared Automatic Interaction Detection (CHAID), random forest, confusion matrix

Abstract

Buying on credit has become common in society. A recurring problem is a poor history of credit use, which can lead to bad credit: customers who fail to repay debts as agreed with the bank increase credit risk. This study presents a comparative analysis of three data mining classification methods, namely the C4.5 algorithm, Chi-Squared Automatic Interaction Detection (CHAID), and Random Forest, with the goal of classifying creditworthiness status. The researchers used 481 vehicle credit records, each labeled "good" or "bad". The independent variables are dependent status, age, marital status, occupation, income, employment status, company status, highest education level, length of residence, housing condition, and down payment. For creditworthiness assessment, the C4.5 model achieves an accuracy of 91.90% with an area under the curve (AUC) of 0.915. The CHAID model achieves an accuracy of 63.83% with an AUC of 0.661, and the Random Forest model achieves an accuracy of 78.60% with an AUC of 0.907. The evaluation results show that C4.5 and Random Forest both reach high AUC values, with C4.5 also attaining the highest accuracy, while CHAID scores lowest on both measures.
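The two evaluation measures reported in the abstract, accuracy from a confusion matrix and AUC, can be illustrated with a minimal Python sketch. The labels and scores below are a made-up toy example, not the study's 481 records, and the function names, the 0.5 threshold, and the choice of "good" as the positive class are assumptions made only for illustration. The study's own tooling is not stated in the abstract.

```python
# Toy illustration of accuracy (from a confusion matrix) and AUC.
# All data below is hypothetical; it is NOT taken from the study.

def confusion_matrix(y_true, y_pred, positive="good"):
    """Return (tp, fp, tn, fn) counts for a binary classification."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive:
            if actual == positive:
                tp += 1
            else:
                fp += 1
        else:
            if actual == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def accuracy(tp, fp, tn, fn):
    """Share of records whose predicted class matches the actual class."""
    return (tp + tn) / (tp + fp + tn + fn)


def auc(y_true, scores, positive="good"):
    """AUC as the probability that a randomly chosen positive record
    receives a higher score than a randomly chosen negative record
    (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical actual labels and model scores for the "good" class.
y_true = ["good", "good", "good", "bad", "bad", "bad", "good", "bad"]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.7, 0.1]

# Assumed decision threshold of 0.5 to turn scores into class labels.
y_pred = ["good" if s >= 0.5 else "bad" for s in scores]

tp, fp, tn, fn = confusion_matrix(y_true, y_pred)
acc = accuracy(tp, fp, tn, fn)
area = auc(y_true, scores)

print(f"TP={tp} FP={fp} TN={tn} FN={fn}")
print(f"accuracy={acc:.4f} AUC={area:.4f}")
```

Accuracy depends on the chosen threshold, whereas AUC summarizes how well the scores rank "good" above "bad" records across all thresholds. This is one way a model can show a high AUC alongside a more modest accuracy.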