A Comparative Study of Machine Learning Models For NCAA Men’s Basketball Tournament Games Outcome Prediction


Yuanzhen Ni
Seongno Lee


The evolution of machine learning has produced many advanced prediction algorithms, some of which provide more accurate predictions than conventional statistical tools. Over the last two decades, machine learning (ML) approaches have been widely employed to predict game results. This study uses data gathered from web sources on the performance of 354 NCAA basketball teams over five seasons, from 2015 to 2019, to forecast the results of NCAA men’s basketball tournament games, including the Sweet Sixteen, Elite Eight, and Final Four, with an assortment of big data and machine learning classification models. The prediction results of each model were analysed and compared. Decision Tree showed the best prediction performance compared with the KNN, Logistic Regression, and SVM classification models, with a prediction accuracy of 75.71%. However, Decision Tree is prone to overfitting, whereas Random Forest can correct this through bagging, which reduces the variance of Decision Tree predictions. This study therefore hypothesized that Random Forest would outperform Decision Tree in predicting NCAA game results. A comprehensive analysis and comparison of the evaluation metrics of the two models showed that Random Forest had better forecast performance than Decision Tree, with a prediction accuracy of 85.71%.
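
The contrast between a single Decision Tree and a bagged ensemble can be illustrated with a minimal sketch. It is not the authors' implementation: the data are synthetic stand-ins generated in the script rather than the NCAA statistics, and every function name (`build_tree`, `build_forest`, `make_synthetic_games`, and so on) is hypothetical. The sketch assumes the usual Random Forest recipe of growing each tree on a bootstrap sample and considering a random subset of features at each split, then taking a majority vote.

```python
import random
from collections import Counter

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(rows, labels, rng, n_features=None):
    """Grow an unpruned tree: a leaf is a label, a node is (feature, threshold, left, right)."""
    if len(set(labels)) == 1 or len(rows) <= 1:
        return Counter(labels).most_common(1)[0][0]
    all_feats = list(range(len(rows[0])))
    # Random Forest idea: look at only a random subset of features at each split.
    feats = all_feats if n_features is None else rng.sample(all_feats, n_features)
    best = None
    for f in feats:
        for t in sorted({r[f] for r in rows}):
            left = [i for i, r in enumerate(rows) if r[f] <= t]
            right = [i for i, r in enumerate(rows) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini([labels[i] for i in left])
                     + len(right) * gini([labels[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:  # no usable split: fall back to the majority label
        return Counter(labels).most_common(1)[0][0]
    _, f, t, left, right = best
    return (f, t,
            build_tree([rows[i] for i in left], [labels[i] for i in left], rng, n_features),
            build_tree([rows[i] for i in right], [labels[i] for i in right], rng, n_features))

def predict_tree(node, row):
    """Walk from the root to a leaf and return its label."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def build_forest(rows, labels, n_trees, n_features, rng):
    """Bagging: each tree is trained on a bootstrap sample drawn with replacement."""
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_features))
    return forest

def predict_forest(forest, row):
    """Majority vote over the trees in the ensemble."""
    return Counter(predict_tree(t, row) for t in forest).most_common(1)[0][0]

def make_synthetic_games(n, rng):
    """Synthetic data only (assumption): four noisy features and a win/loss label."""
    rows, labels = [], []
    for _ in range(n):
        x = [rng.gauss(0, 1) for _ in range(4)]
        signal = x[0] + 0.5 * x[1] + rng.gauss(0, 1)  # outcome includes noise
        rows.append(x)
        labels.append(1 if signal > 0 else 0)
    return rows, labels

def accuracy(predict, rows, labels):
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

rng = random.Random(0)
train_X, train_y = make_synthetic_games(200, rng)
test_X, test_y = make_synthetic_games(200, rng)

tree = build_tree(train_X, train_y, rng)
forest = build_forest(train_X, train_y, n_trees=25, n_features=2, rng=rng)

tree_train_acc = accuracy(lambda r: predict_tree(tree, r), train_X, train_y)
tree_test_acc = accuracy(lambda r: predict_tree(tree, r), test_X, test_y)
forest_test_acc = accuracy(lambda r: predict_forest(forest, r), test_X, test_y)

print(f"single tree: train={tree_train_acc:.3f} test={tree_test_acc:.3f}")
print(f"bagged forest (25 trees): test={forest_test_acc:.3f}")
```

An unpruned tree fits its training data perfectly here, so any gap between its training and test accuracy gives a rough view of overfitting, and the forest's test accuracy shows what averaging over bootstrap-trained trees does on the same data. The numbers printed depend entirely on the synthetic generator and say nothing about the accuracies reported in the study.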
