Classification in Horse Race Prediction Through Principal Component Decomposition

Jason West; Vlad Kazakov

doi:10.5750/jpm.v18i1.2093


Published: Jul 1, 2024


Keywords: partial least squares, logistic regression, horse racing, imbalanced data

Jason West

Bureau of Meteorology, Brisbane, QLD, 4000, Australia

Vlad Kazakov

Racelab Global, Sydney, NSW, 2000, Australia

Abstract

The established approach to horse race handicapping and staking is to treat it as a classification problem, using factors that describe the horse, jockey, trainer, and racing history together with public odds, and to solve it with logistic regression. The logistic regression probabilities are then normalised, and bets are filtered by a threshold or by anomalous pricing. However, published algorithms do not show systematic profitability, nor do machine learning approaches that use algorithmic betting strategies. This deficiency is due to three factors. First, wins are rare, so racing data are imbalanced. Second, racing factors are multicollinear. Third, the number of factors needed for accurate prediction is very large. We show that alternative methods based on variants of principal component analysis produce sustainable profitability, regardless of staking strategy, by reducing the factors to their fundamental drivers. We apply a partial least squares regression methodology to Australian thoroughbred racing and show that it outperforms logistic regression and machine learning methods in classifying winners for a profitable trading strategy. The method can be applied to multiple betting domains.
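The pipeline the abstract describes (reduce many collinear factors to a few drivers, turn model output into normalised win probabilities, then filter bets by a threshold) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it fits single-response partial least squares (PLS1) by the NIPALS algorithm to a 0/1 win indicator; the function names `fit_pls1`, `predict_pls1`, `normalise_by_race`, and `select_bets` and the parameters `n_components`, `eps`, and `min_edge` are hypothetical; clipping and rescaling the linear scores within each race, and backing runners whose expected return exceeds a fixed edge, are illustrative choices; and the factors, results, and odds are synthetic.

```python
import numpy as np


def fit_pls1(X, y, n_components):
    """PLS1 regression via NIPALS; returns (beta, x_mean, y_mean).

    Each weight vector w is the direction in factor space with maximal
    covariance with the (deflated) win indicator, so many collinear
    factors collapse onto a few latent scores.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean
    n_factors = X.shape[1]
    W = np.zeros((n_factors, n_components))
    P = np.zeros((n_factors, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = Xk.T @ yk
        w /= np.linalg.norm(w)          # unit weight vector
        t = Xk @ w                      # latent score for every runner
        tt = t @ t
        P[:, a] = Xk.T @ t / tt         # X loadings
        q[a] = yk @ t / tt              # y loading
        W[:, a] = w
        Xk = Xk - np.outer(t, P[:, a])  # deflate X
        yk = yk - q[a] * t              # deflate y
    beta = W @ np.linalg.solve(P.T @ W, q)  # coefficients on original factors
    return beta, x_mean, y_mean


def predict_pls1(X, beta, x_mean, y_mean):
    return (X - x_mean) @ beta + y_mean


def normalise_by_race(scores, race_ids, eps=1e-6):
    """Assumed normalisation: clip scores positive, rescale to sum to 1 per race."""
    s = np.clip(scores, eps, None)
    totals = np.bincount(race_ids, weights=s)
    return s / totals[race_ids]


def select_bets(probs, decimal_odds, min_edge=0.05):
    """Hypothetical threshold rule: back when expected return per unit stake > min_edge."""
    return probs * decimal_odds - 1.0 > min_edge


# Synthetic illustration only: 200 races of 10 runners, one winner each.
rng = np.random.default_rng(0)
n_races, field = 200, 10
race_ids = np.repeat(np.arange(n_races), field)
ability = rng.normal(size=n_races * field)
# Eight noisy, strongly collinear "factors" that all measure the same ability.
X = ability[:, None] + 0.3 * rng.normal(size=(ability.size, 8))
perf = ability + rng.normal(size=ability.size)
y = np.zeros(ability.size)
for r in range(n_races):
    idx = np.flatnonzero(race_ids == r)
    y[idx[np.argmax(perf[idx])]] = 1.0  # exactly one winner per race (imbalanced)

beta, x_mean, y_mean = fit_pls1(X, y, n_components=2)
probs = normalise_by_race(predict_pls1(X, beta, x_mean, y_mean), race_ids)
odds = rng.uniform(2.0, 30.0, size=ability.size)  # placeholder market odds
bets = select_bets(probs, odds)
print(f"{bets.sum()} bets selected from {ability.size} runners")
```

Unlike ordinary principal components, which only explain variance among the factors, each PLS component is chosen to covary with the outcome, which is why a small number can stand in for many collinear inputs. In practice `n_components` would be chosen by validation on held-out races; with as many components as factors, PLS1 reduces to ordinary least squares.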

Issue: Vol. 18 No. 1 (2024)

Section: Articles
