Classification in Horse Race Prediction Through Principal Component Decomposition

Jason West
Vlad Kazakov


The established approach to horse race handicapping and staking is to treat it as a classification problem, using factors that describe the horse, jockey, trainer, and racing history, coupled with public odds, and to solve it via logistic regression. The resulting probabilities are then normalised, and bets are filtered by a threshold or by anomalous pricing. However, published algorithms do not show systematic profitability, nor do machine learning approaches using algorithmic betting strategies. This deficiency is due to three factors. First, wins are rare, so racing data are imbalanced. Second, racing factors are multicollinear. Third, the number of factors needed for accurate prediction is very large. We show that alternative methods based on variants of principal component analysis produce sustainable profitability regardless of staking strategy, by reducing the factors to a set of fundamental drivers. We apply a partial least squares regression methodology to Australian thoroughbred racing. This approach is shown to outperform logistic regression and machine learning methods in classifying winners for a profitable trading strategy. The method can be applied to multiple betting domains.
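
The conventional pipeline described above (normalise the model's win probabilities within a race, then filter bets by a threshold) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names, the use of decimal odds, the expected-return rule, and the 0.05 threshold are all assumptions made for illustration, and the input figures are invented.

```python
def normalise_race(raw_probs):
    """Rescale raw win probabilities for one race so that they sum to 1."""
    total = sum(raw_probs)
    if total <= 0:
        raise ValueError("raw probabilities must have a positive sum")
    return [p / total for p in raw_probs]


def select_bets(raw_probs, decimal_odds, edge_threshold=0.05):
    """Return indices of runners whose expected return per unit stake
    exceeds edge_threshold (a hypothetical filter, not the paper's rule)."""
    probs = normalise_race(raw_probs)
    picks = []
    for i, (p, odds) in enumerate(zip(probs, decimal_odds)):
        # Expected profit per unit staked at decimal odds.
        expected_return = p * odds - 1.0
        if expected_return > edge_threshold:
            picks.append(i)
    return picks


# Illustrative inputs only: raw model outputs and public decimal odds
# for a hypothetical four-runner race.
raw = [0.30, 0.15, 0.10, 0.05]
odds = [1.8, 4.0, 7.0, 10.0]
print(select_bets(raw, odds))
```

Normalising within each race matters because a per-runner classifier scores each horse independently, so its raw outputs need not sum to one across the field.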
