Discovering the Drivers of Football Match Outcomes with Data Mining
CARPITA, Maurizio; SANDRI, Marco; SIMONETTO, Anna; ZUCCOLOTTO, Paola
2015-01-01
Abstract
This paper investigates, across time, the relationship between the outcome of a football match (win, loss or draw) and a set of variables describing the game actions, by analyzing data from 4 consecutive yearly championships. The aim of the study is to discover the factors that lead to winning a match. More precisely, the goal is to select, from hundreds of covariates, those that most strongly affect the probability of winning a match; to recognize regularities across time by identifying the variables whose importance is confirmed in different analyses; and finally to construct a small number of composite indicators to be interpreted as drivers of match outcome. These tasks are carried out using the Random Forest machine learning algorithm to select the most important variables and Principal Component Analysis to summarize them into a small number of drivers. Variable selection is performed using the novel approach developed by Sandri and Zuccolotto.