Accuracy and explainability of statistical and machine learning xG models in football

Cefis, Mattia; Carpita, Maurizio
2024-01-01

Abstract

This study proposes an original approach to interpreting the explanatory variables (features) of the well-known expected goals (xG) model for shot analysis in football. To this end, a new sample of 7801 shots from Italy’s Serie A in the 2022/2023 and 2023/2024 seasons (1 binary outcome and 26 features) was used. It introduces 8 new features of various types that integrate event data, performance data, and tracking data. Specifically, the performance of 8 statistical and machine learning (algorithmic) classifiers was compared. The focus was on two key aspects of explainable Artificial Intelligence (xAI), ‘accuracy’ and ‘explainability’, each assessed with appropriate metrics. On the accuracy metrics, Binary Regression (BR) with the cloglog link function is the most effective of the statistical classifiers. Among the algorithmic classifiers, XGBoost performs best, though slightly below BR-cloglog. Regarding explainability, across all classifiers the xG is driven mainly by the same small set of variables. The most influential features are the proximity to the goal, the shooting angle, and the shooter’s visual angle.
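The abstract reports that binary regression with a cloglog link gave the best accuracy and that proximity to the goal and the shooting angle are among the most influential features. The following is a minimal sketch, under stated assumptions, of how those two geometric features might be computed from a shot location and mapped to a goal probability through the cloglog inverse link, P(goal) = 1 − exp(−exp(η)), with the logistic link shown for comparison. The pitch dimensions, coordinate convention, function names (`distance_to_goal`, `shooting_angle`, `xg`) and the coefficient values are all hypothetical illustrations. They are not taken from the study, whose 26 features, definitions (including the shooter’s visual angle) and fitted estimates are not reproduced here.

```python
import math
from dataclasses import dataclass

# ASSUMED pitch geometry (metres): play attacks towards x = PITCH_LENGTH and the
# goal is centred at y = PITCH_WIDTH / 2. Illustrative only, not the study's data layout.
PITCH_LENGTH = 105.0
PITCH_WIDTH = 68.0
GOAL_WIDTH = 7.32  # regulation distance between the posts


@dataclass
class Shot:
    x: float  # metres along the pitch, measured from the defending goal line
    y: float  # metres across the pitch, measured from one touchline


def distance_to_goal(shot: Shot) -> float:
    """Euclidean distance from the shot location to the centre of the goal mouth."""
    return math.hypot(PITCH_LENGTH - shot.x, PITCH_WIDTH / 2 - shot.y)


def shooting_angle(shot: Shot) -> float:
    """Angle (radians) subtended by the goal mouth as seen from the shot location."""
    dx = PITCH_LENGTH - shot.x
    y_left = PITCH_WIDTH / 2 - GOAL_WIDTH / 2
    y_right = PITCH_WIDTH / 2 + GOAL_WIDTH / 2
    # Vectors from the shot to each post; the angle between them is atan2(|cross|, dot).
    cross = dx * (y_right - shot.y) - (y_left - shot.y) * dx
    dot = dx * dx + (y_left - shot.y) * (y_right - shot.y)
    return math.atan2(abs(cross), dot)


def inverse_cloglog(eta: float) -> float:
    """Complementary log-log inverse link: P(goal) = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def inverse_logit(eta: float) -> float:
    """Logistic inverse link, included for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


# HYPOTHETICAL coefficients chosen only to give plausible magnitudes;
# they are NOT the estimates reported in the study.
COEFS = {"intercept": -1.0, "distance": -0.1, "angle": 1.5}


def xg(shot: Shot, inverse_link=inverse_cloglog) -> float:
    """Goal probability from a linear predictor on two geometric features."""
    eta = (COEFS["intercept"]
           + COEFS["distance"] * distance_to_goal(shot)
           + COEFS["angle"] * shooting_angle(shot))
    return inverse_link(eta)


if __name__ == "__main__":
    # Two central shots under the assumed geometry: one from the penalty spot
    # (11 m out) and one from 25 m out.
    shots = {
        "penalty spot": Shot(x=PITCH_LENGTH - 11.0, y=PITCH_WIDTH / 2),
        "long range": Shot(x=PITCH_LENGTH - 25.0, y=PITCH_WIDTH / 2),
    }
    for name, s in shots.items():
        print(name, round(xg(s), 3), round(xg(s, inverse_logit), 3))
```

On the design choice: the cloglog link is asymmetric. It behaves much like the logistic curve for very negative η but approaches 1 far more quickly, which is one reason it is sometimes considered for unbalanced binary outcomes such as goals among shots. In practice the coefficients would be estimated from data (for example with a binomial GLM), not fixed by hand as here.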

Use this identifier to cite or link to this document: https://hdl.handle.net/11379/617886