Accuracy and explainability of statistical and machine learning xG models in football
Cefis, Mattia; Carpita, Maurizio
2024-01-01
Abstract
This study proposes an original approach to the interpretability of the explanatory variables (features) in the well-known expected goals (xG) model for shot analysis in football. To this end, a new sample of 7801 shots from Italy’s Serie A for the 2022/2023 and 2023/2024 seasons (1 binary outcome and 26 features) was used, in which 8 new features of various types were introduced, integrating event data, performance data, and tracking data. The performance of 8 statistical and machine learning (algorithmic) classifiers was compared. The focus was on two key aspects related to the field of explainable Artificial Intelligence (xAI), ‘accuracy’ and ‘explainability’, each assessed using appropriate metrics. In terms of accuracy, Binary Regression (BR) with the complementary log-log (cloglog) link function is the most effective of the statistical classifiers. Among the algorithmic classifiers, XGBoost performs best, although slightly below BR-cloglog. Regarding explainability, the primary contribution to the xG consistently comes from a small set of variables across all classifiers. The most influential features are the proximity to the goal, the shooting angle, and the shooter’s visual angle.
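To make the BR-cloglog idea concrete, the following is a minimal sketch of how a binary regression with a complementary log-log link turns shot geometry into a goal probability. It is not the authors' model: the coordinate convention, the two geometric features, the angle definition (the angle subtended by the goalposts, a common choice that may differ from the paper's "shooting angle" and "visual angle"), and all coefficient values are assumptions chosen for illustration only.

```python
import math

GOAL_WIDTH_M = 7.32  # regulation goal width in metres


def distance_to_goal(x, y):
    """Distance from the shot location to the centre of the goal.

    Assumed convention: x is the distance from the goal line and y is the
    lateral offset from the centre of the goal, both in metres.
    """
    return math.hypot(x, y)


def goal_angle(x, y):
    """Angle (radians) subtended by the two goalposts at the shot location."""
    half = GOAL_WIDTH_M / 2
    return math.atan2(GOAL_WIDTH_M * x, x * x + y * y - half * half)


def inv_cloglog(eta):
    """Inverse complementary log-log link: p = 1 - exp(-exp(eta))."""
    return 1.0 - math.exp(-math.exp(eta))


def xg(x, y, b0=-1.0, b_dist=-0.1, b_angle=1.0):
    """Toy xG from a linear predictor; the coefficients are hypothetical,
    not estimates from the paper."""
    eta = b0 + b_dist * distance_to_goal(x, y) + b_angle * goal_angle(x, y)
    return inv_cloglog(eta)
```

With these illustrative coefficients, a central shot from 6 metres receives a much higher probability than a shot from wide and far out, for example `xg(6, 0)` compared with `xg(25, 10)`. Unlike the logit link, the cloglog link is asymmetric around 0.5, which is one reason it is sometimes preferred for rare outcomes such as goals.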