Reporting of clustering techniques in sports sciences: a scoping review
Manisera M.
2024-01-01
Abstract
Multivariate statistical methods are among the most widely used in sports sciences, with clustering methods emerging as prominent unsupervised learning techniques. This study presents a scoping review of original articles that use clustering techniques in sports sciences, following the PRISMA-ScR guidelines. A comprehensive search across various databases using the Boolean "AND" combination of "clustering" and "sport" yielded 278 articles. Notably, 86.7% of these articles were published within the last 14 years, with a predominant focus (66.2%) on sports performance analysis. The majority of studies included professional athletes (56.4%), with football/soccer, basketball, and tennis being the most commonly studied sports, representing 12.2%, 7.5%, and 2.2% of the selected articles, respectively. Hierarchical clustering was the most frequently used method (31.6%), followed by the k-means algorithm for partitional clustering. However, the clustering method was not reported in 26.6% of the articles, and 55.0% did not specify the criterion used to determine the optimal number of clusters. Moreover, more than 85% of the articles lacked the computational details needed for reproducibility. These findings underscore the urgent need for substantial improvement in how the sports science literature reports methodology, algorithms, criteria for cluster identification, and software usage.

Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.


