Clustering of football players based on performance data and aggregated clustering validity indexes

Akhanli, Se; Hennig, C

doi:10.1515/jqas-2022-0037

We analyse football (soccer) player performance data with mixed-type variables from the 2014-15 season of eight major European leagues. We cluster these data using a tailor-made dissimilarity measure. To decide between the many available clustering methods and to choose an appropriate number of clusters, we use the approach of Akhanli and Hennig (2020. "Comparing Clusterings and Numbers of Clusters by Aggregation of Calibrated Clustering Validity Indexes." Statistics and Computing 30 (5): 1523-44). This approach is based on several validation criteria that refer to different desirable characteristics of a clustering. These characteristics are chosen according to the aim of the clustering, which allows us to define a suitable validation index as a weighted average of calibrated individual indexes that measure the desirable features. We derive two different clusterings. The first partitions the data set into major groups of essentially different players and can be used to analyse a team's composition. The second divides the data set into many small clusters (10 players on average) and can be used to find players whose profile is very similar to that of a given player. We discuss in depth which characteristics are desirable for these clusterings. The weighting of the criteria for the second clustering is informed by a survey of football experts.
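The aggregation step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: it assumes, as we read Akhanli and Hennig (2020), that each individual index is calibrated by standardising it against its values on random reference clusterings, and that every index has been oriented so that larger values are better. The function names (`calibrate`, `aggregated_index`, `choose_best`), the index names and all numbers are hypothetical; the indexes, reference clusterings and weights actually used in the paper may differ.

```python
from statistics import mean, stdev


def calibrate(value, random_values):
    """Standardise one index value against the values of the same index
    computed on random reference clusterings (assumed z-score calibration)."""
    mu = mean(random_values)
    sd = stdev(random_values)  # needs at least two reference values
    if sd == 0:
        raise ValueError("reference values must not all be equal")
    return (value - mu) / sd


def aggregated_index(raw, reference, weights):
    """Weighted average of calibrated indexes.

    raw:       index name -> value for the candidate clustering
    reference: index name -> values of that index on random clusterings
    weights:   index name -> non-negative weight reflecting the clustering aim
    Assumes every index is oriented so that larger means better
    (e.g. negate an index where smaller is better before calling this).
    """
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    weighted = sum(w * calibrate(raw[name], reference[name])
                   for name, w in weights.items())
    return weighted / total


def choose_best(candidates, reference, weights):
    """Return the key of the candidate clustering (e.g. a method and number
    of clusters) with the largest aggregated index, using one common
    reference set for all candidates (a simplifying assumption)."""
    return max(candidates,
               key=lambda key: aggregated_index(candidates[key], reference, weights))


# Hypothetical illustration with made-up numbers.
reference = {"homogeneity": [0.2, 0.4, 0.6], "separation": [1.0, 2.0, 3.0]}
weights = {"homogeneity": 1.0, "separation": 3.0}
candidates = {
    "method_A_k5": {"homogeneity": 0.8, "separation": 2.5},
    "method_B_k5": {"homogeneity": 0.4, "separation": 2.0},
}
print(choose_best(candidates, reference, weights))  # prints "method_A_k5"
```

Setting a weight to zero removes that index from the aggregate, which is how a characteristic that is irrelevant to a particular clustering aim would be excluded in this sketch.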

Akhanli, S. E., Hennig, C. (2023). Clustering of football players based on performance data and aggregated clustering validity indexes. Journal of Quantitative Analysis in Sports, 19(2), 103-123. doi:10.1515/jqas-2022-0037.



Year: 2023

Journal: Journal of Quantitative Analysis in Sports

DOI: https://dx.doi.org/10.1515/jqas-2022-0037
			


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/949306

Warning: the data displayed have not been validated by the university.
