Qualitative Clustering of Software Repositories Based on Software Metrics

Succi G
2023

Abstract

Software repositories contain a wealth of information about many aspects of the software development process. For this reason, many studies analyze software repositories using data analytics methods, with a focus on clustering. Software repository clustering has been applied to the study of software ecosystems such as GitHub, to defect and technical debt prediction, and to software remodularization. Although some interesting insights have been reported, these studies have limitations: they rely on individual clustering methods, and this shows in the shortcomings of the results obtained. In this study, to alleviate these limitations, we apply multiple cluster validity indices to multiple clustering methods and carry out consensus clustering. To our knowledge, this study is the first to apply consensus clustering to the analysis of software repositories and one of the few to apply it to software metrics. Extensive experimental studies are reported on software repository metrics data consisting of a number of software repositories, each described by software metrics. We reveal seven clusters of software repositories and relate them to developers’ activity. We argue that the proposed clustering environment could facilitate decision making for business investors and the open-source community with the help of Gartner’s hype cycle.
Bugayenko Y, Daniakin K, Farina M, Kholmatova Z, Kruglov A, Pedrycz W, et al. (2023). Qualitative Clustering of Software Repositories Based on Software Metrics. IEEE ACCESS, 11, 14716-14727 [10.1109/ACCESS.2023.3244495].
Bugayenko Y; Daniakin K; Farina M; Kholmatova Z; Kruglov A; Pedrycz W; Succi G
Files in this item:
File: Succi.J122.QualitativeClusteringOfSoftwareRepositoriesBasedOnSoftwareMetrics.pdf

open access

Type: Publisher's version (PDF)
License: Open Access license. Creative Commons Attribution (CC BY)
Size: 993.3 kB
Format: Adobe PDF


Use this identifier to cite or link to this document: https://hdl.handle.net/11585/919992