Selecting optimal software code descriptors—The case of Java

Yegor Bugayenko; Zamira Kholmatova; Artem Kruglov; Witold Pedrycz; Giancarlo Succi

2024

Abstract

Over the last 25 years, software metrics have proliferated considerably, and a plethora of tools have emerged to extract them. This is positive compared with the earlier situation of limited data. However, it also raises a significant problem from both a theoretical and a practical standpoint. From a theoretical perspective, using many metrics together is likely to cause collinearity, overfitting, and similar issues. From a practical perspective, such a large set of metrics is difficult to manage. Companies, especially small ones, may feel overwhelmed and unable to select a viable subset of them. Still, it is not yet fully understood what constitutes a viable subset of metrics for properly managing software projects and products. In this paper, we attempt to address this issue. We focus on programs written in Java and consider both classes and methods. We use Sammon error as a measure of the similarity of metrics. Using both Particle Swarm Optimization and a Genetic Algorithm, we adapted a method for identifying a viable subset of such metrics that could solve the aforementioned problem. We experiment with our approach on 800 projects from GitHub and validate the results on 200 projects. The proposed method yields optimal subsets of software engineering metrics. On a validation dataset, these subsets achieved low Sammon error values in more than 70% of cases at both the class and method levels.
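To make the idea of scoring a metric subset by Sammon error more concrete, here is a minimal sketch. It is not the authors' implementation, and it rests on several assumptions:

- Each row is a code entity (a class or method).
- Each column is one metric.
- The Sammon error compares pairwise Euclidean distances in the full metric space (d*) with distances computed from the selected metrics only (d), using E = (1 / Σ d*) Σ (d* − d)² / d*.

Raw values are used here for simplicity; real metrics would likely need standardization first. The search uses a generic genetic algorithm with truncation selection, one-point crossover, and bit-flip mutation. The per-metric penalty in `fitness` is an assumed trade-off. The names `sammon_error`, `fitness`, and `genetic_search` are hypothetical. Particle Swarm Optimization is not shown.

```python
import math
import random
from itertools import combinations


def pairwise_distances(rows, cols):
    """Euclidean distance between every pair of rows, using only the columns in `cols`."""
    cols = list(cols)
    return [math.sqrt(sum((a[c] - b[c]) ** 2 for c in cols))
            for a, b in combinations(rows, 2)]


def sammon_error(rows, mask):
    """Sammon error of keeping only the metrics where mask[c] is True (assumed formulation).

    d_star: distances using all metrics; d: distances using the selected metrics.
    Pairs identical in the full space (d_star == 0) are skipped to avoid division by zero.
    """
    n_metrics = len(rows[0])
    d_star = pairwise_distances(rows, range(n_metrics))
    d = pairwise_distances(rows, [c for c in range(n_metrics) if mask[c]])
    total = sum(d_star)
    if total == 0:
        return 0.0
    stress = sum((ds - dr) ** 2 / ds for ds, dr in zip(d_star, d) if ds > 0)
    return stress / total


def fitness(rows, mask, penalty):
    """Lower is better: Sammon error plus a hypothetical cost per selected metric."""
    return sammon_error(rows, mask) + penalty * sum(mask) / len(mask)


def genetic_search(rows, penalty=0.05, pop_size=20, generations=40,
                   mutation_rate=0.1, seed=0):
    """Generic GA over boolean masks (requires at least 2 metrics and pop_size >= 4)."""
    rng = random.Random(seed)
    n = len(rows[0])
    population = [[rng.random() < 0.5 for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda m: fitness(rows, m, penalty))
        parents = population[: pop_size // 2]  # truncation selection keeps the better half
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation: each gene is toggled with probability mutation_rate.
            child = [(not g) if rng.random() < mutation_rate else g for g in child]
            children.append(child)
        population = parents + children
    return min(population, key=lambda m: fitness(rows, m, penalty))
```

For example, with `rows = [[0, 0, 5], [3, 4, 5], [6, 8, 5]]` the third metric is constant. Dropping it leaves every distance unchanged, so `sammon_error(rows, [True, True, False])` is 0. Keeping only the first metric gives an error of 0.16. The penalty term is what pushes the search toward smaller subsets once the error is already low.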

Year: 2024
Journal: PLOS ONE
DOI: https://dx.doi.org/10.1371/journal.pone.0310840
Citation: Bugayenko, Y., Kholmatova, Z., Kruglov, A., Pedrycz, W., Succi, G. (2024). Selecting optimal software code descriptors—The case of Java. PLOS ONE, 19(11), 1-23 [10.1371/journal.pone.0310840].
All authors: Bugayenko, Yegor; Kholmatova, Zamira; Kruglov, Artem; Pedrycz, Witold; Succi, Giancarlo
Appears in type: 1.01 Journal article

Files in this record:

File	Size	Format
journal.pone.0310840 (1).pdf (open access; type: publisher's version (PDF) / Version of Record; license: Open Access license, Creative Commons Attribution (CC BY))	1.76 MB	Adobe PDF

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/999658
