The presence of imbalanced classes is more and more common in practical applications and it is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has turned out to be a fruitful strategy to overcome this problem. Our proposal performs re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique that is characterized by the property of preserving most of the linear information that is present in the data. Such property is guaranteed by the Johnson-Lindenstrauss’ Lemma (1984) and allows to embed an n-dimensional space into a reduced one without distorting, within an ϵ-size interval, the distances between any pair of points. We propose to use matrix sketching as an alternative to the standard re-balancing strategies that are based on random under-sampling the majority class or random over-sampling the minority one. We assess the properties of our method when combined with linear discriminant analysis (LDA), classification trees (C4.5) and Support Vector Machines (SVM) on simulated and real data. Results show that sketching can represent a sound alternative to the most widely used rebalancing methods.

Falcone R., Anderlucci L., Montanari A. (2022). Matrix sketching for supervised classification with imbalanced classes. DATA MINING AND KNOWLEDGE DISCOVERY, 36, 174-208 [10.1007/s10618-021-00791-3].

Matrix sketching for supervised classification with imbalanced classes

Falcone R.
;
Anderlucci L.;Montanari A.
2022

Abstract

The presence of imbalanced classes is more and more common in practical applications and it is known to heavily compromise the learning process. In this paper we propose a new method aimed at addressing this issue in binary supervised classification. Re-balancing the class sizes has turned out to be a fruitful strategy to overcome this problem. Our proposal performs re-balancing through matrix sketching. Matrix sketching is a recently developed data compression technique that is characterized by the property of preserving most of the linear information that is present in the data. Such property is guaranteed by the Johnson-Lindenstrauss’ Lemma (1984) and allows to embed an n-dimensional space into a reduced one without distorting, within an ϵ-size interval, the distances between any pair of points. We propose to use matrix sketching as an alternative to the standard re-balancing strategies that are based on random under-sampling the majority class or random over-sampling the minority one. We assess the properties of our method when combined with linear discriminant analysis (LDA), classification trees (C4.5) and Support Vector Machines (SVM) on simulated and real data. Results show that sketching can represent a sound alternative to the most widely used rebalancing methods.
2022
Falcone R., Anderlucci L., Montanari A. (2022). Matrix sketching for supervised classification with imbalanced classes. DATA MINING AND KNOWLEDGE DISCOVERY, 36, 174-208 [10.1007/s10618-021-00791-3].
Falcone R.; Anderlucci L.; Montanari A.
File in questo prodotto:
File Dimensione Formato  
Falcone2021_Article_MatrixSketchingForSupervisedCl.pdf

accesso aperto

Tipo: Versione (PDF) editoriale
Licenza: Licenza per Accesso Aperto. Creative Commons Attribuzione (CCBY)
Dimensione 1.79 MB
Formato Adobe PDF
1.79 MB Adobe PDF Visualizza/Apri

I documenti in IRIS sono protetti da copyright e tutti i diritti sono riservati, salvo diversa indicazione.

Utilizza questo identificativo per citare o creare un link a questo documento: https://hdl.handle.net/11585/845208
Citazioni
  • ???jsp.display-item.citation.pmc??? ND
  • Scopus 2
  • ???jsp.display-item.citation.isi??? 1
social impact