A k–Skyband Approach for Feature Selection

Bedo, M.; Ciaccia, P.; Martinenghi, D.; de Oliveira, D.

doi:10.1007/978-3-030-32047-8_15

Distance concentration is a phantom menace for the labeling of high-dimensional data by distance-based classifiers. Filter methods reduce data dimensionality, but they also indirectly introduce their ranking bias into the classification procedure. In this study, we examine the filtering problem from another perspective, in which multiple filters are aggregated according to the classifiers' constraints. Our approach, named S-Filter, is designed as a top-k skyline (k-skyband) search over multiple rankings and relies on the concept of (Formula Presented)–dominance for weighted, monotone linear functions. Unlike existing approaches, S-Filter provides a deterministic strategy for joining multiple filters and avoids the semantic problem of breaking top-k ties. In its first stage, S-Filter uses labeling-driven measures, e.g., F1-Score, to assess the quality of each filter with regard to a particular classifier, and range-tolerance intervals around these initial quality measures define the partial search weights. Next, S-Filter applies the instance-optimal FSA algorithm to select all the dimensions that can be among the top-k for some weight within the range-tolerance intervals. Experiments on high-dimensional datasets show that S-Filter outperforms state-of-the-art filters in two scenarios: (i) exploratory analysis with varying k and range-tolerance intervals, and (ii) reduction of the data to its intrinsic dimensionality.
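The selection step described in the abstract can be illustrated with a small brute-force sketch. It is not the authors' FSA algorithm, and because the exact dominance definition is elided above ("(Formula Presented)"), the relation used here is an assumption: feature a dominates feature b if a's weighted sum of filter scores is at least b's for every weight vector in the range-tolerance box, and strictly greater for at least one. Further assumptions: every filter scores every feature on a common scale where higher means more relevant; each weight varies independently in [q_i - tol, q_i + tol] (clipped at 0), with no normalization; the names `weight_box`, `dominates`, `k_skyband` and `tol`, as well as all numbers, are hypothetical. Under these assumptions, and if there are no score ties at the weight vector in question, any feature that ranks in the top-k for some admissible weight is dominated by fewer than k others and therefore survives the k-skyband filter below.

```python
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def weight_box(quality: Sequence[float], tol: float) -> List[Interval]:
    """Hypothetical range-tolerance interval [q - tol, q + tol] around each
    filter's quality score (e.g. an F1-Score), clipped at 0 so that every
    admissible weight stays non-negative (monotone scoring)."""
    return [(max(0.0, q - tol), q + tol) for q in quality]


def dominates(a: Sequence[float], b: Sequence[float], box: Sequence[Interval]) -> bool:
    """Assumed dominance: a's weighted score is >= b's for every weight vector
    in the box and strictly > for at least one. The difference
    sum_i w_i * (a_i - b_i) is linear in w, so its min (max) over a box takes,
    per filter, the bound that makes each term smallest (largest)."""
    worst = best = 0.0
    for ai, bi, (lo, hi) in zip(a, b, box):
        d = ai - bi
        worst += d * (lo if d > 0 else hi)
        best += d * (hi if d > 0 else lo)
    return worst >= 0.0 and best > 0.0


def k_skyband(scores: Sequence[Sequence[float]], box: Sequence[Interval], k: int) -> List[int]:
    """Indices of the features dominated by fewer than k other features.
    Brute force, O(n^2 * m) for n features and m filters; not the FSA algorithm."""
    n = len(scores)
    band = []
    for j in range(n):
        dominators = sum(
            1 for i in range(n) if i != j and dominates(scores[i], scores[j], box)
        )
        if dominators < k:
            band.append(j)
    return band


if __name__ == "__main__":
    quality = [0.75, 0.5, 0.625]  # hypothetical per-filter F1-Scores
    box = weight_box(quality, tol=0.25)
    # scores[j][i]: relevance of feature j according to filter i (hypothetical)
    scores = [
        [1.0, 0.25, 0.25],
        [0.25, 1.0, 0.25],
        [0.0, 0.0, 1.0],
        [0.5, 0.5, 0.125],
        [0.125, 0.125, 0.125],
    ]
    print(k_skyband(scores, box, k=1))  # [0, 1, 2]
    print(k_skyband(scores, box, k=2))  # [0, 1, 2, 3]
```

Because the weighted score difference is linear in the weights, checking dominance over the whole box only requires evaluating two extreme weight vectors per pair of features, so the sketch is exact for a box without sampling any weights.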

Bedo M., Ciaccia P., Martinenghi D., de Oliveira D. (2019). A k–Skyband Approach for Feature Selection. Heidelberg: Springer [10.1007/978-3-030-32047-8_15].



Year: 2019
Volume title: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
First page: 160
Last page: 168
Series: LECTURE NOTES IN COMPUTER SCIENCE
DOI: https://dx.doi.org/10.1007/978-3-030-32047-8_15
			
Appears in category: 4.01 Contribution in conference proceedings


Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/738492

Warning: the data displayed here have not been validated by the university.
