Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification

Domeniconi, Giacomo; Moro, Gianluca; Pasolini, Roberto; Sartori, Claudio

2015

Abstract

In cross-domain text classification, topic labels for documents of a target domain are predicted by leveraging knowledge from labeled documents of a source domain, which covers the same or similar topics but may use different words. Existing methods either adapt documents of the source domain to the target or represent both domains in a common space. These methods are mostly based on advanced statistical techniques and often require parameter tuning to reach optimal performance. We propose a more straightforward approach based on nearest centroid classification: profiles of topic categories are extracted from the source domain and are then adapted through iterative refining steps that use the most similar documents in the target domain. Experiments on common benchmark datasets show that this approach, despite its simplicity, achieves accuracy better than or comparable to that of other methods, using fixed empirical values for its few parameters.
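
The idea summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the bag-of-words representation, the cosine similarity measure, the `threshold` and `max_iter` parameters, the stopping rule, and all function names are assumptions made for illustration only. Category profiles are built from the labeled source documents, target documents are assigned to their nearest profile, and the profiles are then rebuilt from the target documents that match a profile closely enough, until the assignments stop changing.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(docs, labels):
    """Sum the term vectors of the documents assigned to each category."""
    profiles = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        profiles[label].update(doc)
    return dict(profiles)


def nearest(doc, profiles):
    """Return (category, similarity) of the profile closest to doc."""
    return max(((c, cosine(doc, p)) for c, p in profiles.items()),
               key=lambda pair: pair[1])


def refine_and_classify(source_docs, source_labels, target_docs,
                        threshold=0.3, max_iter=10):
    """Nearest centroid classification with iteratively refined profiles.

    threshold and max_iter are hypothetical parameters of this sketch.
    """
    # Initial profiles come from the labeled source domain.
    profiles = build_profiles(source_docs, source_labels)
    previous = None
    for _ in range(max_iter):
        scored = [nearest(d, profiles) for d in target_docs]
        assigned = [c for c, _ in scored]
        if assigned == previous:  # assignments are stable: stop refining
            break
        previous = assigned
        # Keep only target documents that are similar enough to a profile.
        confident = [(d, c) for d, (c, s) in zip(target_docs, scored)
                     if s >= threshold]
        if not confident:
            break
        docs, labels = zip(*confident)
        # Rebuild profiles from target documents; categories without any
        # confident document keep their previous profile (an assumption).
        profiles = {**profiles, **build_profiles(docs, labels)}
    return [nearest(d, profiles)[0] for d in target_docs]


def bow(text):
    """Hypothetical helper: raw term counts of a whitespace-split text."""
    return Counter(text.split())


source_docs = [bow("computer software code"), bow("football match goal")]
source_labels = ["tech", "sport"]
target_docs = [bow("goal hockey puck"), bow("hockey puck stick"),
               bow("code gpu kernel"), bow("gpu kernel driver")]
print(refine_and_classify(source_docs, source_labels, target_docs))
```

In this toy example the second and fourth target documents share no words with the source domain, so they can only be matched after the profiles have absorbed vocabulary from the target documents classified with enough similarity in the first pass.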

Year: 2015
Volume title: Knowledge Discovery, Knowledge Engineering and Knowledge Management - 6th International Joint Conference, IC3K 2014, Rome, Italy, October 21-24, 2014, Revised Selected Papers
First page: 50
Last page: 67
Series: Communications in Computer and Information Science
DOI: https://dx.doi.org/10.1007/978-3-319-25840-9_4
Citation: Domeniconi, G., Moro, G., Pasolini, R., Sartori, C. (2015). Iterative Refining of Category Profiles for Nearest Centroid Cross-Domain Text Classification. Heidelberg: Springer [10.1007/978-3-319-25840-9_4].
All authors: Domeniconi, Giacomo; Moro, Gianluca; Pasolini, Roberto; Sartori, Claudio
Publication type: 2.01 Chapter / essay in book

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/555363
