Ordinal data supervised classification with Quantile-based and other classifiers

MANCINI, LORENZO
In press

Abstract

The aim of this research project is to propose a new method for supervised classification problems in which the input features are ordinal. Ordinal data are prevalent in many research fields. They arise directly when observations fall into distinct but ordered categories, and they are very common in surveys where answers are given on Likert scales. Typically, they are coded as equally spaced values, and sometimes they are analyzed as if they were numerical. These choices do not necessarily reflect the real distribution of the data. The study proceeded in several steps. The first phase consisted of an exhaustive review of the state of the art in the statistical literature, with the aim of identifying the various approaches to ordinal data analysis together with their limitations and possible advantages. We then proposed to operate in the framework of Generalized Linear Latent Variable Models (GLLVM), adopting the response function approach with a single Beta-distributed latent variable. Our purpose in using this method is to move from a set of ordinal features to a single continuous feature that fits the data well, so that standard classification methods can be applied directly. A dedicated EM algorithm was developed on the basis of this theoretical framework using the statistical software R. Finally, we compared our approach with several scoring methods in an extensive simulation study. The scoring methods considered in the simulation study are raw scores, ridits, Blom scores, normal median scores, and conditional mean scores. Although these methods have a long history in the literature, they have never been used for classification purposes. In addition, we present an application of the proposed approach to a real-world business data problem.
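To make the idea of a scoring method concrete, the following minimal sketch shows how three of the scores named above could be assigned to the levels of one ordinal feature. It is an illustration under stated assumptions, not the code used in the study (which was written in R): the function name `category_scores` is hypothetical, Python is used only for convenience, and the Blom-type score is applied to the midrank of each category, which is one common way of handling ties and is assumed here. The ridit follows the usual definition (proportion of observations below the category plus half the proportion within it).

```python
from collections import Counter
from statistics import NormalDist


def category_scores(values, levels):
    """Return raw, ridit, and Blom-type scores for each ordered level.

    `values` is a non-empty list of observed categories and `levels`
    lists the categories from lowest to highest. Both names are
    hypothetical and chosen for this illustration only.
    """
    n = len(values)
    counts = Counter(values)
    scores = {}
    below = 0  # number of observations in strictly lower categories
    for position, level in enumerate(levels, start=1):
        c = counts.get(level, 0)
        # Ridit: proportion below the category plus half the proportion in it.
        ridit = (below + c / 2) / n
        # Midrank shared by the tied observations in this category.
        midrank = below + (c + 1) / 2
        # Blom-type normal score on the midrank (assumed tie handling).
        blom = NormalDist().inv_cdf((midrank - 0.375) / (n + 0.25))
        # Raw score: the equally spaced integer coding 1, 2, 3, ...
        scores[level] = {"raw": position, "ridit": ridit, "blom": blom}
        below += c
    return scores


answers = ["low", "low", "medium", "high"]
result = category_scores(answers, ["low", "medium", "high"])
for level, s in result.items():
    print(level, s["raw"], round(s["ridit"], 3), round(s["blom"], 3))
```

Unlike the raw scores, the ridits and the Blom-type scores depend on the observed category frequencies, so the spacing between adjacent levels is driven by the data rather than fixed in advance.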

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/631026