Soffritti, G. (2003). Identifying Multiple Cluster Structures in a Data Matrix. Communications in Statistics: Simulation and Computation, 32(4), 1151-1177. doi:10.1081/SAC-120023883
Identifying Multiple Cluster Structures in a Data Matrix
Soffritti G.
2003
Abstract
One aspect of cluster analysis that has been widely studied in recent years is the weighting and selection of variables. Procedures have been proposed that can identify the cluster structure present in a data matrix when that structure is confined to a subset of variables. Other methods assess the relative importance of each variable by means of a suitably chosen weight. However, when cluster structures are present in more than one subset of variables and differ from one subset to another, these solutions, as well as standard clustering algorithms, can produce misleading results. Some very recent methodologies for finding consensus classifications of the same set of units can also be useful for identifying cluster structures in a data matrix, but each seems to be only partly satisfactory for the purpose at hand. Therefore, a new, more specific procedure is proposed and illustrated by analyzing two real data sets; its performance is evaluated by means of a simulation experiment.
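The situation described in the abstract, in which two subsets of variables carry different cluster structures, can be illustrated with a small sketch. It is not the procedure proposed in the paper. It assumes a hand-made toy matrix (8 units, 4 variables, values invented for illustration), a plain 2-means routine with a deterministic start, and the Rand index as the agreement measure; the names `two_means`, `rand_index`, `truth_a`, and `truth_b` are hypothetical.

```python
from itertools import combinations


def dist2(a, b):
    """Squared Euclidean distance between two rows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def two_means(rows, n_iter=20):
    """Lloyd's algorithm with k=2 (an assumption of this sketch).

    Deterministic start: the first row and the row farthest from it.
    """
    centers = [list(rows[0]), list(max(rows, key=lambda r: dist2(r, rows[0])))]
    labels = [0] * len(rows)
    for _ in range(n_iter):
        # Assign each unit to the nearest center.
        labels = [0 if dist2(r, centers[0]) <= dist2(r, centers[1]) else 1
                  for r in rows]
        # Recompute each center as the mean of its members.
        for k in (0, 1):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if members:
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Share of unit pairs on which two partitions agree (1.0 = identical)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


# Toy data (invented): variables 1-2 separate units 0-3 from units 4-7,
# while variables 3-4 separate even-numbered from odd-numbered units.
data = [
    [0.1, 0.3, 0.2, 0.0],
    [0.4, 0.0, 5.1, 4.8],
    [0.2, 0.5, 0.3, 0.4],
    [0.0, 0.2, 4.9, 5.2],
    [5.0, 4.7, 0.1, 0.3],
    [5.3, 5.1, 5.0, 5.1],
    [4.8, 5.2, 0.4, 0.1],
    [5.1, 4.9, 5.2, 4.7],
]
truth_a = [0, 0, 0, 0, 1, 1, 1, 1]  # structure carried by variables 1-2
truth_b = [0, 1, 0, 1, 0, 1, 0, 1]  # structure carried by variables 3-4

labels_a = two_means([row[:2] for row in data])   # first subset only
labels_b = two_means([row[2:] for row in data])   # second subset only
labels_all = two_means(data)                      # all variables at once

print("subset 1 vs truth_a:", rand_index(labels_a, truth_a))
print("subset 2 vs truth_b:", rand_index(labels_b, truth_b))
print("all vars vs truth_a:", rand_index(labels_all, truth_a))
print("all vars vs truth_b:", rand_index(labels_all, truth_b))
```

Because the two true partitions differ, a single partition built from all variables cannot agree fully with both of them, whereas clustering each subset separately can recover its own structure. In practice the relevant subsets are unknown, which is the problem the paper addresses.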