Clustering methods are widely used in the analysis of microarray data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover coregulated genes because it has been postulated that genes targeted by the same transcription factors tend to show similar expression patterns. We focus on agglomerative hierarchical clustering and consider the problem of choosing a dissimilarity measure on the basis of its ability to identify functional modules consisting of a transcription factor and the associated target genes. We first propose two criteria that constitute a theoretical framework for assessing the adequacy and comparing different dissimilarity measures. We show that the proposed criteria allow one to gain insight into the behavior of dissimilarity measures and lead to a ranking of some of the most commonly used dissimilarity measures. Next, we introduce two dissimilarity measures based on the Wilks’ Λ statistic and show that, according to the above criteria, they have better performance than the other considered measures. The theoretical results are supported by an applied analysis on both simulated and real data.
A. Roverato, F.M.L. Di Lascio (2011). Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules. Biometrics, 67(4), 1236-1248 [10.1111/j.1541-0420.2011.01571.x].
Wilks' Λ Dissimilarity Measures for Gene Clustering: An Approach Based on the Identification of Transcription Modules
ROVERATO, ALBERTO; DI LASCIO, FRANCESCA MARTA LILJA
2011