Wilks' dissimilarity for gene clustering: computational issues

Roverato, Alberto; Di Lascio, Marta

doi:10.2427/8761

Clustering methods are widely used in the analysis of gene expression data for their ability to uncover coordinated expression profiles. One important goal of clustering is to discover co-regulated genes, because it has been postulated that co-regulation implies a similar function. In the context of agglomerative hierarchical clustering, we introduced a dissimilarity measure based on the Wilks’ Λ statistic, which we called the Wilks’ dissimilarity, and showed its usefulness in the identification of transcription modules. In this paper, we discuss the ability of the Wilks’ dissimilarity to identify clusters of co-expressed genes by providing an example where the most commonly used dissimilarity measures fail. Furthermore, we carry out a set of simulations aimed at investigating the use of a sparse canonical correlation technique in the estimation of the Wilks’ dissimilarity, and we provide guidelines for its use.

Alberto Roverato, Marta Di Lascio (2013). Wilks' dissimilarity for gene clustering: computational issues. Epidemiology Biostatistics and Public Health, 10(2), 1-10 [10.2427/8761].