Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data

Ranciati, Saverio; Viroli, Cinzia; Wit, Ernst C.

doi:10.1002/bimj.201600131

Model-based clustering is a technique widely used to group a collection of units into mutually exclusive groups. There are, however, situations in which an observation could in principle belong to more than one cluster. In next-generation sequencing (NGS) experiments, for example, the observed signal might be produced by two (or more) biological processes operating together, and a gene could participate in both (or all) of them. We propose a novel approach to clustering discrete NGS data from a ChIP-Seq experiment with a mixture model that allows each unit potentially to belong to more than one group: these multiple-allocation clusters are flexibly defined via a function that combines the features of the original groups without introducing new parameters. The formulation naturally gives rise to a ‘zero-inflation group’ to which values close to zero can be allocated, acting as a correction for the abundance of zeros typical of this type of data. We account for the spatial dependency between observations through a latent conditional autoregressive process that can reflect different dependency patterns. We assess the performance of our model in a simulation study and then apply it to real ChIP-Seq data.
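The abstract names three mechanisms: multiple-allocation clusters built from the original groups without new parameters, a zero-inflation group that arises from the formulation, and a latent conditional autoregressive (CAR) process for spatial dependency. The following is a minimal illustrative sketch, not the authors' method. It assumes Poisson counts, takes the combining function to be the sum of the group means (a hypothetical choice; the paper's function may differ), gives the empty allocation a small mean `EPS` so that it behaves like a zero-inflation group, and builds a generic proper CAR precision matrix for genomic bins on a line. All names (`allocation_patterns`, `combined_mean`, `allocation_posteriors`, `car_precision`, `EPS`) are hypothetical.

```python
import itertools
import math

# Hypothetical sketch, not the authors' implementation.
# Assumptions: counts are Poisson, the combining function is the SUM of the
# group means, and the empty allocation gets a small mean EPS.
EPS = 1e-3


def allocation_patterns(k):
    """All binary allocation vectors of length k (2**k patterns).

    (0, ..., 0) means 'no group' and plays the role of the zero-inflation group.
    """
    return list(itertools.product((0, 1), repeat=k))


def combined_mean(pattern, lambdas):
    """Mean of a multiple-allocation cluster: sum of the means of its groups.

    Uses only the K original means, so no new parameters are introduced.
    """
    total = sum(lam for z, lam in zip(pattern, lambdas) if z)
    return total if total > 0 else EPS


def poisson_logpmf(y, mu):
    """log P(Y = y) for Y ~ Poisson(mu)."""
    return y * math.log(mu) - mu - math.lgamma(y + 1)


def allocation_posteriors(y, lambdas, weights):
    """Posterior probability of each allocation pattern for one count y.

    weights maps each pattern to its prior mixing proportion.
    """
    patterns = allocation_patterns(len(lambdas))
    logs = [math.log(weights[p]) + poisson_logpmf(y, combined_mean(p, lambdas))
            for p in patterns]
    top = max(logs)  # subtract the max for numerical stability
    unnorm = [math.exp(v - top) for v in logs]
    total = sum(unnorm)
    return {p: u / total for p, u in zip(patterns, unnorm)}


def car_precision(n, rho, tau):
    """Generic proper CAR precision Q = tau * (D - rho * W).

    Bins 0..n-1 lie on a line; W links adjacent bins and D holds the neighbour
    counts. Q is positive definite for n >= 2, |rho| < 1 and tau > 0.
    """
    q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        neighbours = [j for j in (i - 1, i + 1) if 0 <= j < n]
        q[i][i] = tau * len(neighbours)
        for j in neighbours:
            q[i][j] = -tau * rho
    return q


if __name__ == "__main__":
    lambdas = [5.0, 20.0]
    weights = {p: 0.25 for p in allocation_patterns(2)}
    for y in (0, 25):
        post = allocation_posteriors(y, lambdas, weights)
        print(y, max(post, key=post.get))
    print(car_precision(4, 0.9, 2.0))
```

With two groups of means 5 and 20 and equal prior weights, a zero count is allocated almost entirely to the empty pattern `(0, 0)`, while a count of 25 favours the double allocation `(1, 1)`, whose mean of 25 is obtained from the two existing means alone. In this sketch the CAR matrix is kept separate; how the latent spatial effect enters the count model is not specified here.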

Ranciati, S., Viroli, C., Wit, E.C. (2017). Mixture model with multiple allocations for clustering spatially correlated observations in the analysis of ChIP-Seq data. BIOMETRICAL JOURNAL, 59(6), 1301-1316 [10.1002/bimj.201600131].

Year: 2017
Journal: BIOMETRICAL JOURNAL
DOI: https://dx.doi.org/10.1002/bimj.201600131
Item type: 1.01 Journal article

Documents in IRIS are protected by copyright, and all rights are reserved unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/610660

Warning! The data displayed have not been validated by the university.
