The mixed model for the analysis of a repeated-measurement multivariate count data

Martin, I.; Uh, H.-W.; Supali, T.; Mitreva, M.; Houwing-Duistermaat, J. J.

doi:10.1002/sim.8101

Clustered overdispersed multivariate count data are challenging to model due to the presence of correlation within and between samples. Typically, the first source of correlation needs to be addressed but its quantification is of less interest. Here, we focus on the correlation between time points. In addition, the effects of covariates on the multivariate counts distribution need to be assessed. To fulfill these requirements, a regression model based on the Dirichlet-multinomial distribution for association between covariates and the categorical counts is extended by using random effects to deal with the additional clustering. This model is the Dirichlet-multinomial mixed regression model. Alternatively, a negative binomial regression mixed model can be deployed where the corresponding likelihood is conditioned on the total count. It appears that these two approaches are equivalent when the total count is fixed and independent of the random effects. We consider both subject-specific and categorical-specific random effects. However, the latter has a larger computational burden when the number of categories increases. Our work is motivated by microbiome data sets obtained by sequencing of the amplicon of the bacterial 16S rRNA gene. These data have a compositional structure and are typically overdispersed. The microbiome data set is from an epidemiological study carried out in a helminth-endemic area in Indonesia. The conclusions are as follows: time has no statistically significant effect on microbiome composition, the correlation between subjects is statistically significant, and treatment has a significant effect on the microbiome composition only in infected subjects who remained infected.
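The equivalence mentioned in the abstract can be checked numerically in a simple case. The sketch below is an illustration, not the authors' implementation. It assumes independent negative binomial counts that share a common scale parameter `p`, and a hypothetical log link `alpha_j = exp(x . beta_j + b)` with a subject-specific random intercept `b`. The covariate values, coefficients, and counts are made up for the example. Under these assumptions, conditioning the negative binomial counts on their total gives the Dirichlet-multinomial probability, whatever the value of `p`.

```python
import math

def dm_logpmf(counts, alpha):
    """Log pmf of the Dirichlet-multinomial for one sample of category counts."""
    n = sum(counts)
    a = sum(alpha)
    out = math.lgamma(n + 1) + math.lgamma(a) - math.lgamma(n + a)
    for x, al in zip(counts, alpha):
        out += math.lgamma(x + al) - math.lgamma(al) - math.lgamma(x + 1)
    return out

def nb_logpmf(x, r, p):
    """Log pmf of a negative binomial with shape r; P(X = x) is proportional to p**x."""
    return (math.lgamma(x + r) - math.lgamma(r) - math.lgamma(x + 1)
            + r * math.log(1 - p) + x * math.log(p))

def nb_conditional_logpmf(counts, shapes, p):
    """Log pmf of independent NB counts (common p) given their total."""
    n = sum(counts)
    joint = sum(nb_logpmf(x, r, p) for x, r in zip(counts, shapes))
    # The sum of independent NB(r_j, p) variables is NB(sum of r_j, p).
    total = nb_logpmf(n, sum(shapes), p)
    return joint - total

def alpha_from_covariates(x, betas, b=0.0):
    """Hypothetical log link (an assumption): alpha_j = exp(x . beta_j + b)."""
    return [math.exp(sum(xi * bi for xi, bi in zip(x, beta)) + b)
            for beta in betas]

# Made-up inputs: an intercept plus one covariate, three categories.
x = [1.0, 0.5]
betas = [[0.2, -0.4], [1.0, 0.3], [-0.5, 0.8]]
alpha = alpha_from_covariates(x, betas, b=0.1)  # b plays the role of a random intercept
counts = [12, 30, 8]

lhs = dm_logpmf(counts, alpha)
rhs = nb_conditional_logpmf(counts, alpha, p=0.37)
print(abs(lhs - rhs) < 1e-9)
```

The terms in `p` cancel in the conditional likelihood, which is why the comparison does not depend on the value chosen for `p`. A full mixed model would additionally integrate the likelihood over the distribution of `b`, which this sketch does not attempt.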

Martin I., Uh H.-W., Supali T., Mitreva M., Houwing-Duistermaat J.J. (2019). The mixed model for the analysis of a repeated-measurement multivariate count data. STATISTICS IN MEDICINE, 38(12), 2248-2268 [10.1002/sim.8101].



Year: 2019
Journal: STATISTICS IN MEDICINE
DOI: https://dx.doi.org/10.1002/sim.8101
Document type: 1.01 Journal article


Documents in IRIS are protected by copyright and all rights are reserved, unless otherwise indicated.

Use this identifier to cite or link to this document: https://hdl.handle.net/11585/871987
