Dreassi, E., Pratelli, L., Rigo, P. (in press). Knockoffs for partially exchangeable categorical covariates. Statistical Methods & Applications, 1, 1-25.

Knockoffs for partially exchangeable categorical covariates

Rigo Pietro
In press

Abstract

Let $X=(X_1,\ldots, X_p)$ be the vector of covariates in a regression problem and let $\widetilde{X}$ be a knockoff copy of $X$ (in the sense of Candès et al. 2018). In a number of applications, mainly in genetics, there is a finite set $F$ such that $X_i\in F$ for each $i=1,\ldots,p$. Despite this, to perform variable selection with the knockoff procedure, $X$ is usually modeled as an absolutely continuous random vector. While understandable from the point of view of applications, this approximate procedure does not make sense theoretically, since $X$ is supported by the finite set $F^p$. In this paper, explicit formulae for the joint distribution of $(X,\widetilde{X})$ are provided when $P(X\in F^p)=1$ and $X$ is partially exchangeable. In fact, when $X_i\in F$ for all $i$, assuming $X$ to be partially exchangeable is often a good strategy. In a few situations, albeit extreme ones, it may also be reasonable to assume $X$ exchangeable. Hence, some attention is paid to the exchangeable special case. The robustness of $\widetilde{X}$ with respect to the de Finetti measure $\pi$ of $X$ is investigated as well. Let $\mathcal{L}_\pi(\widetilde{X}\mid X=x)$ be the conditional distribution of $\widetilde{X}$, given $X=x$, when $X$ is exchangeable with de Finetti measure $\pi$. It is shown that $\lVert\mathcal{L}_{\pi_1}(\widetilde{X}\mid X=x)-\mathcal{L}_{\pi_2}(\widetilde{X}\mid X=x)\rVert\le c(x)\,\lVert\pi_1-\pi_2\rVert$, where $\lVert\cdot\rVert$ denotes total variation distance and $c(x)$ is a suitable constant. Finally, a numerical experiment is performed. Overall, the knockoffs of this paper outperform the alternatives (i.e., the knockoffs obtained by giving $X$ an absolutely continuous distribution) in terms of false discovery rate but are slightly weaker in terms of power.
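As a rough illustration of what it means for $\widetilde{X}$ to be a knockoff copy of an exchangeable categorical $X$, the Python sketch below takes the simplest case $F=\{0,1\}$ and assumes, purely for illustration, that the de Finetti measure $\pi$ is a Beta$(a,b)$ distribution. It builds $\widetilde{X}$ as the next $p$ draws of the corresponding Pólya urn given $X=x$. This is not claimed to be the paper's formula; it is just one simple construction that, under full exchangeability, makes $(X,\widetilde{X})$ an exchangeable vector of length $2p$, so the swap condition of Candès et al. (2018) holds. The code checks that condition by exact enumeration. All function names are hypothetical.

```python
from itertools import product
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def marginal_pmf(x, a, b):
    """P(X = x) for x in {0,1}^p, with X i.i.d. Bernoulli(theta) given theta ~ Beta(a, b)."""
    k, p = sum(x), len(x)
    return exp(log_beta(a + k, b + p - k) - log_beta(a, b))


def knockoff_conditional_pmf(xt, x, a, b):
    """P(Xt = xt | X = x) when Xt holds the next p draws of the Polya urn after x.

    Hypothetical illustrative construction, not the paper's formula.
    """
    n1, n0 = sum(x), len(x) - sum(x)
    prob = 1.0
    for v in xt:
        p1 = (a + n1) / (a + b + n1 + n0)  # predictive probability of drawing a 1
        prob *= p1 if v == 1 else 1.0 - p1
        n1, n0 = n1 + v, n0 + (1 - v)  # update the urn counts
    return prob


def joint_pmf(x, xt, a, b):
    """Joint pmf of (X, Xt) = P(X = x) * P(Xt = xt | X = x)."""
    return marginal_pmf(x, a, b) * knockoff_conditional_pmf(xt, x, a, b)


def swap(x, xt, j):
    """Exchange the j-th coordinates of x and xt."""
    x, xt = list(x), list(xt)
    x[j], xt[j] = xt[j], x[j]
    return tuple(x), tuple(xt)


def max_swap_violation(p, a, b):
    """Largest |P(x, xt) - P(swap_j(x, xt))| over all points and coordinates j.

    Checking single-coordinate swaps suffices: any swap set S is a
    composition of single swaps, each of which preserves the law.
    """
    worst = 0.0
    for x in product((0, 1), repeat=p):
        for xt in product((0, 1), repeat=p):
            base = joint_pmf(x, xt, a, b)
            for j in range(p):
                sx, sxt = swap(x, xt, j)
                worst = max(worst, abs(base - joint_pmf(sx, sxt, a, b)))
    return worst


if __name__ == "__main__":
    print(max_swap_violation(p=3, a=2.0, b=5.0))
```

The printed value should be at the level of floating-point rounding. Since $\widetilde{X}$ is generated from $X$ alone, it is also conditionally independent of the response given $X$. The trivial choice $\widetilde{X}=X$ would satisfy the swap condition as well, but it carries no power. The urn construction instead lets $\widetilde{X}$ differ from $X$ with positive probability.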
Dreassi, Emanuela; Pratelli, Luca; Rigo, Pietro
Use this identifier to cite or link to this document: https://hdl.handle.net/11585/1031009