A robust test for X-chromosome genetic association accounting for X-chromosome inactivation and imprinting

Yu Zhang; Si-Qi Xu; Wei Liu; Wing Kam Fung; Ji-Yuan Zhou

doi:10.1017/S0016672320000026

Published online by Cambridge University Press: 01 April 2020

Yu Zhang: Affiliation:
State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China
Si-Qi Xu: Affiliation:
State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China; Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
Wei Liu: Affiliation:
State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China
Wing Kam Fung: Affiliation:
Department of Statistics and Actuarial Science, The University of Hong Kong, Hong Kong, China
Ji-Yuan Zhou*: Affiliation:
State Key Laboratory of Organ Failure Research, Ministry of Education, and Guangdong Provincial Key Laboratory of Tropical Disease Research, Department of Biostatistics, School of Public Health, Southern Medical University, Guangzhou, China
*: Author for correspondence: Professor Ji-Yuan Zhou, E-mail: zhoujiyuan5460@hotmail.com

Abstract

The X chromosome is known to play an important role in many sex-specific diseases. However, only a few single-nucleotide polymorphisms on the X chromosome have been found to be associated with diseases. Compared with the autosomes, association testing on the X chromosome is less tractable because females and males carry different numbers of X chromosomes. In addition, X-chromosome inactivation takes place in female mammals: the expression of one of the two copies of the X chromosome in females is silenced in order to achieve the same gene expression level as that in males. Imprinting effects may also be related to certain diseases. Several existing approaches take X-chromosome inactivation into account when testing for associations on the X chromosome. However, none of them allows for imprinting effects. Therefore, in this paper, we propose a robust test, $Z_{XCII}$, which accounts for both X-chromosome inactivation and imprinting effects without requiring the genetic model to be specified in advance. Simulation studies are conducted to investigate the validity and performance of $Z_{XCII}$ under various scenarios with different parameter values. The simulation results show that $Z_{XCII}$ controls the type I error rate well when there is no association. Furthermore, with regard to power, $Z_{XCII}$ is robust in all of the situations considered and generally outperforms most of the existing methods in the presence of imprinting effects, especially under complete imprinting effects.

Keywords

association test; imprinting effects; inactivation; X chromosome

Type: Research Paper
Information: Genetics Research, Volume 102, 2020, e2

DOI: https://doi.org/10.1017/S0016672320000026
Creative Commons: This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright: Copyright © The Author(s), 2020

1. Introduction

The X chromosome has been found to play an important role in many complex diseases (Ober et al., 2008; Wise et al., 2013). However, the development of methods for detecting associations with X-linked markers has lagged behind that for autosomal markers due to the complexity of the inheritance patterns of the X chromosome (Wise et al., 2013; Schurz et al., 2019). One primary characteristic of the X chromosome in mammals is that females have two copies of the X chromosome while males have only one, which increases the difficulty of X-linked association studies (Clayton, 2009; Ziegler, 2009; Loley et al., 2011). In addition, X-chromosome inactivation (XCI), defined as the silencing of expression of one of the two copies of the X chromosome in females, may constitute a risk factor for diseases. Under XCI, the X-chromosome gene dosage in female XX cells equals that in male XY cells, a phenomenon known as dosage compensation (Chow et al., 2005; Payer & Lee, 2008; Pessia et al., 2012). As such, the genetic effect of homozygous females can be regarded as the same as that of hemizygous males under XCI. It has been reported that most of the genes on the X chromosome are subject to XCI, while only about 15% of X-linked genes escape from inactivation (XCI-E) (Carrel & Willard, 2005). Random X-chromosome inactivation (XCI-R) is the general process of XCI by which one of the two copies of the X chromosome in each cell is randomly inactivated. However, the XCI patterns in some females may become skewed away from XCI-R in an age- and tissue-dependent manner, and the same allele can be inactivated in more than 75% of cells in some cases (Migeon, 1998; Minks et al., 2008; Starmer & Magnuson, 2009; Wang et al., 2014); this pattern is denoted by XCI-S for convenience.

At present, there are some association tests available for single-nucleotide polymorphisms (SNPs) on the X chromosome. Zheng et al. (2007) proposed six methods for testing associations on the X chromosome by combining the genetic effects in females and males. Among them, the allele-based tests $Z_A$ and $Z_{mfA}$ require the assumption of Hardy–Weinberg equilibrium (HWE), while the genotype-based methods $Z_C$ and $Z_{mfG}$ are robust to departures from HWE. Furthermore, all four of these methods rely on the assumption that females and males have the same risk allele. Thus, two other methods ($\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$) were developed for the situation in which females and males have different risk alleles. However, the six methods of Zheng et al. (2007) only consider XCI-E and do not take XCI into account, which may lead to a loss of power if XCI is present. Clayton (2008) was the first to suggest that XCI should be considered in X-chromosome association studies. Clayton's methods ($T_A$ and $T_{AD}$) are equivalent to the score tests of generalized linear models accounting for XCI-R and assign the same codes to homozygous females and hemizygous males. For the case in which the frequency of the same allele differs between the sexes, the sex-stratified test statistics $T_A^s$, $T_{AD}^s$ and $S_A$ have been proposed by Loley et al. (2011) and König et al. (2014). In addition, the software toolset XWAS (Gao et al., 2015) includes four tests ($FM_{01}$, $FM_{02}$, $FM_F$ and $FM_S$) based on logistic regressions. However, those approaches only consider XCI-R and ignore XCI-S. In order to simultaneously incorporate three biological patterns on the X chromosome (XCI-E, XCI-R and XCI-S), Wang et al. (2014) developed a maximum likelihood ratio method. However, this method is time-consuming because it relies on a permutation-based procedure to obtain an empirical P-value. Meanwhile, Chen et al. (2017) proposed a robust method (Xcat) based on a generalized genetic model, for which an approximate P-value is easily obtained. Recently, Wang, Xu, Wang, Fung & Zhou (2019) proposed a robust test, $Z_{max}$, which takes account of different dosage compensation patterns and requires neither the assumption of HWE nor the specification of an underlying genetic model.

Imprinting is an epigenetic phenomenon that results in the differential expression of paternal and maternal alleles (Falls et al., 1999). Researchers have found evidence for the existence of imprinting effects on some diseases, such as Angelman, Beckwith–Wiedemann and Prader–Willi syndromes (Falls et al., 1999; Dong et al., 2005; Ziegler & König, 2006; Wallace et al., 2010). In addition, it is likely that imprinted genes on the X chromosome are crucial to some diseases, such as Turner's syndrome (Donnelly et al., 2002; Loesch et al., 2005). For some sex-specific diseases, such as autism, alleles on the paternal chromosome seem to be preferentially expressed, which is likely to explain why females are always less susceptible than males (Skuse, 2000). Imprinting is generally detected through testing for parent-of-origin effects (Hager et al., 2008). Thus, we use the term ‘parent-of-origin effects’ instead of ‘imprinting effects’ in the following sections. However, there is no method available for taking parent-of-origin effects into account when conducting association tests on the X chromosome.

Therefore, in this paper, we propose a robust method, $Z_{XCII}$, which extends Xcat to a generalized linear model and simultaneously incorporates imprinting and three biological patterns (XCI-E, XCI-R and XCI-S) into X-chromosome association tests without the need to specify the genetic model on the X chromosome. We investigate the performance of the proposed method and compare it with several existing tests through extensive simulation studies. Simulation results show that the proposed method controls the size well under all of the scenarios considered when there is no association. Moreover, with regard to power, $Z_{XCII}$ is robust in all of the situations considered and generally outperforms most of the existing methods in the presence of imprinting effects, especially under complete imprinting effects.

2. Materials and methods

For a candidate SNP on the X chromosome with the mutant allele A and the normal allele a, there are four ordered genotypes for female offspring: a/a, a/A, A/a and A/A, where the allele to the left (right) of the slash is paternal (maternal). To distinguish the parent of origin of the mutant allele A in heterozygous female offspring, information on their parental genotypes is required. Male offspring have only two genotypes, a and A, both of which are maternal in origin, so their parental genotypes do not need to be collected. Let $G_{f1}$ and $G_{f2}$ be the numbers of allele A on the paternal and maternal X chromosomes in female offspring, respectively, and let $G_m$ be the number of allele A on the X chromosome in male offspring. The values of $G_{f1}$, $G_{f2}$ and $G_m$ for different genotypes in the offspring generation are shown in Table 1. The disease status of an individual (female or male) in the offspring generation is denoted by $Y$, with 1 (0) representing being affected (unaffected). In this paper, an affected daughter together with her parents is called a case–parent trio and an unaffected daughter together with her parents is called a control–parent trio (Deng & Chen, 2001; Li et al., 2016). Table 2 gives the genotype counts for the female offspring, where $n_f$ is the total number of daughter–parent trios, consisting of $r_f$ case–parent trios and $s_f$ control–parent trios. The genotype counts for the male offspring are also listed in Table 2, where $n_m$ is the total number of males, including $r_m$ cases and $s_m$ controls. As such, there are $n_r = r_f + r_m$ cases and $n_s = s_f + s_m$ controls in total, and the sample size is $N = n_r + n_s = n_f + n_m$. Let $\phi_{f0}$, $\phi_{f01}$, $\phi_{f10}$ and $\phi_{f2}$ be the penetrances of genotypes a/a, a/A, A/a and A/A in female offspring, respectively, and let $\phi_{m0}$ and $\phi_{m1}$ be the penetrances of genotypes a and A in male offspring, respectively. To test the association between the disease status $Y$ and the SNP under study, we make the following two assumptions, as in Xcat (Chen et al., 2017): (1) in the presence of association between the disease and the SNP, the generalized genetic model is assumed to hold in female offspring with ordered penetrances, either increasing ($\phi_{f0} \le \phi_{f01}, \phi_{f10} \le \phi_{f2}$) or decreasing ($\phi_{f0} \ge \phi_{f01}, \phi_{f10} \ge \phi_{f2}$); and (2) the mutant allele in female offspring is the same as that in male offspring.

Table 1. Values of G _f1, G _f2 and G _m for different genotypes in the offspring generation.

Table 2. Genotype counts for the single-nucleotide polymorphism on the X chromosome stratified by sex in the offspring generation.

A logistic regression model is proposed to describe the association between the disease and the SNP in female offspring:

$$\mathrm{Logit}\left(\Pr(Y = 1 \mid G_{f1}, G_{f2}, \boldsymbol{X}_f)\right) = \beta_{f0} + \beta_{f1}G_{f1} + \beta_{f2}G_{f2} + \beta_{f3}G_{f1}G_{f2} + \boldsymbol{b}_f^{T}\boldsymbol{X}_f, \tag{1}$$

where $\beta_{f0}$ is the intercept, $\beta_{f1}$, $\beta_{f2}$ and $\beta_{f3}$ are the respective regression coefficients for $G_{f1}$, $G_{f2}$ and the interaction term $G_{f1}G_{f2}$, $\boldsymbol{X}_f$ is a vector of covariates and $\boldsymbol{b}_f$ is a vector of the regression coefficients for $\boldsymbol{X}_f$. The estimates of these coefficients can be obtained with the iteratively reweighted least squares method (Wood, 2006) using the glm function in the R language (http://www.r-project.org). The null hypothesis of no association between the disease and the SNP in female offspring is $H_{f0}\colon \beta_{f1} = \beta_{f2} = \beta_{f3} = 0$. If at least one of these equalities does not hold, an association exists; this is the alternative hypothesis ($H_{f1}$). The values of $\mathrm{Logit}(\Pr(Y = 1 \mid G_{f1}, G_{f2}, \boldsymbol{X}_f))$ for the different genotypes in female offspring are presented in the fourth column of Table 1. Thus, under $H_{f1}$, the parent-of-origin effects at the SNP locus can be expressed by:

$$\frac{\mathrm{Logit}(\Pr(Y = 1 \mid a/A, \boldsymbol{X}_f)) - \mathrm{Logit}(\Pr(Y = 1 \mid A/a, \boldsymbol{X}_f))}{\mathrm{Logit}(\Pr(Y = 1 \mid A/A, \boldsymbol{X}_f)) - \mathrm{Logit}(\Pr(Y = 1 \mid a/a, \boldsymbol{X}_f))} = \frac{\beta_{f2} - \beta_{f1}}{\beta_{f1} + \beta_{f2} + \beta_{f3}}, \tag{2}$$

when $\boldsymbol{X}_f$ is fixed at the same level. For example, $\beta_{f1} = \beta_{f2}$ represents no parent-of-origin effects, while $\beta_{f2} = \beta_{f3} = 0$ denotes a complete maternal parent-of-origin effect and $\beta_{f1} = \beta_{f3} = 0$ indicates a complete paternal parent-of-origin effect. Moreover, we can use

$$\gamma = 2 \times \frac{\left[\mathrm{Logit}(\Pr(Y = 1 \mid a/A, \boldsymbol{X}_f)) + \mathrm{Logit}(\Pr(Y = 1 \mid A/a, \boldsymbol{X}_f)) - 2 \times \mathrm{Logit}(\Pr(Y = 1 \mid a/a, \boldsymbol{X}_f))\right]/2}{\mathrm{Logit}(\Pr(Y = 1 \mid A/A, \boldsymbol{X}_f)) - \mathrm{Logit}(\Pr(Y = 1 \mid a/a, \boldsymbol{X}_f))} = \frac{\beta_{f1} + \beta_{f2}}{\beta_{f1} + \beta_{f2} + \beta_{f3}}, \tag{3}$$

to measure the degree of inactivation under XCI in a similar way to Wang et al. (Reference Wang, Zhang and Wang2019). On the other hand, the difference between β _f3 and 0 can be interpreted as the deviation of the genetic model from the additive one under XCI-E. To be specific, Table 3 gives the explanations of the regression coefficients for several situations of XCI and XCI-E under no parent-of-origin effects (β _f1 = β _f2 = β). β _f1 = β _f2 = −β _f3 means XCI-S with γ = 2 representing 100% of the cells having the mutant allele active or a dominant model under XCI-E. β _f1 = β _f2 = β and $\beta _{f3} ={-}{2 \over 3}\beta$ stand for XCI-S with γ = 1.5, where 75% of the cells have the mutant allele active. β _f1 = β _f2 ≠ 0 and β _f3 = 0 correspond to XCI-R with γ = 1 or an additive model under XCI-E. β _f1 = β _f2 = β and β _f3 = 2β imply XCI-S with γ = 0.5, where 25% of the cells have the mutant allele active. β _f1 = β _f2 = 0 and β _f3 ≠ 0 indicate XCI-S with γ = 0 representing that 100% of the cells have the normal allele active or a recessive model under XCI-E. However, in the presence of parent-of-origin effects, the explanation of the regression coefficients is more complicated, since parent-of-origin effects may contribute to the XCI. For example, β _f1 = 0.5 and β _f2 = β _f3 = 0 are indicative of the complete maternal parent-of-origin effect, whereas γ is obtained to be 1 (suggesting XCI-R) in this case. Therefore, XCI-R may be also caused by the complete maternal parent-of-origin effect.

Table 3. Explanation of the regression coefficients under no parent-of-origin effects.

XCI = X-chromosome inactivation.
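As a minimal illustrative sketch (not part of the original analysis), the following Python snippet evaluates the ratio in Eq. (2) and γ in Eq. (3) directly from the regression coefficients and checks them against the coefficient patterns described above. The function names and the effect size β = 0.6 are arbitrary choices made for illustration only.

```python
def parent_of_origin_ratio(b_f1, b_f2, b_f3):
    """Eq. (2): (beta_f2 - beta_f1) / (beta_f1 + beta_f2 + beta_f3)."""
    return (b_f2 - b_f1) / (b_f1 + b_f2 + b_f3)


def gamma(b_f1, b_f2, b_f3):
    """Eq. (3): (beta_f1 + beta_f2) / (beta_f1 + beta_f2 + beta_f3)."""
    return (b_f1 + b_f2) / (b_f1 + b_f2 + b_f3)


beta = 0.6  # hypothetical effect size, chosen only for illustration

# Coefficient patterns under no parent-of-origin effects (beta_f1 = beta_f2).
patterns = {
    "XCI-S, mutant allele active in 100% of cells": (beta, beta, -beta),
    "XCI-S, mutant allele active in 75% of cells": (beta, beta, -2.0 * beta / 3.0),
    "XCI-R": (beta, beta, 0.0),
    "XCI-S, mutant allele active in 25% of cells": (beta, beta, 2.0 * beta),
    "XCI-S, normal allele active in 100% of cells": (0.0, 0.0, beta),
}

for label, coefs in patterns.items():
    print(f"{label}: gamma = {gamma(*coefs):.3f}, "
          f"ratio = {parent_of_origin_ratio(*coefs):.3f}")

# The example from the text: beta_f1 = 0.5, beta_f2 = beta_f3 = 0
# gives gamma = 1 even though the coefficients are not equal.
print(gamma(0.5, 0.0, 0.0), parent_of_origin_ratio(0.5, 0.0, 0.0))
```

Under no parent-of-origin effects the ratio of Eq. (2) is zero for every pattern, while γ moves from 2 down to 0 as the interaction coefficient changes.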

Recall that when the disease is associated with the SNP, the generalized genetic model with ordered penetrances is assumed to hold in female offspring. As such, we have

$$\mathrm{Logit}(\Pr(Y = 1 \mid a/a, \boldsymbol{X}_f)) \le \mathrm{Logit}(\Pr(Y = 1 \mid a/A, \boldsymbol{X}_f)) \le \mathrm{Logit}(\Pr(Y = 1 \mid A/A, \boldsymbol{X}_f))$$

and

$$\mathrm{Logit}(\Pr(Y = 1 \mid a/a, \boldsymbol{X}_f)) \le \mathrm{Logit}(\Pr(Y = 1 \mid A/a, \boldsymbol{X}_f)) \le \mathrm{Logit}(\Pr(Y = 1 \mid A/A, \boldsymbol{X}_f))$$

which are equivalent to $0 \le \beta_{f2} \le \beta_{f1} + \beta_{f2} + \beta_{f3}$ and $0 \le \beta_{f1} \le \beta_{f1} + \beta_{f2} + \beta_{f3}$, respectively, with at least one inequality being strict. Adding these two inequalities together, we get $0 \le \beta_{f1} + \beta_{f2} \le 2(\beta_{f1} + \beta_{f2} + \beta_{f3})$ and thus $\beta_{f1} + \beta_{f2} + 2\beta_{f3} \ge 0$. Therefore, the alternative hypothesis becomes $H_{f1}\colon \beta_{f1} \ge 0$, $\beta_{f2} \ge 0$, $\beta_{f1} + \beta_{f2} + 2\beta_{f3} \ge 0$, with at least one inequality being strict, which can be expressed in matrix form as follows:

$$\mathbf{C}\boldsymbol{\beta}_f = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix} \begin{pmatrix} \beta_{f1} \\ \beta_{f2} \\ \beta_{f3} \end{pmatrix} \ge \boldsymbol{0}, \tag{4}$$

where $\mathbf{C} = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & 1 & 2 \end{pmatrix}$, $\boldsymbol{\beta}_f = \begin{pmatrix} \beta_{f1} \\ \beta_{f2} \\ \beta_{f3} \end{pmatrix}$ and $\boldsymbol{0}$ is a vector with all elements equal to 0. To test for the association, we first consider the following test statistics:

$$\boldsymbol{Z} = (Z_1, Z_2, Z_3)^T = \left(\mathbf{C}\hat{\boldsymbol{I}}^{-1}\mathbf{C}^T\right)^{-\frac{1}{2}}\mathbf{C}\hat{\boldsymbol{\beta}}_f, \tag{5}$$

where $\hat{\boldsymbol{\beta}}_f = (\hat{\beta}_{f1}, \hat{\beta}_{f2}, \hat{\beta}_{f3})^T$, with $\hat{\beta}_{f1}$, $\hat{\beta}_{f2}$ and $\hat{\beta}_{f3}$ being the maximum likelihood estimates of $\beta_{f1}$, $\beta_{f2}$ and $\beta_{f3}$, respectively, and $\hat{\boldsymbol{I}}$ is the empirical Fisher information matrix (Wood, 2006).

Under the null hypothesis of no association, $Z_1$, $Z_2$ and $Z_3$ are independent of one another and asymptotically have standard normal distributions. Note that $\mathbf{C}\boldsymbol{\beta}_f \ge \boldsymbol{0}$ leads to $\boldsymbol{Z} \ge \boldsymbol{0}$ under $H_{f1}$, so we calculate only the right-sided P-values for $Z_1$, $Z_2$ and $Z_3$. We then combine them using Fisher's method (Fisher, 1954). Thus, the test statistic for female offspring can be constructed as:

$$Q_f^R = -2\ln\left(\Phi(-Z_1)\,\Phi(-Z_2)\,\Phi(-Z_3)\right), \tag{6}$$

where $\Phi(\cdot)$ is the cumulative distribution function of the standard normal distribution. Under the null hypothesis, $Q_f^R$ has an asymptotic $\chi^2$ distribution with 6 degrees of freedom (df). As such, the P-value of $Q_f^R$ is $P_f^R = 1 - \chi_6^2(Q_f^R)$, where $\chi_6^2(\cdot)$ is the cumulative distribution function of the $\chi^2$ distribution with 6 df.
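A minimal sketch of Eq. (6) in Python, assuming the standardized statistics $Z_1$, $Z_2$ and $Z_3$ of Eq. (5) are already available (obtaining them requires the fitted model and its information matrix, which is not shown here). The helper names are hypothetical. Because a $\chi^2$ distribution with an even number of df has a closed-form upper tail, $\exp(-q/2)\,(1 + q/2 + (q/2)^2/2)$ for 6 df, only the standard library is needed.

```python
import math
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def chi2_sf_6df(q):
    """Upper-tail probability of a chi-square variable with 6 df."""
    h = q / 2.0
    return math.exp(-h) * (1.0 + h + h * h / 2.0)


def female_right_sided(z1, z2, z3):
    """Return (Q_f^R, P_f^R) as in Eq. (6).

    Each right-sided P-value is Phi(-Z_i); Fisher's method sums
    -2 * ln(P-value) over the three components.
    """
    right_p = [STD_NORMAL.cdf(-z) for z in (z1, z2, z3)]
    q = -2.0 * sum(math.log(p) for p in right_p)
    return q, chi2_sf_6df(q)


# Hypothetical input values, for illustration only.
q_null, p_null = female_right_sided(0.0, 0.0, 0.0)
q_alt, p_alt = female_right_sided(2.0, 1.5, 2.5)
print(q_null, p_null)
print(q_alt, p_alt)
```

Larger positive components give a larger $Q_f^R$ and a smaller $P_f^R$, which is the behaviour the one-sided construction is meant to have.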

For male offspring, we model the relationship between the disease and the SNP using a logistic regression as:

$$\mathrm{Logit}\left(\Pr(Y = 1 \mid G_m, \boldsymbol{X}_m)\right) = \beta_{m0} + \beta_m G_m + \boldsymbol{b}_m^T\boldsymbol{X}_m, \tag{7}$$

where $\beta_{m0}$ is the intercept, $\beta_m$ is the regression coefficient for $G_m$, $\boldsymbol{X}_m$ is a vector of covariates and $\boldsymbol{b}_m$ is a vector of the regression coefficients for $\boldsymbol{X}_m$. The null hypothesis of no association between the disease and the SNP in male offspring is $H_{m0}\colon \beta_m = 0$. Then, the test statistic for male offspring is

$$Z_m = \frac{\hat{\beta}_m}{S_{\hat{\beta}_m}}, \tag{8}$$

where $\hat{\beta}_m$ is the maximum likelihood estimate of $\beta_m$ and $S_{\hat{\beta}_m}$ is the standard error of $\hat{\beta}_m$. $Z_m$ asymptotically follows a standard normal distribution under $H_{m0}$. When there are no covariates, Eq. (8) can be simplified to

$$Z_m = \frac{n_m^{1/2}\,(r_m s_a - s_m r_a)}{(n_a n_A r_m s_m)^{1/2}} \tag{9}$$

as in Zheng et al. (2007) and Chen et al. (2017).
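The following Python sketch evaluates Eq. (9) from a 2 × 2 table of male genotype counts. Since Table 2 is not reproduced here, the meaning of the count symbols is an assumption: $r_a$ and $s_a$ are taken to be the numbers of male cases and controls carrying allele a, and $n_a$ and $n_A$ the total numbers of males carrying a and A. The function name and the counts are hypothetical.

```python
import math


def male_z(r_a, r_A, s_a, s_A):
    """Eq. (9) for male offspring without covariates.

    Assumed notation: r_a, r_A are case counts with genotypes a and A;
    s_a, s_A are the corresponding control counts.
    """
    r_m = r_a + r_A          # male cases
    s_m = s_a + s_A          # male controls
    n_a = r_a + s_a          # males carrying a
    n_A = r_A + s_A          # males carrying A
    n_m = r_m + s_m          # all males
    numerator = math.sqrt(n_m) * (r_m * s_a - s_m * r_a)
    return numerator / math.sqrt(n_a * n_A * r_m * s_m)


# Hypothetical counts: allele A is more frequent in cases than in controls.
z_example = male_z(r_a=70, r_A=30, s_a=80, s_A=20)
print(z_example)
```

With this reading of the notation, $Z_m$ is positive when A is enriched among cases, and $Z_m^2$ coincides with the usual Pearson chi-square statistic of the 2 × 2 table.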

To combine the test statistics of female and male offspring, we need to convert the P-value for female offspring ($P_f^R$) into a Z-score, $Z_f^R = -\Phi^{-1}(P_f^R)$. Then, under the assumption that the mutant allele in female offspring is the same as that in male offspring, the combined test statistic $Z^R$ can be constructed as follows:

$$Z^R = \sqrt{\frac{n_f}{n_f + n_m}}\,Z_f^R + \sqrt{\frac{n_m}{n_f + n_m}}\,Z_m, \tag{10}$$

where $Z_f^R$ and $Z_m$ are weighted by the square roots of their respective proportions of the sample size. Under the overall null hypothesis that there is no association between the disease and the SNP in both female and male offspring ($H_0\colon \mathbf{C}\boldsymbol{\beta}_f = \boldsymbol{0}$ and $\beta_m = 0$), $Z^R$ is asymptotically distributed as $N(0, 1)$. Since the mutant allele is assumed to be A, the overall one-sided alternative hypothesis is $H_1\colon \mathbf{C}\boldsymbol{\beta}_f \ge \boldsymbol{0}$ (with at least one inequality being strict) or $\beta_m > 0$, and we only need to calculate the right-sided P-value of $Z^R$ when the mutant allele is known in advance.

So far, we have only considered the situation in which the mutant allele is A. When the mutant allele is a, the overall alternative hypothesis becomes $H_1\colon \mathbf{C}\boldsymbol{\beta}_f \le \boldsymbol{0}$ (with at least one inequality being strict) or $\beta_m < 0$. Therefore, the corresponding test statistic for female offspring is $Q_f^L = -2\ln\left(\Phi(Z_1)\,\Phi(Z_2)\,\Phi(Z_3)\right)$, which combines the left-sided P-values of $Z_1$, $Z_2$ and $Z_3$, and the P-value of $Q_f^L$ is $P_f^L = 1 - \chi_6^2(Q_f^L)$. Again, we combine the transformed Z-score ($Z_f^L = -\Phi^{-1}(P_f^L)$) for female offspring and $Z_m$ for male offspring to obtain the overall test statistic as:

$$Z^L = \sqrt{\frac{n_f}{n_f + n_m}}\,Z_f^L + \sqrt{\frac{n_m}{n_f + n_m}}\,(-Z_m). \tag{11}$$

$Z^L$ is asymptotically distributed as $N(0, 1)$ under the overall null hypothesis. With this $H_1$, just as for $Z^R$, only the right-sided P-value of $Z^L$ is needed when the mutant allele is known to be a in advance.

However, we generally have no information on the mutant allele before conducting the association studies. In this case, we propose the test statistic as:

$$Z_{XCII} = \max\left(Z^L, Z^R\right). \tag{12}$$

Although $Z^L$ and $Z^R$ are obviously dependent on each other, note that the components of $\boldsymbol{Z}_t = (Z_1, Z_2, Z_3, Z_m)^T$ are independent of each other, and $-Z^L$ and $Z^R$ are non-decreasing functions of $\boldsymbol{Z}_t$. Thus, the P-value of $Z_{XCII}$ can be approximately bounded by

$$2\xi - \xi^2 \le \Pr\left(Z_{XCII} > z\right) \le 2\xi, \tag{13}$$

where $\xi = 1 - \Phi(z)$, according to Owen (2009) and Esary et al. (1967). Therefore, we can simply approximate the P-value of $Z_{XCII}$ by $2\xi$.
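Equations (10) to (13) can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' implementation: the female P-values $P_f^R$ and $P_f^L$ and the male statistic $Z_m$ are taken as given inputs, and the function name and numerical inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, mean 0 and sd 1


def z_xcii(p_f_right, p_f_left, z_m, n_f, n_m):
    """Return (Z_XCII, lower bound, upper bound) following Eqs. (10)-(13).

    p_f_right and p_f_left are the female P-values P_f^R and P_f^L,
    both assumed to lie strictly between 0 and 1.
    """
    w_f = sqrt(n_f / (n_f + n_m))
    w_m = sqrt(n_m / (n_f + n_m))
    z_f_right = -STD_NORMAL.inv_cdf(p_f_right)   # Z_f^R
    z_f_left = -STD_NORMAL.inv_cdf(p_f_left)     # Z_f^L
    z_right = w_f * z_f_right + w_m * z_m        # Eq. (10)
    z_left = w_f * z_f_left + w_m * (-z_m)       # Eq. (11)
    z = max(z_left, z_right)                     # Eq. (12)
    xi = STD_NORMAL.cdf(-z)                      # xi = 1 - Phi(z)
    return z, 2.0 * xi - xi * xi, 2.0 * xi       # bounds in Eq. (13)


# Hypothetical inputs: evidence that allele A increases risk in both sexes.
z_obs, p_lower, p_upper = z_xcii(0.001, 0.9, 3.0, n_f=500, n_m=500)
print(z_obs, p_lower, p_upper)
```

The upper bound $2\xi$ is the approximate P-value used in the text; for small $\xi$ the two bounds are nearly identical, so little is lost by reporting the upper one.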

3. Simulation study

3.1. Settings

We conduct a simulation study to investigate the size and power of the proposed $Z_{XCII}$ method and compare it with the existing ones. Note that, in Zheng et al. (2007), $\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$ are less powerful than the other four test statistics ($Z_A$, $Z_C$, $Z_{mfA}$ and $Z_{mfG}$) under the assumption that the mutant allele in females is the same as that in males. Thus, in this simulation study, $\tilde{Z}_{mfA}$ and $\tilde{Z}_{mfG}$ are excluded. $T_A^s$ and $FM_S$ are also excluded because they are asymptotically equivalent to $Z_C$ (Loley et al., 2011) and $Z_{mfG}$ (Zheng et al., 2007; Gao et al., 2015; Wang, Xu, Wang, Fung & Zhou, 2019), respectively. In addition, the permutation-based method of Wang et al. (2014) is excluded due to the intensive computations involved. Finally, we choose 14 methods ($Z_{XCII}$, $Z_{max}$, Xcat, $S_A$, $FM_{02}$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $T_{AD}^s$, $FM_{01}$, $FM_F$, $Z_{mfA}$ and $Z_A$) for the comparison. The references for the selected methods are listed in Table S1.

Note that most of the methods we compare, such as Xcat, $S_A$, $Z_C$, $Z_{mfG}$, $T_A$, $T_{AD}$, $Z_{mfA}$ and $Z_A$, do not consider covariates. Thus, for simplicity, we do not include any covariates in this simulation study and directly generate the genotype counts in Table 2. Let $p_F$ and $p_M$ denote the frequencies of the mutant allele A for females and males in the parental generation, respectively. Under random mating, the genotype frequencies of a/a, a/A, A/a and A/A for female offspring are $g_{f0} = (1 - p_M)(1 - p_F)$, $g_{f01} = (1 - p_M)p_F$, $g_{f10} = p_M(1 - p_F)$ and $g_{f2} = p_M p_F$, respectively, and the genotype frequencies of a and A for male offspring are $g_{m0} = 1 - p_F$ and $g_{m1} = p_F$, respectively. Note that if random mating holds in the parental generation, HWE holds in the offspring generation only when the frequency of the same allele is equal in females and males (Puig et al., 2017). We also consider the situation where $p_F = p_M = p$ but HWE does not hold in the female offspring. The corresponding frequencies of the four genotypes are $g_{f0} = (1 - p)^2 + \rho p(1 - p)$, $g_{f01} = (1 - \rho)p(1 - p)$, $g_{f10} = (1 - \rho)p(1 - p)$ and $g_{f2} = p^2 + \rho p(1 - p)$, respectively, when the inbreeding coefficient $\rho \ne 0$. Furthermore, the genotype frequencies for male offspring are $g_{m0} = 1 - p$ and $g_{m1} = p$, respectively.
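The genotype frequencies above translate directly into code. The following Python sketch uses hypothetical function names and also derives the allele frequencies in the offspring generation, which makes the dependence on both parental frequencies explicit.

```python
def female_freqs_random_mating(p_F, p_M):
    """Ordered genotype frequencies (paternal/maternal) under random mating."""
    return {
        "a/a": (1 - p_M) * (1 - p_F),
        "a/A": (1 - p_M) * p_F,
        "A/a": p_M * (1 - p_F),
        "A/A": p_M * p_F,
    }


def female_freqs_inbreeding(p, rho):
    """Ordered genotype frequencies with p_F = p_M = p and inbreeding coefficient rho."""
    het = (1 - rho) * p * (1 - p)
    return {
        "a/a": (1 - p) ** 2 + rho * p * (1 - p),
        "a/A": het,
        "A/a": het,
        "A/A": p ** 2 + rho * p * (1 - p),
    }


def offspring_allele_freqs(p_F, p_M):
    """Frequency of allele A in female and male offspring under random mating."""
    g = female_freqs_random_mating(p_F, p_M)
    female = g["A/A"] + 0.5 * (g["a/A"] + g["A/a"])  # equals (p_F + p_M) / 2
    male = p_F                                       # sons inherit their X from the mother
    return female, male


print(female_freqs_random_mating(0.15, 0.25))
print(offspring_allele_freqs(0.15, 0.25))
print(female_freqs_inbreeding(0.2, 0.05))
```

For $(p_F, p_M) = (0.15, 0.25)$, the offspring allele frequencies are 0.20 in females and 0.15 in males, so unequal parental frequencies carry over to a sex difference in the offspring generation.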

Note that the relationships among the penetrances and the regression coefficients are $\frac{\phi_{f01}(1 - \phi_{f0})}{(1 - \phi_{f01})\phi_{f0}} = e^{\beta_{f2}}$, $\frac{\phi_{f10}(1 - \phi_{f0})}{(1 - \phi_{f10})\phi_{f0}} = e^{\beta_{f1}}$ and $\frac{\phi_{f2}(1 - \phi_{f0})}{(1 - \phi_{f2})\phi_{f0}} = e^{\beta_{f1} + \beta_{f2} + \beta_{f3}}$ for a/A, A/a and A/A, respectively, in female offspring and $\frac{\phi_{m1}(1 - \phi_{m0})}{(1 - \phi_{m1})\phi_{m0}} = e^{\beta_m}$ in male offspring. Thus, genotype counts for female offspring in Table 2 can be generated according to a quadrinomial distribution with probabilities $\left(\frac{g_{f0}\phi_{f0}}{\phi_f}, \frac{g_{f01}\phi_{f01}}{\phi_f}, \frac{g_{f10}\phi_{f10}}{\phi_f}, \frac{g_{f2}\phi_{f2}}{\phi_f}\right)$ for cases and $\left(\frac{g_{f0}(1 - \phi_{f0})}{1 - \phi_f}, \frac{g_{f01}(1 - \phi_{f01})}{1 - \phi_f}, \frac{g_{f10}(1 - \phi_{f10})}{1 - \phi_f}, \frac{g_{f2}(1 - \phi_{f2})}{1 - \phi_f}\right)$ for controls, where $\phi_f = g_{f0}\phi_{f0} + g_{f01}\phi_{f01} + g_{f10}\phi_{f10} + g_{f2}\phi_{f2}$ is the disease prevalence in females. Similarly, we can obtain genotype counts for male offspring through a binomial distribution with probabilities $\left(\frac{g_{m0}\phi_{m0}}{\phi_m}, \frac{g_{m1}\phi_{m1}}{\phi_m}\right)$ for cases and $\left(\frac{g_{m0}(1 - \phi_{m0})}{1 - \phi_m}, \frac{g_{m1}(1 - \phi_{m1})}{1 - \phi_m}\right)$ for controls, where $\phi_m = g_{m0}\phi_{m0} + g_{m1}\phi_{m1}$ is the disease prevalence in males.
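A minimal sketch of this sampling step in Python, using only the standard library. The function names, the random seed and the numbers of cases and controls are hypothetical choices for illustration; the same function serves females (four genotypes) and males (two genotypes).

```python
import random


def conditional_genotype_probs(freqs, penetrances):
    """Genotype probabilities given case or control status (Bayes' rule).

    freqs and penetrances list the genotypes in the same order,
    e.g. a/a, a/A, A/a, A/A for female offspring.
    """
    prevalence = sum(g * phi for g, phi in zip(freqs, penetrances))
    cases = [g * phi / prevalence for g, phi in zip(freqs, penetrances)]
    controls = [g * (1 - phi) / (1 - prevalence)
                for g, phi in zip(freqs, penetrances)]
    return cases, controls, prevalence


def draw_counts(n, probs, rng):
    """Draw multinomial genotype counts for n individuals."""
    counts = [0] * len(probs)
    for idx in rng.choices(range(len(probs)), weights=probs, k=n):
        counts[idx] += 1
    return counts


# Female offspring with p_F = p_M = 0.2 under random mating, and
# penetrances (phi_f0, phi_f01, phi_f10, phi_f2) = (0.120, 0.120, 0.240, 0.240).
freqs = [0.64, 0.16, 0.16, 0.04]
penetrances = [0.120, 0.120, 0.240, 0.240]
case_probs, control_probs, prevalence = conditional_genotype_probs(freqs, penetrances)

rng = random.Random(2020)  # arbitrary seed so that the draw is reproducible
case_counts = draw_counts(300, case_probs, rng)
control_counts = draw_counts(250, control_probs, rng)
print(prevalence, case_counts, control_counts)
```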

We consider various simulation settings. $(p_F, p_M)$ is taken to be (0.15, 0.25), (0.20, 0.20), (0.25, 0.15), (0.25, 0.35), (0.30, 0.30) and (0.35, 0.25). Then, under random mating, the corresponding allele frequencies for females and males in the offspring generation are (0.20, 0.15), (0.20, 0.20), (0.20, 0.25), (0.30, 0.25), (0.30, 0.30) and (0.30, 0.35), respectively. When $p_F = p_M = p = 0.2$ and 0.3, we set $\rho = -0.05$ and $\rho = 0.05$ to simulate departure from HWE. $\phi_{f0}$ and $\phi_{m0}$ are set to 0.120. To simulate the size, all of the other penetrances are also set to 0.120. When XCI exists, we suppose $\phi_{f2} = \phi_{m1} = 0.240$. The values of γ under XCI with different values of $\phi_{f01}$ and $\phi_{f10}$ are shown in Table S2. To investigate the power, we first consider the situations where there are both XCI and parent-of-origin effects: (1) $(\phi_{f01}, \phi_{f10}) = (0.120, 0.240)$ (XCI with γ = 1 and complete maternal parent-of-origin effect); (2) $(\phi_{f01}, \phi_{f10}) = (0.192, 0.216)$ (XCI with γ = 1.499 and incomplete maternal parent-of-origin effect); (3) $(\phi_{f01}, \phi_{f10}) = (0.144, 0.204)$ (XCI with γ = 1.001 and incomplete maternal parent-of-origin effect); (4) $(\phi_{f01}, \phi_{f10}) = (0.132, 0.156)$ (XCI with γ = 0.492 and incomplete maternal parent-of-origin effect); (5) $(\phi_{f01}, \phi_{f10}) = (0.240, 0.120)$ (XCI with γ = 1 and complete paternal parent-of-origin effect); (6) $(\phi_{f01}, \phi_{f10}) = (0.216, 0.192)$ (XCI with γ = 1.499 and incomplete paternal parent-of-origin effect); (7) $(\phi_{f01}, \phi_{f10}) = (0.204, 0.144)$ (XCI with γ = 1.001 and incomplete paternal parent-of-origin effect); and (8) $(\phi_{f01}, \phi_{f10}) = (0.156, 0.132)$ (XCI with γ = 0.492 and incomplete paternal parent-of-origin effect). Next, we consider the scenarios where XCI exists but there are no parent-of-origin effects, with $\phi_{f01} = \phi_{f10} = \phi$: (1) $\phi = 0.240$ (XCI with γ = 2); (2) $\phi = 0.204$ (XCI with γ = 1.503); (3) $\phi = 0.168$ (XCI with γ = 0.935); (4) $\phi = 0.144$ (XCI with γ = 0.500); and (5) $\phi = 0.120$ (XCI with γ = 0). Furthermore, we consider the situation where there is neither XCI nor parent-of-origin effects, which is $(\phi_{f01}, \phi_{f10}, \phi_{f2}, \phi_{m1}) = (0.180, 0.180, 0.240, 0.180)$. The sample size $N$ for each replication is set to 1000, including $n_r = 500$ cases and $n_s = 500$ controls. To investigate the effect of sex ratio, we fix the sex ratio in the control group at $s_f : s_m = 1:1$, while it varies in the case group as $r_f : r_m = 3:2$, 1:1 and 2:3. We use the significance level $\alpha = 10^{-5}$, and the number of replications is fixed at $10^6$ for estimating the size and $10^4$ for estimating the power. The definitions of these parameters and the detailed biological meanings of the situations we consider are provided in Tables S3 and S4, respectively.
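The γ values quoted for each scenario can be recovered from the penetrances, since Eq. (3) is a ratio of logit differences. The short Python check below is a sketch with hypothetical function names; it uses $\phi_{f0} = 0.120$ and $\phi_{f2} = 0.240$ as in the settings above.

```python
from math import log


def logit(p):
    return log(p / (1 - p))


def gamma_from_penetrances(phi_f0, phi_f01, phi_f10, phi_f2):
    """Eq. (3) expressed through penetrances.

    beta_f2 is the log odds ratio of a/A versus a/a, beta_f1 that of
    A/a versus a/a, and beta_f1 + beta_f2 + beta_f3 that of A/A versus a/a.
    """
    beta_f2 = logit(phi_f01) - logit(phi_f0)
    beta_f1 = logit(phi_f10) - logit(phi_f0)
    total = logit(phi_f2) - logit(phi_f0)
    return (beta_f1 + beta_f2) / total


PHI_F0, PHI_F2 = 0.120, 0.240

# (phi_f01, phi_f10) pairs taken from the simulation settings.
settings = [(0.120, 0.240), (0.192, 0.216), (0.144, 0.204), (0.132, 0.156),
            (0.240, 0.240), (0.204, 0.204), (0.168, 0.168), (0.144, 0.144),
            (0.120, 0.120)]

for phi_f01, phi_f10 in settings:
    g = gamma_from_penetrances(PHI_F0, phi_f01, phi_f10, PHI_F2)
    print(f"({phi_f01:.3f}, {phi_f10:.3f}) -> gamma = {g:.3f}")
```

Swapping $\phi_{f01}$ and $\phi_{f10}$ leaves γ unchanged, which is why the maternal and paternal scenarios share the same γ values.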

3.2. Size

Table 4 gives the estimated sizes of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C, Z _mfG, T _A, T _AD, $T_{AD}^s$, FM ₀₁, FM _F, Z _mfA and Z _A under different simulation settings when random mating holds in the parental generation. From Table 4, we can see that Z _XCII, Z _max, Xcat, FM ₀₂, Z _C, $T_{AD}^s$, FM ₀₁, FM _F, Z _mfA and Z _A generally control the size well, except that some of them are slightly conservative in some situations. The sizes of S _A and Z _mfG are inflated when $(p_{_F}, p_{_M}) = (0.35, 0.25)$ and the sex ratio is 3:2, but they stay close to the nominal level 10⁻⁵ in all of the other situations. T _A and T _AD can have inflated size when $(p_{_F}, p_{_M})$ is equal to (0.25, 0.15) or (0.35, 0.25), which may be caused by the different allele frequencies between females and males in the offspring generation. However, their size is well controlled in the other situations. Table S5 reports the estimated sizes of the different methods when $p_{_F} = p_{_M} = p$ but HWE does not hold in female offspring. In this case, Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C, Z _mfG, T _A, T _AD, $T_{AD}^s$, FM ₀₁ and FM _F generally control the size well. Z _mfA and Z _A can have inflated size when ρ = 0.05 and p = 0.30, since these allele-based tests rely on the assumption of HWE in females.

Table 4. Estimated size (× 10⁻⁵) under random mating at significance level α = 10⁻⁵ based on 10⁶ replicates.^a

^a Numbers that are outside of the 95% confidence interval (0.38 × 10⁻⁵, 1.62 × 10⁻⁵) are highlighted in bold.
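The interval in the footnote is consistent with a normal-approximation interval for a binomial proportion centred at the nominal level. The sketch below reproduces it under that assumption; the footnote does not state how the interval was obtained, and the function name `binomial_interval` is hypothetical.

```python
import math


def binomial_interval(alpha, n_rep, z=1.96):
    """Normal-approximation 95% interval for an estimated rejection rate.

    alpha: nominal significance level; n_rep: number of simulation replicates.
    """
    half_width = z * math.sqrt(alpha * (1 - alpha) / n_rep)
    return alpha - half_width, alpha + half_width


low, high = binomial_interval(1e-5, 10**6)
print(f"({low * 1e5:.2f} x 10^-5, {high * 1e5:.2f} x 10^-5)")
```

With α = 10⁻⁵ and 10⁶ replicates, the expected number of false rejections is only 10, so an estimated size outside this range corresponds to observing fewer than about 4 or more than about 16 rejections.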

3.3. Power

To illustrate the power results clearly, the estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG, which perform relatively well, are shown in Figures 1–6 and Figures S1–S22, while those of T _A, T _AD, $T_{AD}^s$, FM ₀₁, FM _F, Z _mfA and Z _A, which have inflated size or lower power, are displayed in Figures S23–S50. Figure 1 gives the estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio under random mating when there is XCI with γ = 1 and complete maternal parent-of-origin effect. Figure 1 shows that Z _XCII has the highest power among all seven methods. The powers of Z _max, FM ₀₂ and Z _mfG are similar to each other and are generally higher than those of Xcat, S _A and Z _C. The powers are also influenced by the sex ratio. As the proportion of males in the case group increases (r _f : r _m changing from 3:2 to 2:3), the power of Z _XCII decreases in Figure 1(a) but remains nearly unchanged in the other subplots of Figure 1, whereas the powers of Z _max, Xcat, FM ₀₂, Z _C and Z _mfG are almost unchanged in Figure 1(a) but increase in the other subplots. In contrast, S _A becomes less powerful as the number of males in the case group increases. All of the methods also have higher power with increasing allele frequency (comparing the first row with the second row). Figure 2 displays the corresponding estimated powers when there is XCI with γ = 1.001 and incomplete maternal parent-of-origin effect. From Figure 2, we can see that the powers of Z _XCII, Z _max, FM ₀₂ and Z _mfG are very close to each other and are generally larger than those of Xcat, S _A and Z _C. Compared to Figure 1, the sex ratio has a greater effect on Z _XCII: its power increases with a larger male proportion in the case group in the second and third columns of Figure 2.

Fig. 1. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is X-chromosome inactivation with γ = 1 and complete maternal parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = ϕ _f01 = 0.120 and ϕ _f10 = ϕ _f2 = ϕ _m1 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

Fig. 2. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is X-chromosome inactivation with γ = 1.001 and incomplete maternal parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = 0.120, ϕ _f01 = 0.144, ϕ _f10 = 0.204 and ϕ _f2 = ϕ _m1 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

Fig. 3. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is X-chromosome inactivation with γ = 2 and no parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = 0.120 and ϕ _f01 = ϕ _f10 = ϕ _f2 = ϕ _m1 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

Fig. 4. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is X-chromosome inactivation with γ = 0.935 and no parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = 0.120, ϕ _f01 = ϕ _f10 = 0.168 and ϕ _f2 = ϕ _m1 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

Fig. 5. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is X-chromosome inactivation with γ = 0 and no parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = ϕ _f01 = ϕ _f10 = 0.120 and ϕ _f2 = ϕ _m1 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

Fig. 6. Estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against sex ratio (r _f : r _m = 3:2, 1:1 and 2:3) under random mating when there is neither X-chromosome inactivation nor parent-of-origin effects. The simulation is based on 10,000 replicates with N = 1000, ϕ _f0 = ϕ _m0 = 0.120, ϕ _f01 = ϕ _f10 = ϕ _m1 = 0.180 and ϕ _f2 = 0.240. (a) $p_{_F } = 0.15$, $p_{_M } = 0.25$. (b) $p_{_F } = 0.20$, $p_{_M } = 0.20$. (c) $p_{_F } = 0.25$, $p_{_M } = 0.15$. (d) $p_{_F } = 0.25$, $p_{_M } = 0.35$. (e) $p_{_F } = 0.30$, $p_{_M } = 0.30$. (f) $p_{_F } = 0.35$, $p_{_M } = 0.25$.

When there is XCI but no parent-of-origin effects under random mating, the estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG with γ = 2, 0.935 and 0 are shown in Figures 3–5, respectively. In Figure 3, Z _mfG has the highest power in the first row, while Z _XCII is the most powerful in the second row. In fact, the powers of Z _XCII, Z _max, Xcat and Z _mfG are very close to each other and are larger than those of FM ₀₂ and Z _C. S _A performs relatively well in the first row of Figure 3 but worse in the second row. In Figure 4, we find that Z _XCII generally has higher power than Xcat, S _A and Z _C, although it has less power than Z _max, FM ₀₂ and Z _mfG. Xcat is the most powerful in all of the subplots of Figure 5. In the first row of Figure 5, Z _XCII, Z _max, FM ₀₂ and Z _C have similar powers and perform much better than S _A and Z _mfG. In the second row of Figure 5, Z _XCII is more powerful than the other five methods, except for Xcat. Furthermore, by comparing Figures 3–5, we find that the powers increase with increasing γ. Comparing Figure 1 (complete maternal parent-of-origin effect), Figure 2 (incomplete maternal parent-of-origin effect) and Figure 4 (no parent-of-origin effects), with γ fixed close to 1 (XCI-R), the power of Z _XCII decreases progressively. Figure 6 plots the estimated powers of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG against the sex ratio under random mating when there is neither XCI nor parent-of-origin effects. Z _XCII has similar power to Xcat and FM ₀₂ in most situations. Z _max, S _A and Z _mfG always outperform the other methods, while the power of Z _C is always the lowest among those methods. The relatively low power of Z _XCII here is attributable to the absence of both XCI and parent-of-origin effects.

The power results of Z _XCII, Z _max, Xcat, S _A, FM ₀₂, Z _C and Z _mfG with γ = 1.499 and 0.492 under random mating and incomplete maternal parent-of-origin effect are given in Figures S1 and S2, respectively. When there are no parent-of-origin effects, Figures S3 and S4 plot the estimated powers under XCI with γ = 1.503 and 0.500, respectively. The powers of these seven methods under random mating and paternal parent-of-origin effects are shown in Figures S5–S8. The results are similar to those under maternal parent-of-origin effects, except that the powers of Z _XCII seem to be more strongly affected by the difference between $p_{_F }$ and $p_{_M }$ under paternal parent-of-origin effects. For example, the difference in power between Figure S5(c) and Figure S5(a) is much larger than that between Figure 1(c) and Figure 1(a).

Figures S9–S22 present the powers under the simulation settings where $p_{_F } = p_{_M } = p$ but HWE does not hold in female offspring. The left column of each figure represents the powers when ρ = −0.05, while the right column denotes the powers when ρ = 0.05. When comparing the two columns of each figure with the middle column in the corresponding figure under random mating (ρ = 0), we find that the powers with ρ = −0.05, 0 and 0.05 have similar trends, while the powers increase slightly as ρ changes from −0.05 to 0.05. This is probably due to the increase in the genotype frequency of A/A. Finally, Figures S23–S50 display the powers of the other seven methods (T _A, T _AD, $T_{AD}^s$, FM ₀₁, FM _F, Z _mfA and Z _A), which control the size less well or have relatively low powers.
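The effect of ρ on the homozygote frequency can be illustrated numerically. This is a minimal sketch that assumes ρ acts as an inbreeding-type coefficient on the unordered female genotype frequencies, which is a common parametrization but is not confirmed by this passage; the function and key names are hypothetical.

```python
def female_genotype_freqs(p, rho):
    """Unordered female genotype frequencies under departure from HWE.

    Assumption: rho is an inbreeding-type coefficient; rho = 0 recovers HWE.
    hom_p is the homozygote for the allele with frequency p.
    """
    q = 1 - p
    return {
        "hom_p": p * p + rho * p * q,
        "het": 2 * p * q * (1 - rho),
        "hom_q": q * q + rho * p * q,
    }


for rho in (-0.05, 0.0, 0.05):
    freqs = female_genotype_freqs(0.2, rho)
    print(rho, {name: round(value, 4) for name, value in freqs.items()})
```

Under this parametrization both homozygote frequencies grow by ρp(1 − p) as ρ increases, so the direction of the change does not depend on which allele is labelled A.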

4. Discussion

In this paper, we propose a robust test, Z _XCII, for testing associations between certain diseases and an X-linked SNP by simultaneously accounting for XCI and parent-of-origin effects. Our proposed method is an extension of Xcat to the situation where parent-of-origin effects influence the process of XCI. As with Xcat (Chen et al., 2017), two reasonable assumptions are made for Z _XCII: the generalized genetic model is assumed for female offspring, and the mutant allele in female offspring is the same as that in male offspring. A notable feature of the proposed method is that there is no need to specify the patterns of XCI or parent-of-origin effects. Simulation studies were conducted to investigate the validity and performance of Z _XCII under various scenarios of parameter values. The simulation results demonstrate that Z _XCII is robust in all of the situations considered. It controls the size well and generally outperforms most of the 13 existing methods in power in the presence of parent-of-origin effects, especially complete parent-of-origin effects, although it suffers a slight loss in power when there are no parent-of-origin effects. Thus, the proposed method is a preferred choice when it is unclear whether parent-of-origin effects are present in practice.

It should be noted that Z _XCII is an extension of Xcat. We first use Fisher's method to combine Z ₁, Z ₂ and Z ₃ in female offspring (denoted by Z _f) and then obtain the proposed Z _XCII by weighting Z _f in female offspring and Z _m in male offspring, while Xcat applies Fisher's method directly to combine the test statistics for females and males (Chen et al., 2017). We also tried other methods that directly combine the test statistics for females and males, such as Fisher's approach used in Chen et al. (2017) and Stouffer's method (Owen, 2009). However, we found that Z _XCII is optimal for most of the situations considered. On the other hand, compared to Xcat, the regression-based method allows us to adjust for covariates, which is another potential advantage of the proposed method. According to the simulation results (omitted here for brevity), we also found that Z _XCII and the other methods are not applicable to association studies of rare alleles. We may need to use SKAT (Wu et al., 2011) or the extensions of SKAT (Larson et al., 2019) to deal with this situation, which will be our subsequent work. In addition, note that the proposed Z _XCII is only suitable for qualitative traits. To analyse quantitative traits in future, we will need to change the logistic regression to multiple linear regression and conduct simulations to compare it with existing methods for quantitative traits. Finally, just like Wang et al. (2014), in order to simplify our model, we treated XCI-E as a binary variable that distinguishes whether or not XCI is present. However, many genes have been observed to be of ‘variable escape’, with the levels of escape varying between individuals, cells and tissues or over time. How to consider these variable levels of XCI-E in our model will be our future work.
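For readers unfamiliar with the two combination rules mentioned above, the following sketch shows their textbook forms for independent one-sided p-values. It is not the authors' construction of Z _f or Z _XCII: it ignores any dependence among the statistics being combined, and the function names and weights are hypothetical.

```python
import math
from statistics import NormalDist


def fisher_combine(p_values):
    """Fisher's method: X = -2 * sum(ln p) is chi-square with 2k df under H0."""
    k = len(p_values)
    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    # Closed-form chi-square survival function for even degrees of freedom 2k.
    return math.exp(-half_stat) * sum(
        half_stat ** i / math.factorial(i) for i in range(k)
    )


def stouffer_combine(p_values, weights=None):
    """Weighted Stouffer method: combine z-scores of one-sided p-values."""
    nd = NormalDist()
    if weights is None:
        weights = [1.0] * len(p_values)
    z_scores = [nd.inv_cdf(1 - p) for p in p_values]
    combined = sum(w * z for w, z in zip(weights, z_scores))
    combined /= math.sqrt(sum(w * w for w in weights))
    return 1 - nd.cdf(combined)


# Two moderately small p-values combine into a smaller overall p-value.
print(fisher_combine([0.01, 0.01]))
print(stouffer_combine([0.01, 0.01], weights=[1.0, 1.0]))
```

The weights in the Stouffer sketch play a role loosely analogous to weighting a female and a male statistic, but how Z _XCII chooses its weights is defined in the Methods section and is not reproduced here.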

Supplementary material

For supplementary material accompanying this paper visit https://doi.org/10.1017/S0016672320000026.

Author contributions

Yu Zhang helped design the study, drafted the article and conducted the simulation study. Si-Qi Xu helped design the study and drafted the article. Wei Liu revised the article critically. Wing Kam Fung reviewed the whole paper and revised the article. Ji-Yuan Zhou helped design the study, supervised the field activities and directed their implementation, including quality assurance and control. All authors read and approved this version of the manuscript.

Acknowledgements

The authors thank the two reviewers for their helpful comments, which greatly improved the presentation of this paper.

Financial support

This work was supported by the National Natural Science Foundation of China (grant number 81773544) and the Hong Kong RGC GRF grant (grant number 17302919).

Conflict of interest

None.

Ethical standards

None.

Footnotes

‡ Authors contributed equally.

References

Carrel, L. and Willard, H.F. (2005). X-inactivation profile reveals extensive variability in X-linked gene expression in females. Nature 434(7031), 400–404.

Chen, Z., Ng, H.K.T., Li, J., Liu, Q. and Huang, H. (2017). Detecting associated single-nucleotide polymorphisms on the X chromosome in case control genome-wide association studies. Statistical Methods in Medical Research 26(2), 567–582.

Chow, J.C., Yen, Z., Ziesche, S.M. and Brown, C.J. (2005). Silencing of the mammalian X chromosome. Annual Review of Genomics and Human Genetics 6, 69–92.

Clayton, D. (2008). Testing for association on the X chromosome. Biostatistics 9(4), 593–600.

Clayton, D.G. (2009). Sex chromosomes and genetic association studies. Genome Medicine 1(11), 110.

Deng, H.W. and Chen, W.M. (2001). The power of the transmission disequilibrium test (TDT) with both case–parent and control–parent trios. Genetics Research 78(3), 289–302.

Dong, C., Li, W.D., Geller, F. et al. (2005). Possible genomic imprinting of three human obesity-related genetic loci. American Journal of Human Genetics 76(3), 427–437.

Donnelly, S.L., Wolpert, C., Menold, M.M. et al. (2002). Female with autistic disorder and monosomy X (Turner syndrome): parent-of-origin effect of the X chromosome. American Journal of Medical Genetics 96(3), 312–316.

Esary, J.D., Proschan, F. and Walkup, D.W. (1967). Association of random variables, with applications. Annals of Mathematical Statistics 38(5), 1466–1474.

Falls, J.G., Pulford, D.J., Wylie, A.A. and Jirtle, R.L. (1999). Genomic imprinting: implications for human disease. American Journal of Pathology 154(3), 635–647.

Fisher, R.A. (1954). Statistical Methods for Research Workers. 12th ed. Edinburgh, UK: Oliver and Boyd.

Gao, F., Chang, D., Biddanda, A. et al. (2015). XWAS: a software toolset for genetic data analysis and association studies of the X chromosome. Journal of Heredity 106(5), 666–671.

Hager, R., Cheverud, J.M. and Wolf, J.B. (2008). Maternal effects as the cause of parent-of-origin effects that mimic genomic imprinting. Genetics 178(3), 1755–1762.

König, I.R., Loley, C., Erdmann, J. and Ziegler, A. (2014). How to include chromosome X in your genome-wide association study. Genetic Epidemiology 38(2), 97–103.

Larson, N.B., Chen, J. and Schaid, D.J. (2019). A review of kernel methods for genetic association studies. Genetic Epidemiology 43(2), 122–136.

Li, M., Li, J., He, Z. et al. (2016). Testing allele transmission of an SNP set using a family based generalized genetic random field method. Genetic Epidemiology 40(4), 345–351.

Loesch, D.Z., Quang Minh, B., Wendy, K. et al. (2005). Effect of Turner's syndrome and X-linked imprinting on cognitive status: analysis based on pedigree data. Brain Development 27(7), 494–503.

Loley, C., Ziegler, A. and König, I.R. (2011). Association tests for X-chromosomal markers – a comparison of different test statistics. Human Heredity 71(1), 23–36.

Migeon, B. (1998). Non-random X chromosome inactivation in mammalian cells. Cytogenetics and Cell Genetics 80, 142–148.

Minks, J., Robinson, W.P. and Brown, C.J. (2008). A skewed view of X chromosome inactivation. Journal of Clinical Investigation 118(1), 20–23.

Ober, C., Loisel, D.A. and Gilad, Y. (2008). Sex-specific genetic architecture of human disease. Nature Reviews Genetics 9(12), 911–922.

Owen, A.B. (2009). Karl Pearson's meta-analysis revisited. Annals of Statistics 37(6B), 3867–3892.

Payer, B. and Lee, J.T. (2008). X chromosome dosage compensation: how mammals keep the balance. Annual Review of Genetics 42(1), 733–772.

Pessia, E., Makino, T., Bailly-Bechet, M., McLysaght, A. and Marais, G.A.B. (2012). Mammalian X chromosome inactivation evolved as a dosage-compensation mechanism for dosage-sensitive genes on the X chromosome. Proceedings of the National Academy of Sciences of the United States of America 109(14), 5346–5351.

Puig, X., Ginebra, J. and Graffelman, J. (2017). A Bayesian test for Hardy–Weinberg equilibrium of biallelic X-chromosomal markers. Heredity 119(4), 226–236.

Schurz, H., Salie, M., Tromp, G., Hoal, E.G., Kinnear, C.J. and Möller, M. (2019). The X chromosome and sex-specific effects in infectious disease susceptibility. Human Genomics 13(1), 2.

Skuse, D.H. (2000). Imprinting, the X-chromosome, and the male brain: explaining sex differences in the liability to autism. Pediatric Research 47(1), 9–16.

Starmer, J. and Magnuson, T. (2009). A new model for random X chromosome inactivation. Development 136(1), 1–10.

Wallace, C., Smyth, D.J., Maisuria-Armer, M., Walker, N.M., Todd, J.A. and Clayton, D.G. (2010). The imprinted DLK1-MEG3 gene region on chromosome 14q32.2 alters susceptibility to type 1 diabetes. Nature Genetics 42(1), 68–71.

Wang, J., Yu, R. and Shete, S. (2014). X-chromosome genetic association test accounting for X-inactivation, skewed X-inactivation, and escape from X-inactivation. Genetic Epidemiology 38(6), 483–493.

Wang, P., Xu, S.Q., Wang, B.Q., Fung, W.K. and Zhou, J.Y. (2019). A robust and powerful test for case–control genetic association study on X chromosome. Statistical Methods in Medical Research 28(10–11), 3260–3272.

Wang, P., Zhang, Y., Wang, B.Q. et al. (2019). A statistical measure for the skewness of X chromosome inactivation based on case–control design. BMC Bioinformatics 20(1), 11.

Wise, A.L., Gyi, L. and Manolio, T.A. (2013). eXclusion: toward integrating the X chromosome in genome-wide association analyses. American Journal of Human Genetics 92(5), 643–647.

Wood, S.N. (2006). Generalized Additive Models: An Introduction with R. 1st ed. London, UK: Chapman & Hall Ltd.

Wu, M.C., Lee, S., Cai, T., Li, Y., Boehnke, M. and Lin, X. (2011). Rare-variant association testing for sequencing data with the sequence kernel association test. American Journal of Human Genetics 89(1), 82–93.

Zheng, G., Joo, J., Zhang, C. and Geller, N.L. (2007). Testing association for markers on the X chromosome. Genetic Epidemiology 31(8), 834–843.

Ziegler, A. (2009). Genome-wide association studies: quality control and population-based measures. Genetic Epidemiology 33(S1), S45–S50.

Ziegler, A. and König, I.R. (2006). A Statistical Approach to Genetic Epidemiology: Concepts and Applications. 1st ed. Weinheim, Germany: Wiley-VCH.

Table 1. Values of Gf1, Gf2 and Gm for different genotypes in the offspring generation.

Table 2. Genotype counts for the single-nucleotide polymorphism on the X chromosome stratified by sex in the offspring generation.

Table 3. Explanation of the regression coefficients under no parent-of-origin effects.
