Published online by Cambridge University Press: 21 February 2012
We study the situation in which a cheap measure (X) is observed in a large, representative twin sample, and a more expensive measure (Y) is observed only in a selected subsample. The aim is to identify the selection design that maximizes the statistical power to detect genetic and environmental influences on the variance of Y and on the covariance of X and Y. Data were simulated for 4000 dizygotic (DZ) and 2000 monozygotic (MZ) twins. Missingness in Y (87% or 97%) was then introduced in accordance with seven selection designs: (i) concordant low + individual high design; (ii) extreme concordant design; (iii) extreme concordant and discordant design (EDAC); (iv) extreme discordant design; (v) individual score selection design; (vi) selection of an optimal number of MZ and DZ twins; and (vii) missing completely at random. For each design, we examined the statistical power to detect additive genetic (A), dominance genetic (D), and shared environmental (C) effects on the variance of Y and on the covariance between X and Y. The power to detect additive genetic effects is high irrespective of the percentage of missingness or selection design. The power to detect shared environmental effects is acceptable when missingness is 87% but low when it is 97%, except under the individual score selection design, in which the power remains acceptable. The power to detect dominance genetic effects is low, irrespective of selection design or percentage of missingness. The individual score selection design is therefore the best design for detecting genetic and environmental influences on the variance of Y and on the covariance of X and Y. However, the EDAC design may be preferred when an additional purpose of a study is to detect quantitative trait loci effects.
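To make the selection step concrete, the following is a minimal Python sketch of how missingness in Y could be imposed on the basis of X. It is an illustration under stated assumptions, not the simulation code used in the study. The abstract does not give the variance components, the selection cutoffs, or the exact selection rules, so the values `a2` and `c2`, the tail fraction, the reading of the sample sizes as 1000 MZ and 2000 DZ pairs, and the function names are all hypothetical. In particular, "individual score selection" is assumed here to mean keeping the individuals with the most extreme X scores in either tail, regardless of the co-twin's score.

```python
import numpy as np

def simulate_x(n_pairs, r, rng):
    """Draw standardized X scores for twin pairs with within-pair correlation r."""
    cov = np.array([[1.0, r], [r, 1.0]])
    return rng.multivariate_normal(np.zeros(2), cov, size=n_pairs)

def mcar_mask(x, keep_frac, rng):
    """Missing completely at random: keep each individual's Y with probability keep_frac."""
    return rng.random(x.shape) < keep_frac

def individual_score_mask(x, keep_frac):
    """Assumed rule: keep Y for the individuals with the most extreme |X| scores."""
    cutoff = np.quantile(np.abs(x), 1.0 - keep_frac)
    return np.abs(x) >= cutoff

def edac_mask(x, tail):
    """Assumed rule: keep pairs in which both twins are extreme on X,
    either in the same tail (concordant) or in opposite tails (discordant)."""
    lo, hi = np.quantile(x, [tail, 1.0 - tail])
    low, high = x <= lo, x >= hi
    concordant = low.all(axis=1) | high.all(axis=1)
    discordant = (low[:, 0] & high[:, 1]) | (high[:, 0] & low[:, 1])
    keep_pair = concordant | discordant
    return np.repeat(keep_pair[:, None], 2, axis=1)

rng = np.random.default_rng(0)

# Hypothetical standardized variance components for X (not taken from the study).
a2, c2 = 0.5, 0.2
x_mz = simulate_x(1000, a2 + c2, rng)        # MZ pairs share all additive genetic effects
x_dz = simulate_x(2000, 0.5 * a2 + c2, rng)  # DZ pairs share half on average
x = np.vstack([x_mz, x_dz])

# 87% missingness corresponds to observing Y in 13% of individuals.
keep_individual = individual_score_mask(x, keep_frac=0.13)
keep_mcar = mcar_mask(x, keep_frac=0.13, rng=rng)
keep_edac = edac_mask(x, tail=0.2)  # hypothetical tail fraction

print("individual score:", keep_individual.mean())
print("MCAR:", keep_mcar.mean())
print("EDAC:", keep_edac.mean())
```

Each mask marks the individuals whose Y would be treated as observed; all other Y values would be set to missing before the twin model is fitted. Note that the EDAC sketch does not target a specific missingness percentage: the tail fraction would have to be tuned to reach 87% or 97%.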