Algebraic Methods for Studying Interactions Between Epidemiological Variables

Published online by Cambridge University Press:  06 June 2012

F. Ricceri
Affiliation:
Human Genetics Foundation, Turin, Italy; Department of Genetics, Biology and Biochemistry, University of Turin, Italy
C. Fassino
Affiliation:
Department of Mathematics, University of Genoa, Italy
G. Matullo
Affiliation:
Human Genetics Foundation, Turin, Italy; Department of Genetics, Biology and Biochemistry, University of Turin, Italy
M. Roggero
Affiliation:
Department of Mathematics, University of Turin, Italy
M.-L. Torrente
Affiliation:
Department of Mathematics, University of Genoa, Italy
P. Vineis
Affiliation:
Human Genetics Foundation, Turin, Italy; Imperial College, London, UK
L. Terracini*
Affiliation:
Department of Mathematics, University of Turin, Italy
* Corresponding author. E-mail: lea.terracini@unito.it

Abstract

Background

The study of independence models among variables is one of the most relevant topics in epidemiology, particularly in molecular epidemiology, where it underpins the study of gene-gene and gene-environment interactions. Such models have been studied using three main kinds of analysis: regression analysis, data mining approaches and Bayesian model selection. Recently, methods of algebraic statistics have been used extensively in biological applications. In this paper we present a concise but complete description of independence models in algebraic statistics, together with a new method for analyzing interactions that is equivalent to the Markov-basis correction of Fisher’s exact test.

Methods

We identified a suitable algebraic independence model for describing the dependence of two genetic variables on the occurrence of cancer, and exploited the theory of toric varieties and Gröbner bases to develop an exact independence test based on the Diaconis-Sturmfels algorithm. We implemented the test as a Maple routine and applied it to the study of gene-gene interactions in Gen-Air, a European case-control study. For each pair of genetic variables, we computed the p-value for interaction with disease status and compared our results with the standard asymptotic chi-square test.
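To illustrate the kind of exact test the Diaconis-Sturmfels algorithm makes possible, the following is a minimal Python sketch for the simplest case: independence in a two-way contingency table. It is not the authors' Maple routine, and it does not use their three-variable model. For two-way independence, the moves that add 1 to two opposite corners of a 2×2 sub-table and subtract 1 from the other two form a Markov basis, so a Metropolis walk using these moves explores all tables with the observed margins. The function names, step counts and seed are hypothetical choices made for this illustration.

```python
import random


def chi_square_stat(table):
    """Pearson chi-square statistic for a two-way table of counts."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    total = sum(row_sums)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def markov_basis_p_value(table, steps=20000, burn_in=2000, seed=0):
    """Estimate an exact p-value by a Metropolis walk over the fiber.

    The fiber is the set of non-negative integer tables sharing the
    observed row and column sums. Under independence, the conditional
    distribution on the fiber is proportional to 1 / prod(n_ij!).
    Illustrative sketch only; parameters are assumptions.
    """
    rng = random.Random(seed)
    current = [list(row) for row in table]
    n_rows, n_cols = len(current), len(current[0])
    observed_stat = chi_square_stat(current)
    hits = 0
    for step in range(steps + burn_in):
        # Pick a basic move: +1 at (i1, j1) and (i2, j2),
        # -1 at (i1, j2) and (i2, j1). Margins are unchanged.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        # The move is feasible only if no cell would become negative.
        if current[i1][j2] > 0 and current[i2][j1] > 0:
            # Ratio of target probabilities (new table / current table).
            ratio = (current[i1][j2] * current[i2][j1]) / (
                (current[i1][j1] + 1) * (current[i2][j2] + 1)
            )
            if rng.random() < min(1.0, ratio):
                current[i1][j1] += 1
                current[i2][j2] += 1
                current[i1][j2] -= 1
                current[i2][j1] -= 1
        # After burn-in, count tables at least as extreme as the data.
        if step >= burn_in and chi_square_stat(current) >= observed_stat - 1e-9:
            hits += 1
    return hits / steps


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    example = [[12, 5, 3], [4, 9, 11]]
    print(markov_basis_p_value(example))
```

The estimate can be set beside the asymptotic chi-square p-value for the same table. The two tend to differ most when some cells have small counts, which is the setting where exact methods are usually preferred.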

Results

We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer (p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val and lung cancer (p-value: 0.00035). Leukaemia was observed to interact significantly with the pairs ERCC2 Lys751Gln and RAD51 172 G > T (p-value: 0.0072), ERCC2 Lys751Gln and LIG4 Thr9Ile (p-value: 0.0095), and APE1 Asp148Glu and GSTP1 Ala114Val (p-value: 0.0036).

Conclusion

Drawing on results from theoretical and computational algebra, the method we propose was more selective than other methods in detecting new interactions, yet its results were consistent with previous epidemiological and functional findings. It also helped us control the multiple comparison problem. In light of our results, we believe that the epidemiological study of interactions can benefit from algebraic methods based on the properties of toric varieties and Gröbner bases.

Type
Research Article
Copyright
© EDP Sciences, 2012
