Background
Independence models among variables are one of the most relevant topics in epidemiology,
particularly in molecular epidemiology for the study of gene-gene and gene-environment
interactions. They have been studied using three main kinds of analysis: regression
analysis, data mining approaches and Bayesian model selection. Recently, methods of
algebraic statistics have been extensively used for applications to biology. In this paper
we present a concise but complete description of independence models in algebraic
statistics and a new method of analyzing interactions that is equivalent to the
correction of Fisher's exact test by Markov bases.
Methods
We identified the suitable algebraic independence model for describing the dependence of
two genetic variables on the occurrence of cancer and exploited the theory of toric
varieties and Gröbner bases to develop an exact independence test based on the
Diaconis-Sturmfels algorithm. We implemented it in a Maple routine and applied it to
the study of gene-gene interaction in Gen-Air, a European case-control study. We computed
the p-value for each pair of genetic variables interacting with disease status and we
compared our results with the standard asymptotic chi-square test.
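
To illustrate the kind of test described above, the following is a minimal Python sketch of a Diaconis-Sturmfels style Markov chain for the simplest case, independence in a two-way contingency table. It is not the Maple routine used in this study: the function names (chi_square, markov_basis_p_value), the default chain length and burn-in, and the restriction to two-way tables are assumptions made for illustration only. The chain walks over tables with the same row and column totals as the observed one, using the basic +1/-1 moves on 2x2 sub-tables (a Markov basis for this model), and accepts moves with a Metropolis ratio so that tables are visited according to the hypergeometric distribution. The estimated p-value is the fraction of visited tables whose chi-square statistic is at least as large as the observed one.

```python
import random


def chi_square(table):
    """Pearson chi-square statistic of a two-way table of counts."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = 0.0
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            expected = r * c / n
            if expected > 0:
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def markov_basis_p_value(table, steps=20000, burn_in=2000, seed=0):
    """Monte Carlo estimate of an exact p-value for independence.

    Illustrative sketch only: steps, burn_in and seed are arbitrary defaults.
    The table needs at least two rows and two columns.
    """
    rng = random.Random(seed)
    t = [list(r) for r in table]  # work on a copy
    n_rows, n_cols = len(t), len(t[0])
    observed = chi_square(t)
    current = observed
    hits = 0
    for step in range(burn_in + steps):
        # Pick an ordered pair of rows and of columns; swapping the order
        # gives the opposite move, so the proposal is symmetric.
        i1, i2 = rng.sample(range(n_rows), 2)
        j1, j2 = rng.sample(range(n_cols), 2)
        # Proposed move: +1 at (i1,j1) and (i2,j2), -1 at (i1,j2) and (i2,j1).
        a, b = t[i1][j2], t[i2][j1]
        if a > 0 and b > 0:  # otherwise a cell would become negative
            # Metropolis ratio for the target proportional to 1 / prod(n_ij!)
            ratio = (a * b) / ((t[i1][j1] + 1) * (t[i2][j2] + 1))
            if rng.random() < ratio:
                t[i1][j1] += 1
                t[i2][j2] += 1
                t[i1][j2] -= 1
                t[i2][j1] -= 1
                current = chi_square(t)
        if step >= burn_in and current >= observed - 1e-9:
            hits += 1
    return hits / steps
```

For example, markov_basis_p_value([[12, 3], [4, 11]]) returns an estimate that can be set against the asymptotic chi-square p-value for the same table; the two tend to differ most when cell counts are small, which is where an exact test is most useful.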
Results
We found an association among COMT Val158Met, APE1 Asp148Glu and bladder cancer
(p-value: 0.009). We also found an interaction among TP53 Arg72Pro, GSTP1 Ile105Val
and lung cancer (p-value: 0.00035). Leukaemia was observed to significantly interact
with the pairs ERCC2 Lys751Gln and RAD51 172 G>T (p-value: 0.0072), ERCC2 Lys751Gln
and LIG4 Thr9Ile (p-value: 0.0095) and APE1 Asp148Glu and GSTP1 Ala114Val
(p-value: 0.0036).
Conclusion
Taking advantage of results from theoretical and computational algebra, the method we
propose was more selective than other methods in detecting new interactions, yet
its results were consistent with previous epidemiological and functional
findings. It also helped us control the multiple comparison problem. In light
of our results, we believe that the epidemiologic study of interactions can benefit from
algebraic methods based on properties of toric varieties and Gröbner bases.