An interval mapping method for quantitative trait loci (QTL) was developed for complex polygenic diseases, treated as binary traits, that show QTL by environment interactions (QEI); it is intended for outbred populations and operates on a within-family basis. Within this context, the main objectives were to investigate the selection of genetic models and to compare liability-based, or generalized, interval mapping (GIM) with linear regression interval mapping (RIM). Two genetic models were used: one with main QTL and QEI effects (QEI model) and one with only a main QTL effect (QTL model). Over 30 types of binary disease data and six types of continuous data were simulated and analysed by both RIM and GIM. With tabulated values used as significance thresholds, RIM showed an increased false detection rate (FDR) when testing for interactions, attributable to scale effects on the binary scale; GIM did not. Empirical thresholds, which in effect are higher thresholds for RIM interaction tests, could correct this increased FDR, but they would have to be derived for each case because the size of the FDR increase depends on the incidence on the binary scale. Even so, for QEI models RIM gave larger biases (15–100% over- or under-estimation of true values) and higher standard errors than GIM in estimates of QTL variance and location. Hence GIM is recommended for disease QTL mapping with QEI. In the presence of QEI, the model including QEI has more power (a 20–80% increase) to detect the QTL when the average QTL effect is small, that is, when the model with only a main QTL effect has low power. A top-down model selection is proposed in which a full test for QEI is conducted first and the model is subsequently simplified. The methods and results are applicable to human, plant and animal QTL mapping experiments.
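
The scale effect behind the spurious interactions on the binary scale can be illustrated with a minimal sketch. It assumes a liability-threshold model with standard normal residual liability, a two-level genotype contrast and a binary environment, none of which is specified in the text above. The function names and parameter values are hypothetical, and the code is not the GIM or RIM procedure itself; it only shows that effects which are purely additive on the liability scale can yield a nonzero interaction contrast on the observed 0/1 scale, with a size and sign that depend on the incidence.

```python
from statistics import NormalDist

# Assumption: residual liability is standard normal (not stated in the source).
_STD_NORMAL = NormalDist()


def incidence(threshold, qtl_effect, env_effect, genotype, environment):
    """Disease probability when liability is purely additive (no QEI)."""
    mean_liability = qtl_effect * genotype + env_effect * environment
    # An individual is affected when liability exceeds the threshold.
    return 1.0 - _STD_NORMAL.cdf(threshold - mean_liability)


def interaction_contrast(threshold, qtl_effect, env_effect, scale="observed"):
    """2x2 interaction contrast: cell(1,1) - cell(1,0) - cell(0,1) + cell(0,0).

    scale="observed"  uses incidences (the scale a linear regression sees);
    scale="liability" maps incidences back through the inverse normal CDF.
    """
    cells = {}
    for genotype in (0, 1):
        for environment in (0, 1):
            p = incidence(threshold, qtl_effect, env_effect, genotype, environment)
            cells[(genotype, environment)] = (
                p if scale == "observed" else _STD_NORMAL.inv_cdf(p)
            )
    return cells[(1, 1)] - cells[(1, 0)] - cells[(0, 1)] + cells[(0, 0)]


if __name__ == "__main__":
    # Hypothetical effect sizes; a higher threshold means a lower incidence.
    for threshold in (0.0, 0.5, 1.5):
        observed = interaction_contrast(threshold, 0.5, 0.5, scale="observed")
        liability = interaction_contrast(threshold, 0.5, 0.5, scale="liability")
        print(f"threshold={threshold}: observed={observed:.4f}, liability={liability:.4f}")
```

On the liability scale the contrast is zero up to rounding for every threshold, whereas on the observed scale it changes sign and size as the threshold, and hence the incidence, changes. This is in line with the statement that empirical thresholds for RIM would have to be derived for each case.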