Skip to main content Accessibility help
×
Home

A practical parameterised algorithm for the individual haplotyping problem MLF

  • MINZHU XIE (a1), JIANXIN WANG (a2) and JIANER CHEN (a3)

Abstract

Haplotypes are more useful in complex disease gene mapping than single-nucleotide polymorphisms (SNPs). However, haplotypes are difficult to obtain directly using biological experiments, which has prompted research into efficient computational methods for determining haplotypes. The individual haplotyping problem called Minimum Letter Flip (MLF) is a computational problem that, given a set of aligned DNA sequence fragment data of an individual, induces the corresponding haplotypes by flipping minimum SNPs. There has been no practical exact algorithm for solving the problem. Due to technical limits in DNA sequencing experiments, the maximum length of a fragment sequenced directly is about 1kb. In consequence, with a genome-average SNP density of 1.84 SNPs per 1 kb of DNA sequence, the maximum number k1 of SNP sites that a fragment covers is usually small. Moreover, in order to save time and money, the maximum number k2 of fragments that cover an SNP site is usually no more than 19. Building on these fragment data properties, the current paper introduces a new parameterised algorithm with running time O(nk22k2 + mlogm + mk1), where m is the number of fragments and n is the number of SNP sites. In practical biological applications, the algorithm solves the MLF problem efficiently even if m and n are large.

Copyright

References

Hide All
Bonizzoni, P., Vedova, G. D., Dondi, R. and Li, J. (2003) The haplotyping problem: an overview of computational models and solutions. J. Comp. Sci. Technol. 18 (6)675688.
Chen, C. T. L., Wang, J. C. and Cohen, B. A. (2007) The strength of selection on ultraconserved elements in the human genome. The American Journal of Human Genetics 80 (4)692704.
Chen, Z., Fu, B., Schweller, R., Yang, B., Zhao, Z. and Zhu, B. (2008) Linear time probabilistic algorithms for the singular haplotype reconstruction problem from SNP fragments. J. Comput. Biol. 15 (5)535546.
Gabriel, S. B. et al. (2002) The structure of haplotype blocks in the human genome. Science 296 (5576) 22252229.
Greenberg, H. J., Hart, W. E. and Lancia, G. (2004) Opportunities for combinatorial optimization in computational biology. INFORMS J. Comput. 16 (3)211231.
Hinds, D. A. et al. (2005) Whole-genome patterns of common DNA variation in three human populations. Science 307 (5712) 10721079.
Hüffner, F. (2005) Algorithm engineering for optimal graph bipartization. In: Proceedings of the 4th International Workshop on Efficient and Experimental Algorithms, WEA 2005. Springer-Verlag Lecture Notes in Computer Science 3503 240252.
Huson, D. H., Halpern, A. L., Lai, Z., Myers, E. W., Reinert, K. and Sutton, G. G (2001) Comparing assemblies using fragments and mate-pairs. In: Proceedings of the 1st Workshop on Algorithms in BioInformatics, WABI 2001. Springer-Verlag Lecture Notes in Computer Science 2149 294306.
IHC (The International HapMap Consortium) (2003) The international hapmap project. Nature 426 (6968)789796.
IHC (The International HapMap Consortium) (2005) A haplotype map of the human genome. Nature 437 (7063) 12991320.
IHGS (the International Human Genome Sequencing Consortium) (2001) Initial sequencing and analysis of the human genome. Nature 409 (6822)860921.
Lancia, G., Bafna, V., Istrail, S., Lippert, R. and Schwartz, R. (2001) SNPs problems, complexity and algorithms. In: Proceedings of the 9th Annual European Symposium on Algorithms, ESA 2001. Springer-Verlag Lecture Notes in Computer Science 2161 182193.
Levy, S. et al. (2007) The diploid genome sequence of an individual human. PLoS Biology 5 (10)e254.
Lippert, R., Schwartz, R., Lancia, G. and Istrail, S. (2002) Algorithmic strategies for the single nucleotide polymorphism haplotype assembly problem. Brief. Bioinform 3 (1)19.
Myers, G. (1999) A dataset generator for whole genome shotgun sequencing. In: Proceedings of the 7th International Conference on Intelligent Systems for Molecular Biology, ISMB, AAAI Press 202210.
Panconesi, A. and Sozio, M. (2004) Fast hare: a fast heuristic for single individual SNP haplotype reconstruction. In: Proceedings of the 4th Workshop on Algorithms in BioInformatics, WABI 2004. Springer-Verlag Lecture Notes in Computer Science 3240 266277.
Tachmazidou, I., Verzilli, C. J. and Iorio, M. D. (2007) Genetic association mapping via evolution-based clustering of haplotypes. PLoS Genet. 3 (7)e111.
Venter, J. C. et al. (2001) The sequence of the human genome. Science 291 (5507)13041351.
Wang, R., Wu, L., Li, Z. and Zhang, X. (2005) Haplotype reconstruction from SNP fragments by minimum error correction. Bioinformatics 21 (10)24562462.
Wernicke, S. (2003) On the algorithmic tractability of single nucleotide polymorphism (SNP) analysis and related problems, Ph.D. Thesis, University Tübingen.
Xie, M., Chen, J. and Wang, J. (2007) Research on parameterized algorithms of the individual haplotyping problem. Journal of Bioinformatics and Computational Biology 5 (3)795816.
Xie, M. and Wang, J. (2008) An improved (and practical) parameterized algorithm for the individual haplotyping problem MFR with mate-pairs. Algorithmica 52 (2)250266.
Zhang, X., Wang, R., Wu, L. and Chen, L. (2006) Models and algorithms for haplotyping problem. Current Bioinformatics 1 (1)105114.
Zhao, Y., Wu, L., Zhang, J., Wang, R. and Zhang, X. (2005) Haplotype assembly from aligned weighted SNP fragments. Computational Biology and Chemistry 29 (4)281287.

Related content

Powered by UNSILO

A practical parameterised algorithm for the individual haplotyping problem MLF

  • MINZHU XIE (a1), JIANXIN WANG (a2) and JIANER CHEN (a3)

Metrics

Full text views

Total number of HTML views: 0
Total number of PDF views: 0 *
Loading metrics...

Abstract views

Total abstract views: 0 *
Loading metrics...

* Views captured on Cambridge Core between <date>. This data will be updated every 24 hours.

Usage data cannot currently be displayed.