References

Published online by Cambridge University Press: 05 June 2012

William H. Majoros
Affiliation: Duke University, North Carolina

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2007

Aha, D. W. and Bankert, R. L. (1996) A comparative evaluation of sequential feature selection algorithms. In Fisher, D. and Lenz, H.-Z. (eds.) Learning from Data, pp. 199–206. New York: Springer.
Aho, A. V., Sethi, R., and Ullman, J. D. (1986) Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley.
Allen, J. E. and Salzberg, S. L. (2005) JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21:3596–3603.
Allen, J. E., Pertea, M., and Salzberg, S. L. (2004) Computational gene prediction using multiple sources of evidence. Genome Research 14:142–148.
Allen, J. E., Majoros, W. H., Pertea, M., and Salzberg, S. L. (2006) JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biology 7(Suppl. 1):S9.
Alexandersson, M., Cawley, S., and Pachter, L. (2003) SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13:496–502.
Altschul, S. F., Madden, T. L., Schäffer, A. A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389–3402.
Anton, H. (1987) Elementary Linear Algebra, 5th edn. New York: John Wiley.
Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M. D. R., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N. J., Oinn, T. M., Pagni, M., Servant, F., Sigrist, C. J. A., and Zdobnov, E. M. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Research 29:37–40.
Attwood, T. K., Bradley, P., Flower, D. R., Gaulton, A., Maudling, N., Mitchell, A. L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A., and Zygouri, C. (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Research 31:400–402.
Azad, R. K. and Borodovsky, M. (2004) Effects of choice of DNA sequence model structure on gene identification accuracy. Bioinformatics 20:993–1005.
Bafna, V. and Huson, D. H. (2001) The conserved exon method for gene finding. ISMB'2000, 8:3–12.
Bahl, L. R., Brown, P. F., de Souza, P. V., and Mercer, R. L. (1986) Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 49–52.
Bailey, T. L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21:51–83.
Bairoch, A. and Apweiler, R. (1996) The SWISS-PROT protein sequence data bank and its new supplement TrEMBL. Nucleic Acids Research 24:21–25.
Bajic, V. B., Seah, S. H., Chong, A., Zhang, G., Koh, J. L. Y., and Brusic, V. (2002) Dragon promoter finder: recognition of vertebrate RNA polymerase II promoters. Bioinformatics 18:198–199.
Bajic, V. B., Tan, S. L., Suzuki, Y., and Sugano, S. (2004) Promoter prediction analysis on the whole human genome. Nature Biotechnology 22:1467–1473.
Bajic, V. B., Brent, M. R., Brown, R. H., Frankish, A., Harrow, J., Ohler, U., Solovyev, V. V., and Tan, S. L. (2006) Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biology 7(Suppl. 1):S3.
Baker, J. K. (1979) Trainable grammars for speech recognition. In Proceedings of the Spring Conference of the Acoustical Society of America, Boston, MA, pp. 547–550.
Bartel, D. P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297.
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004) The Pfam protein families database. Nucleic Acids Research 32:D138–D141.
Batzoglou, S., Pachter, L., Mesirov, J. P., Berger, B., and Lander, E. S. (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Research 10:950–958.
Baum, L. E. (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8.
Baum, L. E., Petrie, T., Soules, G., and Weiss, N. (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41:164–171.
Beaudoing, E., Freier, S., Wyatt, J. R., Claverie, J.-M., and Gautheret, D. (2000) Patterns of variant polyadenylation signal usage in human genes. Genome Research 10:1001–1010.
Benson, D. A., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., and Wheeler, D. L. (2005) GenBank. Nucleic Acids Research 33:D34–D38.
Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M., and Rodier, F. (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953–958.
Besemer, J., Lomsadze, A., and Borodovsky, M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes – implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29:2607–2618.
Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X. M., Flicek, P., Gräf, S., Hammond, M., Herrero, J., Howe, K., Iyer, V., Jekosch, K., Kähäri, A., Kasprzyk, A., Keefe, D., Kokocinski, F., Kulesha, E., London, D., Longden, I., Melsopp, C., Meidl, P., Overduin, B., Parker, A., Proctor, G., Prlic, A., Rae, M., Rios, D., Redmond, S., Schuster, M., Sealy, I., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Stabenau, A., Stalker, J., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., and Hubbard, T. J. P. (2006) Ensembl 2006. Nucleic Acids Research 34:D556–D561.
Blanco, E., Parra, G., and Guigó, R. (2002) Using GENEID to identify genes. In Baxevanis, A. (ed.), Current Protocols in Bioinformatics, unit 4.3. New York: John Wiley.
Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993) dbEST: database for expressed sequence tags. Nature Genetics 4:332–333.
Borodovsky, M. and McIninch, J. (1993) GENMARK: parallel gene recognition for both DNA strands. Computers and Chemistry 16:37–43.
Borodovsky, M., Rudd, K. E., and Koonin, E. V. (1994) Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Research 22:4756–4767.
Bouchard, G. and Triggs, B. (2004) The trade-off between generative and discriminative classifiers. In Antoch, J. (ed.) Proceedings of the International Symposium on Computational Statistics (COMPSTAT) 2004, pp. 1–9.
Bray, N., Dubchak, I., and Pachter, L. (2003) AVID: a global alignment program. Genome Research 13:97–102.
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Monterey, CA: Wadsworth International.
Brejová, B., Brown, D. G., Li, M., and Vinar, T. (2005) ExonHunter: a comprehensive approach to gene finding. Bioinformatics 21(Suppl. 1):i57–i65.
Brendel, V. and Kleffe, J. (1998) Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA. Nucleic Acids Research 26:4748–4757.
Brown, R. H., Gross, S. S., and Brent, M. R. (2005) Begin at the beginning: predicting genes with 5′ UTRs. Genome Research 15:742–747.
Bucher, P. and Bairoch, A. (1994) A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 53–61.
Buratti, E. and Baralle, F. E. (2004) Influence of RNA secondary structure on the pre-mRNA splicing process. Molecular and Cellular Biology 24:10505–10514.
Burden, S., Lin, Y. X., and Zhang, R. (2005) Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 21:601–607.
Burge, C. (1997) Identification of complete gene structures in human genomic DNA. Ph.D. thesis, Stanford University, Stanford, CA.
Burge, C. (1998) Modeling dependencies in pre-mRNA splicing signals. In Salzberg, S., Searls, D., and Kasif, S. (eds.), Computational Methods in Molecular Biology, pp. 127–163. Amsterdam: Elsevier.
Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268:78–94.
Burge, C. B., Tuschl, T., and Sharp, P. A. (1999) Splicing of precursors to mRNAs by the spliceosomes. In Gesteland, R. F., Cech, T. R., and Atkins, J. F. (eds.) The RNA World, 2nd edn., pp. 525–560. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.
Burges, C. J. C. (1998) A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2:121–167.
Burks, C., Cassidy, M., Cinkosky, M. J., Cumella, K. E., Gilna, P., Hayden, J. E.-D., Keen, G. M., Kelley, T. A., Kelly, M., Kristofferson, D., and Ryals, J. (1991) GenBank. Nucleic Acids Research 19:S2221–S2225.
Burset, M. and Guigó, R. (1996) Evaluation of gene structure prediction programs. Genomics 34:357–367.
Burset, M., Seledtsov, I. A., and Solovyev, V. V. (2000) Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Research 28:4364–4375.
Cai, D., Delcher, A., Kao, B., and Kasif, S. (2000) Modeling splice sites with Bayes networks. Bioinformatics 16:152–158.
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C. A., Taylor, M. S., Engstrom, P. G., Frith, M. C., Forrest, A. R., Alkema, W. B., Tan, S. L., Plessy, C., Kodzius, R., Ravasi, T., Kasukawa, T., Fukuda, S., Kanamori-Katayama, M., Kitazume, Y., Kawaji, H., Kai, C., Nakamura, M., Konno, H., Nakano, K., Mottagui-Tabar, S., Arner, P., Chesi, A., Gustincich, S., Persichetti, F., Suzuki, H., Grimmond, S. M., Wells, C. A., Orlando, V., Wahlestedt, C., Liu, E. T., Harbers, M., Kawai, J., Bajic, V. B., Hume, D. A., and Hayashizaki, Y. (2006) Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genetics 38:626–635.
Cawley, S. E., Wirth, A. I., and Speed, T. P. (2001) Phat: a gene finding program for Plasmodium falciparum. Molecular and Biochemical Parasitology 118:167–174.
Choo, K. H., Tong, J. C., and Zhang, L. (2004) Recent applications of hidden Markov models in computational biology. Genomics, Proteomics, and Bioinformatics 2:84–96.
Chou, W., Juang, B. H., and Lee, C. H. (1992) Segmental GPD training of HMM based speech recognizer. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 473–476.
Chow, C. K. and Liu, C. N. (1968) Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14:462–467.
Chuang, T.-J., Lin, W. C., Lee, H. C., Wang, C. W., Hsiao, K. L., Wang, Z. H., Shieh, D., Lin, S. C., and Chang, L. Y. (2003) A complexity reduction algorithm for analysis and annotation of large genomic sequences. Genome Research 13:313–322.
Chuang, T.-J., Chen, F.-C., and Chou, M.-Y. (2004) A comparative method for identification of gene structures and alternatively spliced variants. Bioinformatics 20:3064–3079.
Clark, F. and Thanaraj, T. A. (2002) Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Human Molecular Genetics 11:451–464.
Clote, P. and Backofen, R. (2000) Computational Molecular Biology. New York: John Wiley.
Cocke, J. and Schwartz, J. T. (1970) Programming Languages and their Compilers: Preliminary Notes, Technical Report. New York: Courant Institute of Mathematical Sciences, New York University.
Corà, D., Herrmann, C., Dieterich, C., Di Cunto, F., Provero, P., and Caselle, M. (2005) Ab initio identification of putative human transcription factor binding sites by comparative genomics. BMC Bioinformatics 6:110.
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1992) Introduction to Algorithms. Cambridge, MA: MIT Press.
Cover, T. M. and Hart, P. E. (1967) Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13:57–67.
Culotta, A., Kulp, D., and McCallum, A. (2005) Gene Prediction with Conditional Random Fields, Technical Report UM-CS-2005-028. Amherst, MA: University of Massachusetts.
Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: John Murray.
Davuluri, R. V., Grosse, I., and Zhang, M. Q. (2001) Computational identification of promoters and first exons in the human genome. Nature Genetics 29:412–417.
Dawkins, R. (1982) The Extended Phenotype: The Long Reach of the Gene. Oxford: Oxford University Press.
Dawkins, R. (1997) Human chauvinism. Evolution 51:1015–1020.
Dayhoff, M., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5:345–352.
Delcher, A. L., Kasif, S., Fleischmann, R. D., Peterson, J., White, O., and Salzberg, S. L. (1999a) Alignment of whole genomes. Nucleic Acids Research 27:2369–2376.
Delcher, A. L., Harmon, D., Kasif, S., White, O., and Salzberg, S. L. (1999b) Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:4636–4641.
Delcher, A. L., Phillippy, A., Carlton, J., and Salzberg, S. L. (2002) Fast algorithms for large-scale genome alignment and comparison. Nucleic Acids Research 30:2478–2483.
Delphin, M. E., Stockwell, P. A., Tate, W. P., and Brown, C. M. (1999) Transterm, the translational signal database, extended to include full coding sequence and untranslated regions. Nucleic Acids Research 27:293–294.
Dieterich, C., Grossmann, S., Tanzer, A., Röpcke, S., Arndt, P. F., Stadler, P. F., and Vingron, M. (2005) Comparative promoter region analysis powered by CORG. BMC Genomics 6:24.
Ding, Y. (2006) Statistical and Bayesian approaches to RNA secondary structure prediction. RNA 12:323–331.
Doudna, J. A. and Cech, T. R. (2002) The chemical repertoire of natural ribozymes. Nature 418:222–228.
Down, T. A. and Hubbard, T. J. P. (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Research 12:458–461.
Dror, G., Sorek, R., and Shamir, R. (2004) Accurate identification of alternatively spliced exons using support vector machines. Bioinformatics 21:897–901.
Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification, 2nd edn. New York: Wiley-Interscience.
Dunteman, G. H. (1989) Principal Components Analysis. London: Sage Publications.
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis. Cambridge: Cambridge University Press.
Eddy, S. R. (2002) Computational genomics of noncoding RNA genes. Cell 109:137–140.
Eddy, S. R. (2005) A model of the statistical power of comparative genome sequence analysis. PLoS Biology 3:e10.
Eddy, S., Mitchison, G., and Durbin, R. (1995) Maximum discrimination hidden Markov models of sequence consensus. Journal of Computational Biology 2:9–23.
Edwards, A. W. F. (1992) Likelihood. Baltimore, MD: Johns Hopkins University Press.
Eisen, J. A., Coyne, R. S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J. R., Badger, J. H., Ren, Q., Amedeo, P., Jones, K. M., Tallon, L. J., Delcher, A. L., Salzberg, S. L., Silva, J. C., Haas, B. J., Majoros, W. H., Farzad, M., Carlton, J. M., Smith, R. K., Garg, J., Pearlman, R. E., Karrer, K. M., Sun, L., Manning, G., Elde, N. C., Turkewitz, A. P., Asai, D. J., Wilkes, D. E., Wang, Y., Cai, H., Collins, K., Stewart, B. A., Lee, S. R., Wilamowska, K., Weinberg, Z., Ruzzo, W. L., Wloga, D., Gaertig, J., Frankel, J., Tsao, C. C., Gorovsky, M. A., Keeling, P. J., Waller, R. F., Patron, N. J., Cherry, J. M., Stover, N. A., Krieger, C. J., Del Toro, C., Ryder, H. F., Williamson, S. C., Barbeau, R. A., Hamilton, E. P., and Orias, E. (2006) Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biology 4(9):e286.
Fairbrother, W. G., Yeh, R. F., Sharp, P. A., and Burge, C. B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297:1007–1013.
Falconer, D. S. (1996) Introduction to Quantitative Genetics, 4th edn. Englewood Cliffs, NJ: Prentice-Hall.
Fariselli, P., Martelli, P. L., and Casadio, R. (2005) The posterior-Viterbi: a new decoding algorithm for hidden Markov models. BMC Bioinformatics 6(Suppl. 4):S12.
Fausett, L. V. (1994) Fundamentals of Neural Networks. Englewood Cliffs, NJ: Prentice-Hall.
Felsenstein, J. (1981) Evolutionary trees from DNA sequences. Journal of Molecular Evolution 17:368–376.
Felsenstein, J. (1989) PHYLIP: phylogeny inference package (version 3.2). Cladistics 5:164–166.
Felsenstein, J. (2004) Inferring Phylogenies. Sunderland, MA: Sinauer Associates.
Felsenstein, J. and Churchill, G. A. (1996) A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution 13:93–104.
Fischer, C. N. and LeBlanc, R. J. (1991) Crafting a Compiler with C. Menlo Park, CA: Benjamin/Cummings.
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188.
Fletcher, R. (1980) Practical Methods of Optimization, vol. 1, Unconstrained Optimization. New York: John Wiley.
Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. (1998) A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Research 8:967–974.
Florea, L., Di Francesco, V., Miller, J., Turner, R., Yao, A., Harris, M., Walenz, B., Mobarry, C., Merkulov, G. V., Charlab, R., Dew, I., Deng, Z., Istrail, S., Li, P., and Sutton, G. (2005) Gene and alternative splicing annotation with AIR. Genome Research 15:54–66.
Fogel, D. B. (2005) Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, 3rd edn. New York: Wiley-IEEE Press.
Foissac, S. and Schiex, T. (2005) Integrating alternative splicing detection into gene prediction. BMC Bioinformatics 6:25.
Freund, Y. and Schapire, R. E. (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory, pp. 23–37.
Friedman, J. H. (1989) Regularized discriminant analysis. Journal of the American Statistical Association 84:165–175.
Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Booth, M., and Rossi, F. (2005) GNU Scientific Library Reference Manual, 2nd edn. Bristol, UK: Network Theory Ltd.
Gangal, R. and Sharma, P. (2005) Human pol II promoter prediction: time series descriptors and machine learning. Nucleic Acids Research 33:1332–1336.
Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nature Genetics 25:25–29.
Gierasch, L. M. (1989) Signal sequences. Biochemistry 28:923–930.
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.
Gould, S. J. (1994) The evolution of life on earth. Scientific American 271:62–69.
Goutte, C., Gaussier, E., Cancedda, N., and Dejean, H. (2004) Generative vs. discriminative approaches to entity recognition from label-deficient data. In 7es Journées Internationales d'Analyse Statistique des Données Textuelles (JADT 2004), pp. 1–10.
Gropp, W., Lusk, E., and Skjellum, A. (1994) Using MPI: Portable Parallel Programming with the Message-Passing Interface. Cambridge, MA: MIT Press.
Gross, S. S. and Brent, M. R. (2005) Using multiple alignments to improve gene prediction. In Research in Computational Molecular Biology (RECOMB'05), pp. 374–388.
Guigó, R. (1998) Assembling genes from predicted exons in linear time with dynamic programming. Journal of Computational Biology 5:681–702.
Guigó, R., Flicek, P., Abril, J. F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V. B., Birney, E., Castelo, R., Eyras, E., Gingeras, T. R., Harrow, J., Hubbard, T., Lewis, S., Ucla, C., and Reese, M. G. (2006) EGASP: the human ENCODE genome annotation assessment project. Genome Biology 7(Suppl. 1):S2.
Haas, B. J., Delcher, A. L., Mount, S. M., Wortman, J. R., Smith, R. K., Hannick, L. I., Maiti, R., Ronning, C. M., Rusch, D. B., Town, C. D., Salzberg, S. L., and White, O. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31:5654–5666.
Hall, K. B., Green, M. R., and Redfield, A. G. (1988) Structure of a pre-mRNA branch point / 3′ splice site region. Proceedings of the National Academy of Sciences of the USA 85:704–708.
Hannenhalli, S. and Levy, S. (2001) Promoter prediction in the human genome. Bioinformatics 17:S90–S96.
Hasegawa, M., Kishino, H., and Yano, T. (1985) Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160–174.
Hastie, T., Tibshirani, R., and Friedman, J. H. (2003) The Elements of Statistical Learning. New York: Springer.
Heber, S., Alekseyev, M., Sze, S.-H., Tang, H., and Pevzner, P. A. (2002) Splicing graphs and EST assembly problem. Bioinformatics 18:S181–S188.
Heckerman, D., Geiger, D., and Chickering, D. (1994) Learning Bayesian networks: the combination of knowledge and statistical data. In Knowledge Discovery and Data Mining Workshop (KDD '94), pp. 85–96.
Henderson, J., Salzberg, S., and Fasman, K. (1997) Finding genes in human DNA with a hidden Markov model. Journal of Computational Biology 4:127–141.
Henikoff, J. G., Greene, E. A., Pietrokovski, S., and Henikoff, S. (2000) Increased coverage of protein families with the BLOCKS database servers. Nucleic Acids Research 28:228–230.
Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the USA 89:10915–10919.
Heylighen, F. (1999) The growth of structural and functional complexity during evolution. In Heylighen, F., Bollen, J., and Riegler, A. (eds.) The Evolution of Complexity, pp. 17–44. New York: Kluwer.
Hirschberg, D. (1975) A linear space algorithm for computing maximal common subexpressions. Communications of the Association for Computing Machinery 18:341–343.
Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, L. S., Tacker, M., and Schuster, P. (1994) Fast folding and comparison of RNA secondary structures. Chemical Monthly 125:167–188.
Hopcroft, J. E. and Ullman, J. D. (1979) Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley.
Hoskins, R. A., Smith, C. D., Carlson, J. W., Carvalho, A. B., Halpern, A., Kaminker, J. S., Kennedy, C., Mungall, C. J., Sullivan, B. A., Sutton, G. G., Yasuhara, J. C., Wakimoto, B. T., Myers, E. W., Celniker, S. E., Rubin, G. M., and Karpen, G. H. (2002) Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biology 3:0085.1–0085.16.
Howe, K. L., Chothia, T., and Durbin, R. (2002) GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Research 12:1418–1427.
Huang, X., Adams, M. D., Zhou, H., and Kerlavage, A. R. (1997) A tool for analyzing and annotating genomic sequences. Genomics 46:35–45.
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921.
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945.
Jaakkola, T. S. and Haussler, D. (1999) Exploiting generative models in discriminative classifiers. Advances in Neural Information Processing Systems 11:487–493.
Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985) Individual-specific “fingerprints” of human DNA. Nature 316:76–79.
Jelinek, F. (1997) Statistical Methods for Speech Recognition. Bradford, MA: Bradford Books.
Jelinek, F. and Mercer, R. L. (1980) Interpolated estimation of Markov source parameters. In Proceedings of the Workshop on Pattern Recognition in Practice, May 1980.
Jensen, F. V. (2001) Bayesian Networks and Decision Graphs. New York: Springer.
Jenuwein, T. and Allis, C. D. (2001) Translating the histone code. Science 293:1074–1080.
Johansen, F. T. (1996) A comparison of hybrid HMM architectures using global discriminative training. In Proceedings of the 4th International Conference on Spoken Language Processing, pp. 498–501.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999) An introduction to variational methods for graphical models. In Jordan, M. I. (ed.) Learning in Graphical Models, pp. 105–162. Cambridge, MA: MIT Press.
Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Munro, H. N. (ed.) Mammalian Protein Metabolism, pp. 21–132. New York: Academic Press.
Käll, L., Krogh, A., and Sonnhammer, E. L. (2005) An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics 21(Suppl. 1):i251–i257.
Kamal, M., Xie, X., and Lander, E. S. (2006) A large family of ancient repeat elements in the human genome is under strong selection. Proceedings of the National Academy of Sciences of the USA 103:2740–2745.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the USA 87:2264–2268.
Kasami, T. (1965) An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages, Scientific Report AFCRL-65-758. Bedford, MA: Air Force Cambridge Research Laboratory.
Katz, S. M. (1987) Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing 35:400–401.
Kaufman, L. (1998) Solving the quadratic programming problem arising in support vector classification. In Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 147–167. Cambridge, MA: MIT Press.
Keilson, J. (1979) Markov Chain Models: Rarity and Exponentiality. New York: Springer.
Kent, W. J. (2002) BLAT: the BLAST-like alignment tool. Genome Research 12:656–664.
Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. (2002) The human genome browser at UCSC. Genome Research 12:996–1006.
Kimura, M. (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111–120.
Kingsbury, N. G. and Rayner, P. J. W. (1971) Digital filtering using logarithmic arithmetic. Electronics Letters 7:56–58.
Kohavi R. and Sahami M. (1996) Error-based and entropy-based discretization of continuous features. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 114–119.
Korf, I. (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59.CrossRefGoogle ScholarPubMed
Korf, I., Flicek, P., Duan, D., and Brent, M. R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics 17:S140–S148.CrossRefGoogle ScholarPubMed
Korf, I., Yandell, M., and Bedell, J. (2003) BLAST. Sebastopol, CA: O'Reilly.Google ScholarPubMed
Krogh A. (1994) Hidden Markov models for labeled sequences. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 140–144.CrossRef
Krogh A. (1997) Two methods for improving performance of an HMM and their application for gene finding. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pp. 179–186.
Krogh, A. (1998) An introduction to hidden Markov models for biological sequences. In Salzberg, S. L., Searls, D. B., and Kasif, S. (eds.) Computational Methods in Molecular Biology, pp. 45–62. Amsterdam: Elsevier.Google Scholar
Krogh A. (2000) Using database matches with HMMGene for automated gene detection in Drosophila. Genome Research10:523–528.CrossRef
Krogh, A., Mian, I. S., and Haussler, D. (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Research 22:4768–4778.
Koza, J. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
Kullback, S. (1997) Information Theory and Statistics. New York: Dover.
Kulkarni, O. C., Vigneshwar, R., Jayaraman, V. K., and Kulkarni, B. D. (2005) Identification of coding and non-coding sequences using local Hölder exponent formalism. Bioinformatics 21:3818–3823.
Kulp, D. and Haussler, D. (1997) Integrating database homology in a probabilistic gene structure model. Pacific Symposium on Bioinformatics 2:232–244.
Kulp, D., Haussler, D., Reese, M., and Eeckman, F. (1996) A generalized hidden Markov model for the recognition of human genes in DNA. In Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, pp. 134–142.
Kurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S. L. (2004) Versatile and open software for comparing large genomes. Genome Biology 5:R12.1–R12.9.
Lam, F., Alexandersson, M., and Pachter, L. (2003) Picking alignments from (Steiner) trees. Journal of Computational Biology 10:509–520.
Lander, E. S. and Waterman, M. S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2:231–239.
Landry, J. R., Mager, D. L., and Wilhelm, B. T. (2003) Complex controls: the role of alternative promoters in mammalian genomes. Trends in Genetics 19:640–648.
Lapedes, A., Barnes, C., Burks, C., Farber, R., and Sirotkin, K. (1989) Application of neural networks and other machine learning algorithms to DNA sequence analysis. In Bell, G. and Marr, T. (eds.) Computers and DNA: SFI Studies in the Sciences of Complexity, pp. 157–182. Reading, MA: Addison-Wesley.
Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics 13:1095–1107.
Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., and Wootton, J. C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214.
Lee, C., Grasso, C., and Sharlow, M. F. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics 18:452–464.
Lewin, B. (2003) Genes VIII. New York: Prentice-Hall.
Lian, Y. and Garner, H. R. (2005) Evidence for the regulation of alternative splicing via complementary DNA sequence repeats. Bioinformatics 21:1358–1364.
Lim, L. P. and Burge, C. B. (2001) A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences of the USA 98:11193–11198.
Liò, P. and Goldman, N. (1998) Models of molecular evolution and phylogeny. Genome Research 8:1233–1244.
Loots, G. G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E. M. (2002) rVista for comparative sequence-based discovery of functional transcriptional factor binding sites. Genome Research 12:832–839.
Lowe, T. M. and Eddy, S. R. (1999) A computational screen for methylation guide snoRNAs in yeast. Science 283:1168–1171.
Lukashin, A. V. and Borodovsky, M. (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Research 26:1107–1115.
Mackey, A. (2005) GLEAN: improved eukaryotic gene prediction by statistical consensus of gene evidence. Poster presented at Genome Informatics Conference, October 28, 2005.
Maglott, D. R., Katz, K. S., Sicotte, H., and Pruitt, K. D. (2000) NCBI's LocusLink and RefSeq. Nucleic Acids Research 28:126–128.
Majoros, W. H. and Salzberg, S. L. (2004) An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinformatics 5:206.
Majoros, W. H., Pertea, M., Antonescu, C., and Salzberg, S. L. (2003) GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Research 31:3601–3604.
Majoros, W. H., Pertea, M., and Salzberg, S. (2004) TIGRscan and GlimmerHMM: two open source ab initio eukaryotic gene finders. Bioinformatics 20:2878–2879.
Majoros, W. H., Pertea, M., Delcher, A. L., and Salzberg, S. L. (2005a) Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics 6:16.
Majoros, W. H., Pertea, M., and Salzberg, S. L. (2005b) Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics 21:1782–1788.
Manly, B. F. J. (1994) Multivariate Statistical Methods: A Primer, 2nd edn. New York: Chapman and Hall.
Manning, C. and Schütze, H. (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Marashi, S. A., Goodarzi, H., Sadeghi, M., Eslahchi, C., and Pezeshk, H. (2006) Importance of RNA secondary structure information for yeast donor and acceptor splice site predictions by neural networks. Computational Biology and Chemistry 30:50–57.
Markov, K., Nakagawa, S., and Nakamura, S. (2001) Discriminative training of HMM using maximum normalized likelihood algorithm. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 497–500.
Matlin, A. J., Clark, F., and Smith, C. W. (2005) Understanding alternative splicing: towards a cellular code. Nature Reviews: Molecular Cell Biology 6:386–398.
McAuliffe, J. D., Pachter, L., and Jordan, M. I. (2004) Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. Bioinformatics 20:1850–1860.
McCaskill, J. S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29:1105–1119.
Mealy, G. H. (1955) A method for synthesizing sequential circuits. Bell System Technical Journal 34:1045–1079.
Meyer, I. M. and Durbin, R. (2002) Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–1318.
Meyer, I. M. and Durbin, R. (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Research 32:776–783.
Mitchell, T. (1997) Machine Learning. New York: McGraw-Hill.
Mitrophanov, A. Y. and Borodovsky, M. (2006) Statistical significance in biological sequence analysis. Briefings in Bioinformatics 7:2–24.
Mizrahi, A. and Sullivan, M. (1986) Calculus and Analytic Geometry, 2nd edn. Belmont, CA: Wadsworth.
Moore, E. F. (1956) Gedanken experiments on sequential machines. In Shannon, C. E. and McCarthy, J. (eds.) Automata Studies, pp. 129–153. Princeton, NJ: Princeton University Press.
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Mural, R. J., Adams, M. D., Myers, E. W., Smith, H. O., Miklos, G. L., Wides, R., Halpern, A., Li, P. W., Sutton, G. G., Nadeau, J., Salzberg, S. L., Holt, R. A., Kodira, C. D., Lu, F., Chen, L., Deng, Z., Evangelista, C. C., Gan, W., Heiman, T. J., Li, J., Li, Z., Merkulov, G. V., Milshina, N. V., Naik, A. K., Qi, R., Shue, B. C., Wang, A., Wang, J., Wang, X., Yan, X., Ye, J., Yooseph, S., Zhao, Q., Zheng, L., Zhu, S. C., Biddick, K., Bolanos, R., Delcher, A. L., Dew, I. M., Fasulo, D., Flanigan, M. J., Huson, D. H., Kravitz, S. A., Miller, J. R., Mobarry, C. M., Reinert, K., Remington, K. A., Zhang, Q., Zheng, X. H., Nusskern, D. R., Lai, Z., Lei, Y., Zhong, W., Yao, A., Guan, P., Ji, R. R., Gu, Z., Wang, Z. Y., Zhong, F., Xiao, C., Chiang, C. C., Yandell, M., Wortman, J. R., Amanatides, P. G., Hladun, S. L., Pratts, E. C., Johnson, J. E., Dodson, K. L., Woodford, K. J., Evans, C. A., Gropman, B., Rusch, D. B., Venter, E., Wang, M., Smith, T. J., Houck, J. T., Tompkins, D. E., Haynes, C., Jacob, D., Chin, S. H., Allen, D. R., Dahlke, C. E., Sanders, R., Li, K., Liu, X., Levitsky, A. A., Majoros, W. H., Chen, Q., Xia, A. C., Lopez, J. R., Donnelly, M. T., Newman, M. H., Glodek, A., Kraft, C. L., Nodell, M., Ali, F., An, H. J., Baldwin-Pitts, D., Beeson, K. Y., Cai, S., Carnes, M., Carver, A., Caulk, P. M., Center, A., Chen, Y. H., Cheng, M. L., Coyne, M. D., Crowder, M., Danaher, S., Davenport, L. B., Desilets, R., Dietz, S. M., Doup, L., Dullaghan, P., Ferriera, S., Fosler, C. R., Gire, H. C., Gluecksmann, A., Gocayne, J. D., Gray, J., Hart, B., Haynes, J., Hoover, J., Howland, T., Ibegwam, C., Jalali, M., Johns, D., Kline, L., Ma, D. S., MacCawley, S., Magoon, A., Mann, F., May, D., McIntosh, T. C., Mehta, S., Moy, L., Moy, M. C., Murphy, B. J., Murphy, S. D., Nelson, K. A., Nuri, Z., Parker, K. A., Prudhomme, A. C., Puri, V. N., Qureshi, H., Raley, J. C., Reardon, M. S., Regier, M. A., Rogers, Y. H., Romblad, D. L., Schutz, J., Scott, J. L., Scott, R., Sitter, C. D., Smallwood, M., Sprague, A. C., Stewart, E., Strong, R. V., Suh, E., Sylvester, K., Thomas, R., Tint, N. N., Tsonis, C., Wang, G., Wang, G., Williams, M. S., Williams, S. M., Windsor, S. M., Wolfe, K., Wu, M. M., Zaveri, J., Chaturvedi, K., Gabrielian, A. E., Ke, Z., Sun, J., Subramanian, G., Venter, J. C., Pfannkoch, C. M., Barnstead, M., and Stephenson, L. D. (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296:1661–1671.
Murphy, P. M. and Aha, D. W. (1994) UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science. Available online at www.ics.uci.edu/~mlearn/MLRepository.html/
Murthy, S. K., Kasif, S., and Salzberg, S. (1994) A system for induction of oblique decision trees. Journal of Artificial Intelligence Research 2:1–32.
Needleman, S. and Wunsch, C. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443–453.
Ng, A. Y. and Jordan, M. I. (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems (NIPS) 14:841–848.
Normandin, Y. (1996) Maximum mutual information estimation of hidden Markov models. In Lee, C.-H., Soong, F. K., and Paliwal, K. K. (eds.) Automatic Speech and Speaker Recognition, pp. 58–81. New York: Kluwer.
Normark, S., Bergstrom, S., Edlund, T., Grundstrom, T., Jaurin, B., Lindberg, F. P., and Olsson, O. (1983) Overlapping genes. Annual Review of Genetics 17:499–525.
Ohler, U., Stemmer, G., Harbeck, S., and Niemann, H. (2000) Stochastic segment models of eukaryotic promoter regions. Proceedings of the Pacific Symposium on Biocomputing 5:377–388.
Ohler, U., Niemann, H., Liao, G., and Rubin, G. M. (2001) Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17:S199–S206.
Ohler, U., Liao, G., Niemann, H., and Rubin, G. (2002) Computational analysis of core promoters in the Drosophila genome. Genome Biology 3(12):r0087.1–r0087.12.
Ohler, U., Shomron, N., and Burge, C. B. (2005) Recognition of unknown conserved alternatively spliced exons. PLoS Computational Biology 1(2):e15.
Oliver, J. L., Carpena, P., Hackenberg, M., and Bernaola-Galván, P. (2004) IsoFinder: computational prediction of isochores in genome sequences. Nucleic Acids Research 32:W287–W292.
Osuna, E., Freund, R., and Girosi, F. (1997) An improved training algorithm for support vector machines. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285.
Pachter, L., Batzoglou, S., Spitkovsky, V. I., Banks, E., Lander, E. S., Kleitman, D. J., and Berger, B. (1999) A dictionary based approach for gene annotation. Journal of Computational Biology 6:419–430.
Pachter, L., Alexandersson, M., and Cawley, S. (2002) Applications of generalized pair hidden Markov models to alignment and gene finding problems. Journal of Computational Biology 9:389–399.
Parra, G., Agarwal, P., Abril, J. F., Wiehe, T., Fickett, J. W., and Guigó, R. (2003) Comparative gene prediction in human and mouse. Genome Research 13:108–117.
Patterson, D., Yasuhara, K., and Ruzzo, W. L. (2002) Pre-mRNA secondary structure prediction aids splice site prediction. Pacific Symposium on Bioinformatics 7:223–234.
Pavesi, A., Iaco, B., Granero, M. I., and Porati, A. (1997) On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. Journal of Molecular Evolution 44:625–631.
Pearl, J. (1991) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 2nd edn. Los Altos, CA: Morgan Kaufmann.
Pearson, W. R. and Wood, T. C. (2001) Statistical significance in biological sequence comparison. In Balding, D. J., Bishop, M., and Cannings, C. (eds.) Handbook of Statistical Genetics, pp. 39–65. New York: John Wiley.
Pedersen, J. S. and Hein, J. (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–227.
Pertea, M. (2005) The Glimmer HMM Home Page. Available online at: www.cbcb.umd.edu/software/GlimmerHMM
Pertea, M., Lin, X., and Salzberg, S. L. (2001) GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Research 29:1185–1190.
Pertea, M. and Salzberg, S. L. (2002) Computational gene finding in plants. Plant Molecular Biology 48:48–49.
Platt, J. (1998) Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research Technical Report MSR-TR-98-14. Redmond, WA: Microsoft Corporation.
Pontius, J. U., Wagner, L., and Schuler, G. D. (2003) UniGene: a unified view of the transcriptome. In McEntyre, J. and Ostell, J. (eds.) The NCBI Handbook, pp. 21-1–21-12. Bethesda, MD: National Center for Biotechnology Information.
Pop, M., Salzberg, S. L., and Shumway, M. (2002) Genome sequence assembly: algorithms and issues. IEEE Computer 35:47–54.
Potamianos, G. and Jelinek, F. (1998) A study of n-gram and decision tree letter language modeling methods. Speech Communication 24:171–192.
Powell, M. J. D. (1981) Nonlinear Optimization. New York: Academic Press.
Pozo, R. (1997) Template numerical toolkit for linear algebra: high performance programming with C++ and the Standard Template Library. International Journal of Supercomputer Applications and High Performance Computing 11:251–263.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1992) Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. Cambridge: Cambridge University Press.
Provost, F. J. and Hennessy, D. N. (1994) Distributed machine learning: scaling up with coarse-grained parallelism. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 340–347.
Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 33:D501–D504.
Quinlan, R. (1993) C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann.
Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77:257–286.
Rätsch, G., Sonnenburg, S., and Schölkopf, B. (2005) RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics 21 (Suppl. 1):i369–i377.
Reese, M. and Eeckman, F. (1995) Novel neural network prediction systems for human promoters and splice sites. In Searls, D. B., Fickett, J., and Noordewier, M. (eds.) Proceedings of the Workshop on Gene-Finding and Gene Structure Prediction, Philadelphia, PA, pp. 311–324.
Reese, M. G., Eeckman, F. H., Kulp, D., and Haussler, D. (1997) Improved splice site detection in Genie. Journal of Computational Biology 4:311–323.
Reese, M. G., Hartzell, G., Harris, N. L., Ohler, U., and Lewis, S. E. (2000) Genome annotation assessment in Drosophila melanogaster. Genome Research 10:483–501.
Reichl, W. and Ruske, G. (1995) Discriminative training for continuous speech recognition. In Proceedings of the 4th European Conference on Speech Communication and Technology, pp. 537–540.
Rissanen, J. (1978) Modeling by shortest data description. Automatica 14:465–471.
Ristad, E. S. and Thomas, R. G. (1997) Hierarchical non-emitting Markov models. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics.
Rivas, E. and Eddy, S. R. (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. Journal of Molecular Biology 285:2053–2068.
Rivas, E. and Eddy, S. R. (2000) Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16:583–605.
Rivas, E. and Eddy, S. R. (2001) Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2:8.
Rombauts, S., Florquin, K., Lescot, M., Marchal, K., Rouzé, P., and Van de Peer, Y. (2003) Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiology 132:1162–1176.
Rosenblatt, F. (1958) The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65:386–408.
Roth, V. and Steinhage, V. (1999) Nonlinear discriminant analysis using kernel functions. In Proceedings of the 12th International Conference on Advances in Neural Information Processing Systems, pp. 568–574.
Saetrom, P., Sneve, R., Kristiansen, K. I., Snøve, O. J., Grünfeld, T., Rognes, T., and Seeberg, E. (2005) Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming. Nucleic Acids Research 33:3263–3270.
Saeys, Y. (2004) Feature selection for classification of nucleic acid sequences. Ph.D. thesis, University of Ghent, Belgium.
Saeys, Y., Degroeve, S., Aeyels, D., Rouzé, P., and Van de Peer, Y. (2004) Feature selection for splice site prediction: a new method using EDA-based feature ranking. BMC Bioinformatics 5:64.
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4:406–425.
Sakai, M., Yoneda, M., and Hase, H. (1998) A new robust quadratic discriminant function. In Proceedings of the 14th International Conference on Pattern Recognition, pp. 99–102.
Salzberg, S. L. (1999) On comparing classifiers: a critique of current research and methods. Data Mining and Knowledge Discovery 1:1–12.
Salzberg, S. L., Delcher, A. L., Kasif, S., and White, O. (1998a) Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:544–548.
Salzberg, S. L., Pertea, M., Delcher, A. L., Gardner, M. J., and Tettelin, H. (1998b) Interpolated Markov models for eukaryotic gene finding. Genomics 59:24–31.
Schadt, E. and Lange, K. (2002) Codon and rate variation models in molecular phylogeny. Molecular Biology and Evolution 19:1534–1549.
Schlüter, R., Macherey, W., Müller, B., and Ney, H. (2001) Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication 34:287–310.
Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proceedings of the National Academy of Sciences of the USA 95:5857–5864.
Schwartz, R. and Chow, Y.-L. (1990) The N-best algorithm: an efficient and exact procedure for finding the N most likely hypotheses. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, pp. 81–84.
Seneff, S., Wang, C., and Burge, C. B. (2004) Gene structure prediction using an orthologous gene of known exon–intron structure. Applied Bioinformatics 3:81–90.
Servant, F., Bru, C., Carre, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn, D. (2002) ProDom: automated clustering of homologous domains. Briefings in Bioinformatics 3:246–251.
Shannon, C. E. (1948) A mathematical theory of communication. Bell System Technical Journal 27:379–423, 623–656.
Shmatkov, A. M., Melikyan, A. A., Chernousko, F. L., and Borodovsky, M. (1999) Finding prokaryotic genes by the “frame-by-frame” algorithm: targeting gene starts and overlapping genes. Bioinformatics 15:874–886.
Siepel, A. and Haussler, D. (2004a) Combining phylogenetic and hidden Markov models in biosequence analysis. Journal of Computational Biology 11:413–428.
Siepel, A. and Haussler, D. (2004b) Computational identification of evolutionarily conserved exons. In Research in Computational Molecular Biology (RECOMB'04), pp. 277–286.
Siepel, A. and Haussler, D. (2004c) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Molecular Biology and Evolution 21:468–488.
Siepel, A. and Haussler, D. (2005) Phylogenetic hidden Markov models. In Nielsen, R. (ed.) Statistical Methods in Molecular Evolution, pp. 1034–1050. New York: Springer.
Simonoff, J. S. (1996) Smoothing Methods in Statistics. New York: Springer.
Sinha, S., van Nimwegen, E., and Siggia, E. D. (2003) A probabilistic method to detect regulatory modules. Bioinformatics 19:i292–i301.
Smit, A. F. A. and Green, P. (1996) RepeatMasker. Available online at http://ftp.genome.washington.edu/RM/RepeatMasker.html/
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. Journal of Molecular Biology 147:195–197.
Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy. San Francisco, CA: W. H. Freeman.
Snyder, E. E. (1994) Identification of protein coding regions in genomic DNA. Ph.D. thesis, University of Colorado, Boulder, CO.
Snyder, E. E. and Stormo, G. D. (1993) Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Research 21:607–613.
Sokal, R. R. and Rohlf, F. J. (1995) Biometry: The Principles and Practice of Statistics in Biological Research. New York: W. H. Freeman.
Solovyev, V. V. and Shahmuradov, I. A. (2003) PromH: promoters identification using orthologous genomic sequences. Nucleic Acids Research 31:3540–3545.
Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. In Proceedings of the 3rd International Conference on Intelligent Systems for Molecular Biology, pp. 367–375.
Sonnenburg, S. (2002) New methods for splice site recognition. Diploma thesis, Humboldt University, Berlin, Germany.
Sonnenburg, S., Zien, A., and Rätsch, G. (2006) ARTS: accurate recognition of transcription starts in human. In Proceedings of the 14th International Conference on Intelligent Systems for Molecular Biology, pp. 472–480.
Sorek, R., Ast, G., and Graur, D. (2002) Alu-containing exons are alternatively spliced. Genome Research 12:1060–1067.
Staden, R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research 12:505–519.
Stajich, J. E., Block, D., Boulez, K., Brenner, S. E., Chervitz, S. A., Dagdigian, C., Fuellen, G., Gilbert, J. G., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C. J., Osborne, B. I., Pocock, M. R., Schattner, P., Senger, M., Stein, L. D., Stupka, E., Wilkinson, M. D., and Birney, E. (2002) The Bioperl toolkit: perl modules for the life sciences. Genome Research 12:1611–1618.
Stanke, M. and Waack, S. (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:II215–II225.
Stein, L. (2001) Genome annotation: from sequence to biology. Nature Reviews: Genetics 2:493–503.
Stormo, G. D. and Haussler, D. (1994) Optimally parsing a sequence into different classes based on multiple types of evidence. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 369–375.
Suzek, B. E., Ermolaeva, M. D., Schreiber, M., and Salzberg, S. L. (2001) A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17:1123–1130.
Tikhonov, A. N. (1963) Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics, Doklady 4:1035–1038.
Tipping, M. E. (2001) Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1:211–244.
Tong, S. and Koller, D. (2000) Restricted Bayes optimal classifiers. In Proceedings of the 17th National Conference on Artificial Intelligence, pp. 658–664.
Toutanova, K., Mitchell, M., and Manning, C. D. (2003) Optimizing local probability models for statistical parsing. In Proceedings of the 14th European Conference on Machine Learning, pp. 409–420.
Tveter, D. (1998) The Pattern Recognition Basis of Artificial Intelligence. Indianapolis, IN: Wiley-IEEE Computer Society Press.
Uberbacher, E. C. and Mural, R. J. (1991) Locating protein coding regions in human DNA sequences using a multiple-sensor neural network approach. Proceedings of the National Academy of Sciences of the USA 88:11261–11265.
Usuka, J., Zhu, W., and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16:203–224.
Vapnik, V. (1998) Statistical Learning Theory. New York: John Wiley.
Venter, J. C., Smith, H. O., and Hood, L. (1996) A new strategy for genome sequencing. Nature 381:364–366.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R. R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, J., Li, Z., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M. L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, L., Moy, M., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y. H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N. N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, M., Williams, S., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J. F., Guigó, R., Campbell, M. J., Sjolander, K. V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y. H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W. H., McDaniel, J., Murphy, S., Newman, M., Nguyen, N., Nguyen, T., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. (2001) The sequence of the human genome. Science 291:1304–1351.
Voorhees, E. M. (1986) Implementing agglomerative hierarchical clustering algorithms for use in document retrieval. Information Processing and Management 22:465–476.
Vinson, J., DeCaprio, D., Luoma, S., and Galagan, J. E. (2006) Gene prediction using conditional random fields. In The Biology of Genomes, Cold Spring Harbor Laboratory, New York, May 10–14, 2006 (abstract).
Viterbi, A. (1967) Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory 13:260–269.
von Hippel, P. T. (2005) Mean, median, and skew: correcting a textbook rule. Journal of Statistics Education 13.
Wain, H. M., Lovering, R. C., Bruford, E. A., Lush, M. J., Wright, M. W., and Povey, S. (2002) Guidelines for human gene nomenclature. Genomics 79:464–470.
Watson, J. D. and Crick, F. H. C. (1953) Molecular structure of nucleic acids. Nature 171:737–738.
Wheelan, S. J., Church, D. M., and Ostell, J. M. (2001) Spidey: a tool for mRNA-to-genomic alignments. Genome Research 11:1952–1957.
Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Helmberg, W., Kenton, D. L., Khovayko, O., Lipman, D. J., Madden, T. L., Maglott, D. R., Ostell, J., Pontius, J. U., Pruitt, K. D., Schuler, G. D., Schriml, L. M., Sequeira, E., Sherry, S. T., Sirotkin, K., Starchenko, G., Suzek, T. O., Tatusov, R., Tatusova, T. A., Wagner, L., and Yaschenko, E. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 33:D39–D45.
Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T., and Guigó, R. (2001) SGP-1: prediction and validation of homologous genes based on sequence alignments. Genome Research 11:1574–1583.
Wingender, E., Kel, A. E., Kel, O. V., Karas, H., Heinemeyer, T., Dietze, P., Knuppel, R., Romaschenko, A. G., and Kolchanov, N. A. (1997) TRANSFAC, TRRD and COMPEL: towards a federated database system on transcriptional regulation. Nucleic Acids Research 25:265–268.
Wojtowicz, W. M., Flanagan, J. J., Millard, S. S., Zipursky, S. L., and Clemens, J. C. (2004) Alternative splicing of Drosophila Dscam generates axon guidance receptors that exhibit isoform-specific homophilic binding. Cell 118:619–633.
Wortman, J. R., Haas, B. J., Hannick, L. I., Smith, R. K., Maiti, R., Ronning, C. M., Chan, A. P., Yu, C., Ayele, M., Whitelaw, C. A., White, O. R., and Town, C. D. (2003) Annotation of the Arabidopsis genome. Plant Physiology 132:461–468.
Wu, C. H., Yeh, L.-S. L., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z.-Z., Ledley, R. S., Kourtesis, P., Suzek, B. E., Vinayaka, C. R., Zhang, J., and Barker, W. C. (2003) The Protein Information Resource. Nucleic Acids Research 31:345–347.
Xu, Y. and Uberbacher, E. C. (1996) Gene prediction by pattern recognition and homology search. In Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, pp. 242–256.
Yan, J. and Marr, T. G. (2005) Computational analysis of 3′-ends of ESTs shows four classes of alternative polyadenylation in human, mouse, and rat. Genome Research 15:369–375.
Yang, Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39:306–314.
Yeh, R.-F., Lim, L. P., and Burge, C. B. (2001) Computational inference of homologous gene structures in the human genome. Genome Research 11:803–809.
Yeo, G. and Burge, C. B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. Journal of Computational Biology 11:377–394.
Yeo, G. W., Van Nostrand, E., Holste, D., Poggio, T., and Burge, C. B. (2005) Identification and analysis of alternative splicing events conserved in human and mouse. Proceedings of the National Academy of Sciences of the USA 102:2850–2855.
Younger, D. H. (1967) Recognition and parsing of context-free languages in time n³. Information and Control 10:189–208.
Yu, P., Ma, D., and Xu, M. (2005) Nested genes in the human genome. Genomics 86:414–422.
Zar, J. H. (1996) Biostatistical Analysis, 3rd edn. Englewood Cliffs, NJ: Prentice-Hall.
Zhang, M. Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proceedings of the National Academy of Sciences of the USA 94:565–568.
Zhang, M. Q. (2003) Prediction, annotation, and analysis of human promoters. Cold Spring Harbor Symposia on Quantitative Biology 68:217–225.
Zhang, M. Q. and Marr, T. G. (1993) A weight array method for splicing signal analysis. Computer Applications in the Biosciences 9:499–509.
Zhang, H., Hu, J., Recce, M., and Tian, B. (2005) PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Research 33:D116–D120.
Zhang, L., Pavlovic, V., Cantor, C. R., and Kasif, S. (2003) Human–mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Research 13:1190–1202.
Zhao, J., Hyman, L., and Moore, C. (1999) Formation of mRNA 3′ ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiology and Molecular Biology Reviews 63:405–445.
Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T., and Müller, K.-R. (2000) Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 16:799–807.
Zuker, M., Mathews, D. H., and Turner, D. H. (1999) Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In Barciszewski, J. and Clark, B. F. C. (eds.) RNA Biochemistry and Biotechnology, pp. 11–43. New York: Kluwer.
Aha, D. W. and Bankert, R. L. (1996) A comparative evaluation of sequential feature selection algorithms. In Fisher, D. and Lenz, H.-Z. (eds.) Learning from Data, pp. 199–206. New York: Springer.Google Scholar
Aho, A. V., Sethi, R., and Ullman, J. D. (1986) Compilers: Principles, Techniques, and Tools. Reading, MA: Addison-Wesley.Google Scholar
Allen, J. E. and Salzberg, S. L. (2005) JIGSAW: integration of multiple sources of evidence for gene prediction. Bioinformatics 21:3596–3603.CrossRefGoogle ScholarPubMed
Allen, J. E., Pertea, M., and Salzberg, S. L. (2004) Computational gene prediction using mutliple sources of evidence. Genome Research 14:142–148.CrossRefGoogle Scholar
Allen, J. E., Majoros, W. H., Pertea, M., and Salzberg, S. L. (2006) JIGSAW, GeneZilla, and GlimmerHMM: puzzling out the features of human genes in the ENCODE regions. Genome Biology 7(Suppl. 1):S9.CrossRefGoogle ScholarPubMed
Alexandersson, M., Cawley, S., and Pachter, L. (2003) SLAM: cross-species gene finding and alignment with a generalized pair hidden Markov model. Genome Research 13:496–502.CrossRefGoogle ScholarPubMed
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Anang, Z., Miller, W., and Lipman, D. J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Research 25:3389–3402.CrossRefGoogle ScholarPubMed
Anton, H. (1987) Elementary Linear Algebra, 5th edn. New York: John Wiley.Google Scholar
Apweiler, R., Attwood, T. K., Bairoch, A., Bateman, A., Birney, E., Biswas, M., Bucher, P., Cerutti, L., Corpet, F., Croning, M. D. R., Durbin, R., Falquet, L., Fleischmann, W., Gouzy, J., Hermjakob, H., Hulo, N., Jonassen, I., Kahn, D., Kanapin, A., Karavidopoulou, Y., Lopez, R., Marx, B., Mulder, N. J., Oinn, T. M., Pagni, M., Servant, F., Sigrist, C. J. A., and Zdobnov, E. M. (2001) The InterPro database, an integrated documentation resource for protein families, domains and functional sites. Nucleic Acids Research 29:37–40.CrossRefGoogle ScholarPubMed
Attwood, T. K., Bradley, P., Flower, D. R., Gaulton, A., Maudling, N., Mitchell, A. L., Moulton, G., Nordle, A., Paine, K., Taylor, P., Uddin, A., and Zygouri, C. (2003) PRINTS and its automatic supplement, prePRINTS. Nucleic Acids Research 31:400–402.CrossRefGoogle ScholarPubMed
Azad, R. K. and Borodovsky, M. (2004) Effects of choice of DNA sequence model structure on gene identification accuracy. Bioinformatics 20: 993–1005.CrossRefGoogle ScholarPubMed
Bafna, V. and Huson, D. H. (2001) The conserved exon method for gene finding. ISMB'2000, 8:3–12.Google Scholar
Bahl L. R., Brown P. F., de Souza P. V., and Mercer R. L. (1986) Maximum mutual information estimation of hidden Markov model parameters for speech recognition. In Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 49–52.CrossRef
Bailey, T. L. and Elkan, C. (1995) Unsupervised learning of multiple motifs in biopolymers using expectation maximization. Machine Learning 21:51–83.CrossRefGoogle Scholar
Bairoch, A. and Apweiler, R. (1996) The SWISS-PROT protein sequence data bank and its new supplement TrEMBL. Nucleic Acids Research 24:21–25.CrossRefGoogle ScholarPubMed
Bajic, V. B., Seah, S. H., Chong, A., Zhang, G., Koh, J. L. Y., and Brusic, V. (2002) Dragon promoter finder: recognition of vertebrate RNA polymerase II promoters. Bioinformatics 18:198–199.CrossRefGoogle ScholarPubMed
Bajic, V. B., Tan, S. L., Suzuki, Y., and Sugano, S. (2004) Promoter prediction analysis on the whole human genome. Nature Biotechnology 22:1467–1473.CrossRefGoogle ScholarPubMed
Bajic, V. B., Brent, M. R., Brown, R. H., Frankish, A., Harrow, J., Ohler, U., Solovyev, V. V., and Tan, S. L. (2006) Performance assessment of promoter predictions on ENCODE regions in the EGASP experiment. Genome Biology 7(Suppl. 1):S3.CrossRefGoogle ScholarPubMed
Baker, J. K. (1979) Trainable grammars for speech recognition. In Proceedings of the Spring Conference of the Acoustical Society of America, Boston, MA, pp. 547–550.Google Scholar
Bartel, D. P. (2004) MicroRNAs: genomics, biogenesis, mechanism, and function. Cell 116:281–297.CrossRefGoogle ScholarPubMed
Bateman, A., Coin, L., Durbin, R., Finn, R. D., Hollich, V., Griffiths-Jones, S., Khanna, A., Marshall, M., Moxon, S., Sonnhammer, E. L. L., Studholme, D. J., Yeats, C., and Eddy, S. R. (2004) The Pfam protein families database. Nucleic Acids Research 32:D138–D141.CrossRefGoogle ScholarPubMed
Batzoglou, S., Pachter, L., Mesirov, J. P., Berger, B., and Lander, ES. (2000) Human and mouse gene structure: comparative analysis and application to exon prediction. Genome Research 7:950–958.CrossRefGoogle Scholar
Baum, L. E. (1972) An inequality and associated maximization technique in statistical estimation for probabilistic functions of Markov processes. Inequalities 3:1–8.Google Scholar
Baum, L. E., Petrie, T., Goules, G., and Weiss, N. (1970) A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics 41:164–171.CrossRefGoogle Scholar
Beaudoing, E., Freier, S., Wyatt, J. R., Claverie, J.-M., and Gautheret, D. (2000) Patterns of variant polyadenylation signal usage in human genes. Genome Research 10:1001–1010.CrossRef
Benson, D. A., Karsch-Mizrachi I, , Lipman, D. J., Ostell, J., and Wheeler, D. L. (2005) GenBank. Nucleic Acids Research 33:D34–D38.CrossRefGoogle ScholarPubMed
Bernardi, G., Olofsson, B., Filipski, J., Zerial, M., Salinas, J., Cuny, G., Meunier-Rotival, M., and Rodier, F. (1985) The mosaic genome of warm-blooded vertebrates. Science 228:953–958.CrossRefGoogle ScholarPubMed
Besemer, J., Lomsadze, A., and Borodovsky, M. (2001) GeneMarkS: a self-training method for prediction of gene starts in microbial genomes – implications for finding sequence motifs in regulatory regions. Nucleic Acids Research 29:2607–2618.CrossRefGoogle ScholarPubMed
Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., Down, T., Durbin, R., Fernandez-Suarez, X. M., Flicek, P., Gräf, S., Hammond, M., Herrero, J., Howe, K., Iyer, V., Jekosch, K., Kähäri, A., Kasprzyk, A., Keefe, D., Kokocinski, F., Kulesha, E., London, D., Longden, I., Melsopp, C., Meidl, P., Overduin, B., Parker, A., Proctor, G., Prlic, A., Rae, M., Rios, D., Redmond, S., Schuster, M., Sealy, I., Searle, S., Severin, J., Slater, G., Smedley, D., Smith, J., Stabenau, A., Stalker, J., Trevanion, S., Ureta-Vidal, A., Vogel, J., White, S., Woodwark, C., and Hubbard, T. J. P. (2006) Ensembl 2006. Nucleic Acids Research 34:D556–D561.CrossRefGoogle ScholarPubMed
Blanco, E., Parra, G., and Guigó, R. (2002). Using GENEID to identify genes. In Baxevanis, A. (ed.), Current Protocols in Bioinformatics, unit 4.3. New York: John Wiley.Google Scholar
Boguski, M. S., Lowe, T. M., and Tolstoshev, C. M. (1993) dbEST: database for expressed sequence tags. Nature Genetics 4:332–333.CrossRefGoogle ScholarPubMed
Borodovsky, M. and McIninch, J. (1993) GENMARK: parallel gene recognition for both DNA strands. Computers and Chemistry 16:37–43.Google Scholar
Borodovsky, M., Rudd, K. E., and Koonin, E. V. (1994) Intrinsic and extrinsic approaches for detecting genes in a bacterial genome. Nucleic Acids Research 22:4756–4767.CrossRefGoogle Scholar
Bouchard G. and Triggs B. (2004) The trade-off between generative and discriminative classifiers. In J. Antoch (ed.), Proceedings of International Symposinm on Computational Statistics (COMPSTAT) 2004, pp. 1–9.
Bray, N., Dubchak, I., and Pachter, L. (2003) AVID: a global alignment program. Genome Research 13:97–102.CrossRefGoogle ScholarPubMed
Breiman, L., Friedman, J. H., Olshen, R. A., and Stone, C. J. (1984) Classification and Regression Trees. Monterey, CA: Wadsworth International.
Brejová, B., Brown, D. G., Li, M., and Vinar, T. (2005) ExonHunter: a comprehensive approach to gene finding. Bioinformatics 21(Suppl. 1):i57–i65.CrossRefGoogle ScholarPubMed
Brendel, V. and Kleffe, J. (1998) Prediction of locally optimal splice sites in plant pre-mRNA with applications to gene identification in Arabidopsis thaliana genomic DNA. Nucleic Acids Research 26:4748–4757.CrossRefGoogle ScholarPubMed
Brown, R. H., Gross, S. S., and Brent, M. R. (2005) Begin at the beginning: predicting genes with 5′ UTRs. Genome Research 15:742–747.CrossRefGoogle ScholarPubMed
Bucher P. and Bairoch A. (1994) A generalized profile syntax for biomolecular sequence motifs and its function in automatic sequence interpretation. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 53–61.
Buratti, E. and Baralle, F. E. (2004) Influence of RNA secondary structure on the pre-mRNA splicing process. Molecular Cell Biology 24:10505–10514.CrossRefGoogle ScholarPubMed
Burden, S., Lin, Y. X., and Zhang, R. (2005) Improving promoter prediction for the NNPP2.2 algorithm: a case study using Escherichia coli DNA sequences. Bioinformatics 21:601–607.CrossRefGoogle ScholarPubMed
Burge C. (1997) Identification of complete gene structures in human genomic DNA. Ph.D. thesis Stanford University, Stanford, CA.
Burge, C. (1998) Modeling dependencies in pre-mRNA splicing signals. In Salzberg, S., Searls, D., and Kasif, S. (eds.), Computational Methods in Molecular Biology, pp. 127–163. Amsterdam: Elsevier.Google Scholar
Burge, C. and Karlin, S. (1997) Prediction of complete gene structures in human genomic DNA. Journal of Molecular Biology 268:78–94.CrossRefGoogle ScholarPubMed
Burge, C. B., Tuschl, T., and Sharp, P. A. (1999) Splicing of precursors to mRNAs by the spliceosomes. In Gesteland, R. F., Cech, T. R., and Atkins, J. F. (eds.) The RNA World, 2nd edn., pp. 525–560. Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press.Google Scholar
Burges, C. J. C. (1998) A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2:121–167.CrossRefGoogle Scholar
Burks, C., Cassidy, M., Cinkosky, M. J., Cumella, K. E., Gilna, P., Hayden, J. E.-D., Keen, G. M., Kelley, T. A., Kelly, M., Kristofferson, D., and Ryals, J. (1991) GenBank. Nucleic Acids Research 19:S2221–S2225.CrossRefGoogle ScholarPubMed
Burset, M. and Guigó R, (1996) Evaluation of gene structure prediction programs. Genomics 34:357–367.CrossRefGoogle ScholarPubMed
Burset, M., Seledtsov, I. A., and Solovyev, V. V. (2000) Analysis of canonical and non-canonical splice sites in mammalian genomes. Nucleic Acids Research 28:4364–4375.CrossRefGoogle ScholarPubMed
Cai, D., Delcher, A., Kao, B., and Kasif, S. (2000) Modeling splice sites with Bayes networks. Bioinformatics 16:152–158.CrossRefGoogle ScholarPubMed
Carninci, P., Sandelin, A., Lenhard, B., Katayama, S., Shimokawa, K., Ponjavic, J., Semple, C. A., Taylor, M. S., Engstrom, P. G., Frith, M. C., Forrest, A. R., Alkema, W. B., Tan, S. L., Plessy, C., Kodzius, R., Ravasi, T., Kasukawa, T., Fukuda, S., Kanamori-Katayama, M., Kitazume, Y., Kawaji, H., Kai, C., Nakamura, M., Konno, H., Nakano, K., Mottagui-Tabar, S., Arner, P., Chesi, A., Gustincich, S., Persichetti, F., Suzuki, H., Grimmond, S. M., Wells, C. A., Orlando, V., Wahlestedt, C., Liu, E. T., Harbers, M., Kawai, J., Bajic, V. B., Hume, D. A., and Hayashizaki, Y. (2006) Genome-wide analysis of mammalian promoter architecture and evolution. Nature Genetics 38:626–635.CrossRefGoogle ScholarPubMed
Cawley, S. E., Wirth, A. I., and Speed, T. P. (2001) Phat: a gene finding program for Plasmodium falciparum. Molecular and Biochemical Parasitology 118:167–174.CrossRefGoogle ScholarPubMed
Choo, K. H., Tong, J. C., and Zhang, L. (2004) Recent applications of hidden Markov models in computational biology. Genomics, Proteomics, and Bioinformatics 2:84–96.CrossRefGoogle ScholarPubMed
Chou W., Juang B. H., and Lee C. H. (1992) Segmental GPD training of HMM based speech recognizer. In Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 1, pp. 473–476.CrossRef
Chow, C. K. and Liu, C. N. (1968) Approximating discrete probability distributions with dependence trees. IEEE Transactions on Information Theory 14:462–467.CrossRefGoogle Scholar
Chuang, T.-J., Lin, W. C., Lee, H. C., Wang, C. W., Hsiao, K. L., Wang, Z. H., Shieh, D., Lin, S. C., and Chang, L. Y. (2003) A complexity reduction algorithm for analysis and annotation of large genomic sequences. Genome Research 13:313–322.CrossRefGoogle ScholarPubMed
Chuang, T.-J., Chen, F.-C., and Chou, M.-Y. (2004) A comparative method for identification of gene structures and alternatively spliced variants. Bioinformatics 20:3064–3079.CrossRefGoogle ScholarPubMed
Clark, F. and Thanaraj, T. A. (2002) Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Human Molecular Genetics 11:451–464.CrossRefGoogle ScholarPubMed
Clote, P. and Backofen, R. (2000) Computational Molecular Biology. New York: John Wiley.Google Scholar
Cocke, J. and Schwartz, J. T. (1970) Programming Languages and their Compilers: Preliminary Notes, Technical Report. New York: Courant Institute of Mathematical Sciences, New York University.
Cor´, D., Herrmann, C., Dieterich, C., Cunto, Di F., Provero, P., and Caselle, M. (2005) Ab initio identification of putative human transcription factor binding sites by comparative genomics. BMC Bioinformatics 6:110.CrossRefGoogle Scholar
Cormen, T. H., Leiserson, C. E., and Rivest, R. L. (1992) Introduction to Algorithms. Cambridge, MA: MIT Press.Google Scholar
Cover, T. M. and Hart, P. E. (1967) Nearest neighbor pattern classification. IEEE Transactions on Information Theory 13:57–67.CrossRefGoogle Scholar
Culotta, A., Kulp, D., and McCallum, A. (2005) Gene Prediction with Conditional Random Fields, Technical Report UM-CS-2005–028. Amherst, MA: University of Massachusetts.
Darwin, C. (1859) On the Origin of Species by Means of Natural Selection, or the Preservation of Favoured Races in the Struggle for Life. London: John Murray.Google Scholar
Davuluri, V. D., Grosse, I., and Zhang, M. Q. (2001) Computational identification of promoters and first exons in the human genome. Nature Genetics 29:412–417.CrossRefGoogle ScholarPubMed
Dawkins, R. (1982) The Extended Phenotype: The Long Reach of the Gene. Oxford: Oxford University Press.Google Scholar
Dawkins, R. (1997) Human chauvinism. Evolution 51:1015–1020.CrossRefGoogle Scholar
Dayhoff, M., Schwartz, R. M., and Orcutt, B. C. (1978) A model of evolutionary change in proteins. Atlas of Protein Sequence and Structure 5:345–352.Google Scholar
Delcher, A. L., Kasif, S., Fleischmann, R. D., Peterson, J., White, O., and Salzberg, S. L. (1999a) Alignment of whole genomes. Nucleic Acids Research 27:2369–2376.CrossRefGoogle Scholar
Delcher, A. L., Harmon, D., Kasif, S., White, O., and Salzberg, S. L. (1999b) Improved microbial gene identification with GLIMMER. Nucleic Acids Research 27:4636–4641.CrossRefGoogle Scholar
Delcher, A. L., Phillippy, A., Carlton, J., and Salzberg, S. L. (2002) Fast algorithms for large-scale genome alignment and comparision. Nucleic Acids Research 30:2478–2483.CrossRefGoogle Scholar
Delphin, M. E., Stockwell, P. A., Tate, W. P., and Brown, C. M. (1999) Transterm, the translational signal database, extended to include full coding sequence and untranslated regions. Nucleic Acids Research 27:293–294.CrossRefGoogle Scholar
Dieterich, C., Grossmann, S., Tanzer, A., Röpcke, S., Arndt, P. F., Stadler, P. F., and Vingron, M. (2005) Comparative promoter region analysis powered by CORG. BMC Genomics 6:24.CrossRefGoogle ScholarPubMed
Ding, Y. (2006) Statistical and Bayesian approaches to RNA secondary structure prediction. RNA 12:232–331.Google ScholarPubMed
Doudna, J. A. and Cech, T. R. (2002) The chemical repertoire of natural ribozymes. Nature 418:222–228.CrossRefGoogle ScholarPubMed
Down, T. A. and Hubbard, T. J. P. (2002) Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Research 12:458–461.CrossRefGoogle ScholarPubMed
Dror, G., Sorek, R., and Shamir, R. (2004) Accurate identification of alternatively spliced exons using support vector machines. Bioinformatics 21:897–901.CrossRefGoogle Scholar
Duda, R. O., Hart, P. E., and Stork, D. G. (2000) Pattern Classification, 2nd edn. New York: Wiley-Interscience.Google Scholar
Dunteman, G. H. (1989) Principal Components Analysis.London: Sage Publications.CrossRefGoogle Scholar
Durbin, R., Eddy, S., Krogh, A., and Mitchison, G. (1998) Biological Sequence Analysis.Cambridge: Cambridge University Press.CrossRefGoogle Scholar
Eddy, S. R. (2002) Computational genomics of noncoding RNA genes. Cell 109:137–140.CrossRefGoogle ScholarPubMed
Eddy, S. R. (2005) A model of the statistical power of comparative genome sequence analysis. PLoS Biology 3:e10.CrossRefGoogle ScholarPubMed
Eddy, S., Mitchison, G., and Durbin, R. (1995) Maximum discrimination hidden Markov models of sequence consensus. Journal of Computational Biology 2:9–23.CrossRefGoogle ScholarPubMed
Edwards, A. W. F. (1992) Likelihood.Baltimore, MD: Johns Hopkins University Press.Google Scholar
Eisen, J. A., Coyne, R. S., Wu, M., Wu, D., Thiagarajan, M., Wortman, J. R., Badger, J. H., Ren, Q., Amedeo, P., Jones, K. M., Tallon, L. J., Delcher, A. L., Salzberg, S. L., Silva, J. C., Haas, B. J., Majoros, W. H., Farzad, M., Carlton, J. M., Smith, R. K., Garg, J., Pearlman, R. E., Karrer, K. M., Sun, L., Manning, G., Elde, N. C., Turkewitz, A. P., Asai, D. J., Wilkes, D. E., Wang, Y., Cai, H., Collins, K., Stewart, B. A., Lee, S. R., Wilamowska, K., Weinberg, Z., Ruzzo, W. L., Wloga, D., Gaertig, J., Frankel, J., Tsao, C. C., Gorovsky, M. A., Keeling, P. J., Waller, R. F., Patron, N. J., Cherry, J. M., Stover, N. A., Krieger, C. J., Toro, Del C., Ryder, H. F., Williamson, S. C., Barbeau, R. A., Hamilton, E. P., and Orias, E. (2006) Macronuclear genome sequence of the ciliate Tetrahymena thermophila, a model eukaryote. PLoS Biology 29:4(9).Google Scholar
Fairbrother, W. G., Yeh, R. F., Sharp, P. A., and Burge, C. B. (2002) Predictive identification of exonic splicing enhancers in human genes. Science 297:1007–1013.CrossRefGoogle ScholarPubMed
Falconer, D. S. (1996) Introduction to Quantitative Genetics, 4th edn. Englewood Cliffs, NJ: Prentice-Hall.Google Scholar
Fariselli, P., Martelli, P. L., and Casadio, R. (2005) The posterior-Viterbi: a new decoding algorithm for hidden Markov models. BMC Bioinformatics 6 (Suppl. 4):S12.CrossRefGoogle ScholarPubMed
Fausett, L. V. (1994) Fundamentals of Neural Networks. Englewood Cliffs, NJ: Prentice-Hall.Google Scholar
Felsenstein, J. (1981) Evolutionary trees from DNA sequences. Journal of Molecular Evolution 17:368–376.CrossRefGoogle ScholarPubMed
Felsenstein, J. (1989) PHYLIP: phylogeny inference package (version 3.2). Cladistics 5:164–166.Google Scholar
Felsenstein, J. (2004) Inferring Phylogenies.Sunderland, MA: Sinauer Associates.Google Scholar
Felsenstein, J. and Churchill, G. A. (1996) A hidden Markov model approach to variation among sites in rate of evolution. Molecular Biology and Evolution 13:93–104.CrossRefGoogle ScholarPubMed
Fischer, C. N. and LeBlanc, R. J. (1991) Crafting a Compiler with C. Menlo Park, CA: Benjamin/Cummings.Google Scholar
Fisher, R. A. (1936) The use of multiple measurements in taxonomic problems. Annals of Eugenics 7:179–188.CrossRefGoogle Scholar
Fletcher, R. (1980) Practical Methods of Optimization, vol. 1, Unconstrained Optimization.New York: John Wiley.Google Scholar
Florea, L., Hartzell, G., Zhang, Z., Rubin, G. M., and Miller, W. (1998) A computer program for aligning a cDNA sequence with a genomic DNA sequence. Genome Research 8:967–974.CrossRefGoogle ScholarPubMed
Florea, L., Di Francesco, V., Miller, J., Turner, R., Yao, A., Harris, M., Walenz, B., Mobarry, C., Merkulov, G. V., Charlab, R., Dew, I., Deng, Z., Istrail, S., Li, P., and Sutton, G. (2005) Gene and alternative splicing annotation with AIR. Genome Research 15:54–66.CrossRefGoogle ScholarPubMed
Fogel, D. B. (2005) Evolutionary Computation: Toward a New Philosophy of Machine Intelligence, 3rd edn. New York: Wiley-IEEE Press.CrossRefGoogle Scholar
Foissac, S. and Schiex, T. (2005) Integrating alternative splicing detection into gene prediction. BMC Bioinformatics 6:25.CrossRefGoogle ScholarPubMed
Freund Y. and Schapire R. E. (1995) A decision-theoretic generalization of on-line learning and an application to boosting. In Proceedings of the European Conference on Computational Learning Theory, pp. 23–37.CrossRef
Friedman, J. H. (1989) Regularized discriminant analysis. Journal of the American Statistical Association 84:165–175.CrossRefGoogle Scholar
Galassi, M., Davies, J., Theiler, J., Gough, B., Jungman, G., Booth, M., and Rossia, F. (2005) GNU Scientific Library Reference Manual, 2nd edn. Bristol, UK: Network Theory Ltd.Google Scholar
Gangal, R. and Sharma, P. (2005) Human pol II promoter prediction: time series descriptors and machine learning. Nucleic Acids Research 33:1332–1336.CrossRefGoogle ScholarPubMed
Gene Ontology Consortium (2000) Gene ontology: tool for the unification of biology. Nature Genetics 25:25–29.CrossRef
Gierasch, L. M. (1989) Signal sequences. Biochemistry 28:923–930.CrossRefGoogle ScholarPubMed
Goldberg, D. E. (1989) Genetic Algorithms in Search, Optimization and Machine Learning. Reading, MA: Addison-Wesley.Google Scholar
Gould, S. J. (1994) The evolution of life on earth. Scientific American 271:62–69.CrossRefGoogle Scholar
Goutte C., Gaussier E., Cancedda N., and Dejean H. (2004) Generative vs. discriminative approaches to entity recognition from label-deficient data. In Féme Journées Internationales Analyse Statistique des Données Textuelles (JADT 2004), pp. 1–10.
Gropp, W., Lusk, E., and Skjellum, A. (1994) Using MPI: Portable Parallel Programming with the Message-Passing Interface. Cambridge, MA: MIT Press.Google Scholar
Gross, S. S. and Brent, M. R. (2005) Using multiple alignments to improve gene prediction. In Research in Computational Molecular Biology (RECOMB'05), pp. 374–388.Google Scholar
Guigó, R. (1998) Assembling genes from predicted exons in linear time with dynamic programming. Journal of Computational Biology 5:681–702.CrossRefGoogle ScholarPubMed
Guigó, R., Flicek, P., Abril, J. F., Reymond, A., Lagarde, J., Denoeud, F., Antonarakis, S., Ashburner, M., Bajic, V. B., Birney, E., Castelo, R., Eyras, E., Gingeras, T. R., Harrow, J., Hubbard, T., Lewis, S., Ucla, C., and Reese, M. G. (2006) EGASP: the human ENCODE genome annotation assessment project. Genome Biology 7(Suppl. 1):S2.CrossRefGoogle ScholarPubMed
Haas, B. J., Delcher, A. L., Mount, S. M., Wortman, J. R., Smith, R. K., Hannick, L. I., Maiti, R., Ronning, C. M., Rusch, D. B., Town, C. D., Salzberg, S. L., and White, O. (2003) Improving the Arabidopsis genome annotation using maximal transcript alignment assemblies. Nucleic Acids Research 31:5654–5666.CrossRefGoogle ScholarPubMed
Hall, K. B., Green, M. R., and Redfield, A. G. (1988) Structure of a pre-mRNA branch point / 3′ splice site region. Proceedings of the National Academy of Sciences of the USA 85:704–708.Google ScholarPubMed
Hannenhalli, S. and Levy, S. (2001) Promoter prediction in the human genome. Bioinformatics 17:S90–S96.CrossRefGoogle ScholarPubMed
Hasegawa, M., Kishino, H., and Yano, T. (1985) Dating of the human–ape splitting by a molecular clock of mitochondrial DNA. Journal of Molecular Evolution 22:160–174.CrossRefGoogle ScholarPubMed
Hastie, T., Tibshirani, R., and Friedman, J. H. (2003) The Elements of Statistical Learning. New York: Springer.Google Scholar
Heber, S., Alekseyev, M., Sze, S.-H., Tang, H., and Pevzner, P. A. (2002) Splicing graphs and EST assembly problem. Bioinformatics 18:S181–S188.CrossRefGoogle ScholarPubMed
Heckerman D., Geiger D., and Chickering D. (1994) Learning Bayesian networks: the combination of knowledge and statistical data. In Knowledge Discovery and Data Mining Workshop (KDD '94), pp. 85–96.CrossRef
Henderson, J., Salzberg, S., and Fasman, K. (1997) Finding genes in human DNA with a hidden Markov model. Journal of Computational Biology 4:127–141.CrossRefGoogle Scholar
Henikoff, J. G., Greene, E. A., Pietrokovski, S., and Henikoff, S. (2000) Increased coverage of protein families with the BLOCKS database servers. Nucleic Acids Research 28:228–230.CrossRefGoogle ScholarPubMed
Henikoff, S. and Henikoff, J. G. (1992) Amino acid substitution matrices from protein blocks. Proceedings of the National Academy of Sciences of the USA 89:10915–10919.Google ScholarPubMed
Heylighen, F. (1999) The growth of structural and functional complexity during evolution. In Heylighen, F., Bollen, J., and Riegler, A. (eds.) The Evolution of Complexity, pp. 17–44. New York: Kluwer.Google Scholar
Hirschberg, D. (1975) A linear space algorithm for computing maximal common subexpressions. Communications of the Association of Computing Machinery 18:341–343.CrossRefGoogle Scholar
Hofacker, I. L., Fontana, W., Stadler, P. F., Bonhoeffer, L. S., Tacker, M., and Schuster, P. (1994) Fast folding and comparison of RNA secondary structures. Chemical Monthly 125:167–188.CrossRefGoogle Scholar
Hopcroft, J. E. and Ullman, J. D. (1979) Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison-Wesley.Google Scholar
Hoskins, R. A., Smith, C. D., Carlson, J. W., Carvalho, A. B., Halpern, A., Kaminker, J. S., Kennedy, C., Mungall, C. J., Sullivan, B. A., Sutton, G. G., Yasuhara, J. C., Wakimoto, B. T., Myers, E. W., Celniker, S. E., Rubin, G. M., and Karpen, G. H. (2002) Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biology 3:0085.1–0085.16.CrossRefGoogle Scholar
Howe, K. L., Chothia, T., and Durbin, R. (2002) GAZE: a generic framework for the integration of gene-prediction data by dynamic programming. Genome Research 12:1418–1427.CrossRefGoogle ScholarPubMed
Huang, X., Adams, M. D., Zhou, H., and Kerlavage, A. R. (1997) A tool for analyzing and annotating genomic sequences. Genomics 46:35–45.CrossRefGoogle ScholarPubMed
International Human Genome Sequencing Consortium (2001) Initial sequencing and analysis of the human genome. Nature 409:860–921.CrossRef
International Human Genome Sequencing Consortium (2004) Finishing the euchromatic sequence of the human genome. Nature 431:931–945.CrossRef
Jaakkola, T. S. and Haussler, D. (1999) Exploiting generative models in discriminative classifiers. Advances in Neural Information Processing Systems 11:487–493.Google Scholar
Jeffreys, A. J., Wilson, V., and Thein, S. L. (1985) Individual-specific “fingerprints” of human DNA. Nature 316:76–79.CrossRefGoogle ScholarPubMed
Jelinek, F. (1997) Statistical Methods for Speech Recognition. Bradford, MA: Bradford Books.Google Scholar
Jelinek F. and Mercer R. L. (1980) Interpolated estimation of Markov source parameters. In Proceedings of the Workshop on Pattern Recognition in Practice, May 1980,
Jensen, F. V. (2001) Bayesian Networks and Decision Graphs. New York: Springer.
Jenuwein, T. and Allis, C. D. (2001) Translating the histone code. Science 293:1074–1080.
Johansen, F. T. (1996) A comparison of hybrid HMM architectures using global discriminative training. In Proceedings of the 4th International Conference on Spoken Language Processing, pp. 498–501.
Jordan, M. I., Ghahramani, Z., Jaakkola, T. S., and Saul, L. K. (1999) An introduction to variational methods for graphical models. In Jordan, M. I. (ed.) Learning in Graphical Models, pp. 105–162. Cambridge, MA: MIT Press.
Jukes, T. H. and Cantor, C. R. (1969) Evolution of protein molecules. In Munro, H. N. (ed.) Mammalian Protein Metabolism, pp. 21–132. New York: Academic Press.
Käll, L., Krogh, A., and Sonnhammer, E. L. (2005) An HMM posterior decoder for sequence feature prediction that includes homology information. Bioinformatics 21(Suppl. 1):i251–i257.
Kamal, M., Xie, X., and Lander, E. S. (2006) A large family of ancient repeat elements in the human genome is under strong selection. Proceedings of the National Academy of Sciences of the USA 103:2740–2745.
Karlin, S. and Altschul, S. F. (1990) Methods for assessing the statistical significance of molecular sequence features by using general scoring schemes. Proceedings of the National Academy of Sciences of the USA 87:2264–2268.
Kasami, T. (1965) An Efficient Recognition and Syntax-Analysis Algorithm for Context-Free Languages, Scientific Report AFCRL-65-758. Bedford, MA: Air Force Cambridge Research Laboratory.
Katz, S. M. (1987) Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech, and Signal Processing 35:400–401.
Kaufman, L. (1998) Solving the quadratic programming problem arising in support vector classification. In Schölkopf, B., Burges, C. J. C., and Smola, A. J. (eds.) Advances in Kernel Methods: Support Vector Learning, pp. 147–167. Cambridge, MA: MIT Press.
Keilson, J. (1979) Markov Chain Models: Rarity and Exponentiality. New York: Springer.
Kent, W. J. (2002) BLAT: the BLAST-like alignment tool. Genome Research 12:656–664.
Kent, W. J., Sugnet, C. W., Furey, T. S., Roskin, K. M., Pringle, T. H., Zahler, A. M., and Haussler, D. (2002) The human genome browser at UCSC. Genome Research 12:996–1006.
Kimura, M. (1980) A simple method for estimating evolutionary rate of base substitutions through comparative studies of nucleotide sequences. Journal of Molecular Evolution 16:111–120.
Kingsbury, N. G. and Rayner, P. J. W. (1971) Digital filtering using logarithmic arithmetic. Electronics Letters 7:56–58.
Kohavi, R. and Sahami, M. (1996) Error-based and entropy-based discretization of continuous features. In Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, pp. 114–119.
Korf, I. (2004) Gene finding in novel genomes. BMC Bioinformatics 5:59.
Korf, I., Flicek, P., Duan, D., and Brent, M. R. (2001) Integrating genomic homology into gene structure prediction. Bioinformatics 17:S140–S148.
Korf, I., Yandell, M., and Bedell, J. (2003) BLAST. Sebastopol, CA: O'Reilly.
Krogh, A. (1994) Hidden Markov models for labeled sequences. In Proceedings of the 12th IAPR International Conference on Pattern Recognition, pp. 140–144.
Krogh, A. (1997) Two methods for improving performance of an HMM and their application for gene finding. In Proceedings of the 5th International Conference on Intelligent Systems for Molecular Biology, pp. 179–186.
Krogh, A. (1998) An introduction to hidden Markov models for biological sequences. In Salzberg, S. L., Searls, D. B., and Kasif, S. (eds.) Computational Methods in Molecular Biology, pp. 45–62. Amsterdam: Elsevier.
Krogh, A. (2000) Using database matches with HMMGene for automated gene detection in Drosophila. Genome Research 10:523–528.
Krogh, A., Mian, I. S., and Haussler, D. (1994) A hidden Markov model that finds genes in E. coli DNA. Nucleic Acids Research 22:4768–4778.
Koza, J. (1992) Genetic Programming: On the Programming of Computers by Means of Natural Selection. Cambridge, MA: MIT Press.
Kullback, S. (1997) Information Theory and Statistics. New York: Dover.
Kulkarni, O. C., Vigneshwar, R., Jayaraman, V. K., and Kulkarni, B. D. (2005) Identification of coding and non-coding sequences using local Hölder exponent formalism. Bioinformatics 21:3818–3823.
Kulp, D. and Haussler, D. (1997) Integrating database homology in a probabilistic gene structure model. Pacific Symposium on Bioinformatics 2:232–244.
Kulp, D., Haussler, D., Reese, M., and Eeckman, F. (1996) A generalized hidden Markov model for the recognition of human genes in DNA. In Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, pp. 134–142.
Kurtz, S., Phillippy, A., Delcher, A. L., Smoot, M., Shumway, M., Antonescu, C., and Salzberg, S. L. (2004) Versatile and open software for comparing large genomes. Genome Biology 5:R12.1–R12.9.
Lam, F., Alexandersson, M., and Pachter, L. (2003) Picking alignments from (Steiner) trees. Journal of Computational Biology 10:509–520.
Lander, E. S. and Waterman, M. S. (1988) Genomic mapping by fingerprinting random clones: a mathematical analysis. Genomics 2:231–239.
Landry, J. R., Mager, D. L., and Wilhelm, B. T. (2003) Complex controls: the role of alternative promoters in mammalian genomes. Trends in Genetics 19:640–648.
Lapedes, A., Barnes, C., Burks, C., Farber, R., and Sirotkin, K. (1989) Application of neural networks and other machine learning algorithms to DNA sequence analysis. In Bell, G. and Marr, T. (eds.) Computers and DNA: SFI Studies in the Sciences of Complexity, pp. 157–182. Reading, MA: Addison-Wesley.
Larsen, F., Gundersen, G., Lopez, R., and Prydz, H. (1992) CpG islands as gene markers in the human genome. Genomics 13:1095–1107.
Lawrence, C. E., Altschul, S. F., Boguski, M. S., Liu, J. S., Neuwald, A. F., and Wootton, J. C. (1993) Detecting subtle sequence signals: a Gibbs sampling strategy for multiple alignment. Science 262:208–214.
Lee, C., Grasso, C., and Sharlow, M. F. (2002) Multiple sequence alignment using partial order graphs. Bioinformatics 18:452–464.
Lewin, B. (2003) Genes VIII. New York: Prentice-Hall.
Lian, Y. and Garner, H. R. (2005) Evidence for the regulation of alternative splicing via complementary DNA sequence repeats. Bioinformatics 21:1358–1364.
Lim, L. P. and Burge, C. B. (2001) A computational analysis of sequence features involved in recognition of short introns. Proceedings of the National Academy of Sciences of the USA 98:11193–11198.
Liò, P. and Goldman, N. (1998) Models of molecular evolution and phylogeny. Genome Research 8:1233–1244.
Loots, G. G., Ovcharenko, I., Pachter, L., Dubchak, I., and Rubin, E. M. (2002) rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Research 12:832–839.
Lowe, T. M. and Eddy, S. R. (1999) A computational screen for methylation guide snoRNAs in yeast. Science 283:1168–1171.
Lukashin, A. V. and Borodovsky, M. (1998) GeneMark.hmm: new solutions for gene finding. Nucleic Acids Research 26:1107–1115.
Mackey, A. (2005) GLEAN: improved eukaryotic gene prediction by statistical consensus of gene evidence. Poster presented at Genome Informatics Conference, October 28, 2005.
Maglott, D. R., Katz, K. S., Sicotte, H., and Pruitt, K. D. (2000) NCBI's LocusLink and RefSeq. Nucleic Acids Research 28:126–128.
Majoros, W. H. and Salzberg, S. L. (2004) An empirical analysis of training protocols for probabilistic gene finders. BMC Bioinformatics 5:206.
Majoros, W. H., Pertea, M., Antonescu, C., and Salzberg, S. L. (2003) GlimmerM, Exonomy and Unveil: three ab initio eukaryotic genefinders. Nucleic Acids Research 31:3601–3604.
Majoros, W. H., Pertea, M., and Salzberg, S. (2004) TIGRscan and GlimmerHMM: two open source ab initio eukaryotic gene finders. Bioinformatics 20:2878–2879.
Majoros, W. H., Pertea, M., Delcher, A. L., and Salzberg, S. L. (2005a) Efficient decoding algorithms for generalized hidden Markov model gene finders. BMC Bioinformatics 6:16.
Majoros, W. H., Pertea, M., and Salzberg, S. L. (2005b) Efficient implementation of a generalized pair hidden Markov model for comparative gene finding. Bioinformatics 21:1782–1788.
Manly, B. F. J. (1994) Multivariate Statistical Methods: A Primer, 2nd edn. New York: Chapman and Hall.
Manning, C. and Schütze, H. (1999) Foundations of Statistical Natural Language Processing. Cambridge, MA: MIT Press.
Marashi, S. A., Goodarzi, H., Sadeghi, M., Eslahchi, C., and Pezeshk, H. (2006) Importance of RNA secondary structure information for yeast donor and acceptor splice site predictions by neural networks. Computational Biology and Chemistry 30:50–57.
Markov, K., Nakagawa, S., and Nakamura, S. (2001) Discriminative training of HMM using maximum normalized likelihood algorithm. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing, pp. 497–500.
Matlin, A. J., Clark, F., and Smith, C. W. (2005) Understanding alternative splicing: towards a cellular code. Nature Reviews: Molecular Cell Biology 6:386–398.
McAuliffe, J. D., Pachter, L., and Jordan, M. I. (2004) Multiple-sequence functional annotation and the generalized hidden Markov phylogeny. Bioinformatics 20:1850–1860.
McCaskill, J. S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers 29:1105–1119.
Mealy, G. H. (1955) A method for synthesizing sequential circuits. Bell System Technical Journal 34:1045–1079.
Meyer, I. M. and Durbin, R. (2002) Comparative ab initio prediction of gene structures using pair HMMs. Bioinformatics 18:1309–1318.
Meyer, I. M. and Durbin, R. (2004) Gene structure conservation aids similarity based gene prediction. Nucleic Acids Research 32:776–783.
Mitchell, T. (1997) Machine Learning. New York: McGraw-Hill.
Mitrophanov, A. Y. and Borodovsky, M. (2006) Statistical significance in biological sequence analysis. Briefings in Bioinformatics 7:2–24.
Mizrahi, A. and Sullivan, M. (1986) Calculus and Analytic Geometry, 2nd edn. Belmont, CA: Wadsworth.
Moore, E. F. (1956) Gedanken experiments on sequential machines. In Shannon, C. E. and McCarthy, J. (eds.) Automata Studies, pp. 129–153. Princeton, NJ: Princeton University Press.
Mouse Genome Sequencing Consortium (2002) Initial sequencing and comparative analysis of the mouse genome. Nature 420:520–562.
Mural, R. J., Adams, M. D., Myers, E. W., Smith, H. O., Miklos, G. L., Wides, R., Halpern, A., Li, P. W., Sutton, G. G., Nadeau, J., Salzberg, S. L., Holt, R. A., Kodira, C. D., Lu, F., Chen, L., Deng, Z., Evangelista, C. C., Gan, W., Heiman, T. J., Li, J., Li, Z., Merkulov, G. V., Milshina, N. V., Naik, A. K., Qi, R., Shue, B. C., Wang, A., Wang, J., Wang, X., Yan, X., Ye, J., Yooseph, S., Zhao, Q., Zheng, L., Zhu, S. C., Biddick, K., Bolanos, R., Delcher, A. L., Dew, I. M., Fasulo, D., Flanigan, M. J., Huson, D. H., Kravitz, S. A., Miller, J. R., Mobarry, C. M., Reinert, K., Remington, K. A., Zhang, Q., Zheng, X. H., Nusskern, D. R., Lai, Z., Lei, Y., Zhong, W., Yao, A., Guan, P., Ji, R. R., Gu, Z., Wang, Z. Y., Zhong, F., Xiao, C., Chiang, C. C., Yandell, M., Wortman, J. R., Amanatides, P. G., Hladun, S. L., Pratts, E. C., Johnson, J. E., Dodson, K. L., Woodford, K. J., Evans, C. A., Gropman, B., Rusch, D. B., Venter, E., Wang, M., Smith, T. J., Houck, J. T., Tompkins, D. E., Haynes, C., Jacob, D., Chin, S. H., Allen, D. R., Dahlke, C. E., Sanders, R., Li, K., Liu, X., Levitsky, A. A., Majoros, W. H., Chen, Q., Xia, A. C., Lopez, J. R., Donnelly, M. T., Newman, M. H., Glodek, A., Kraft, C. L., Nodell, M., Ali, F., An, H. J., Baldwin-Pitts, D., Beeson, K. Y., Cai, S., Carnes, M., Carver, A., Caulk, P. M., Center, A., Chen, Y. H., Cheng, M. L., Coyne, M. D., Crowder, M., Danaher, S., Davenport, L. B., Desilets, R., Dietz, S. M., Doup, L., Dullaghan, P., Ferriera, S., Fosler, C. R., Gire, H. C., Gluecksmann, A., Gocayne, J. D., Gray, J., Hart, B., Haynes, J., Hoover, J., Howland, T., Ibegwam, C., Jalali, M., Johns, D., Kline, L., Ma, D. S., MacCawley, S., Magoon, A., Mann, F., May, D., McIntosh, T. C., Mehta, S., Moy, L., Moy, M. C., Murphy, B. J., Murphy, S. D., Nelson, K. A., Nuri, Z., Parker, K. A., Prudhomme, A. C., Puri, V. N., Qureshi, H., Raley, J. C., Reardon, M. S., Regier, M. A., Rogers, Y. H., Romblad, D. L., Schutz, J., Scott, J. L., Scott, R., Sitter, C. D., Smallwood, M., Sprague, A. C., Stewart, E., Strong, R. V., Suh, E., Sylvester, K., Thomas, R., Tint, N. N., Tsonis, C., Wang, G., Wang, G., Williams, M. S., Williams, S. M., Windsor, S. M., Wolfe, K., Wu, M. M., Zaveri, J., Chaturvedi, K., Gabrielian, A. E., Ke, Z., Sun, J., Subramanian, G., Venter, J. C., Pfannkoch, C. M., Barnstead, M., and Stephenson, L. D. (2002) A comparison of whole-genome shotgun-derived mouse chromosome 16 and the human genome. Science 296:1661–1671.
Murphy, P. M. and Aha, D. W. (1994) UCI Repository of Machine Learning Databases. Irvine, CA: University of California, Department of Information and Computer Science. Available online at www.ics.uci.edu/~mlearn/MLRepository.html
Murthy, S. K., Kasif, S., and Salzberg, S. (1994) A system for induction of oblique decision trees. Journal of Artificial Intelligence Research 2:1–32.
Needleman, S. and Wunsch, C. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. Journal of Molecular Biology 48:443–453.
Ng, A. Y. and Jordan, M. I. (2002) On discriminative vs. generative classifiers: a comparison of logistic regression and naive Bayes. Advances in Neural Information Processing Systems (NIPS) 14:841–848.
Normandin, Y. (1996) Maximum mutual information estimation of hidden Markov models. In Lee, C.-H., Soong, F. K., and Paliwal, K. K. (eds.) Automatic Speech and Speaker Recognition, pp. 58–81. New York: Kluwer.
Normark, S., Bergstrom, S., Edlund, T., Grundstrom, T., Jaurin, B., Lindberg, F. P., and Olsson, O. (1983) Overlapping genes. Annual Review of Genetics 17:499–525.
Ohler, U., Stemmer, G., Harbeck, S., and Niemann, H. (2000) Stochastic segment models of eukaryotic promoter regions. Proceedings of the Pacific Symposium on Biocomputing 5:377–388.
Ohler, U., Niemann, H., Liao, G., and Rubin, G. M. (2001) Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics 17:S199–S206.
Ohler, U., Liao, G., Niemann, H., and Rubin, G. (2002) Computational analysis of core promoters in the Drosophila genome. Genome Biology 3(12):r0087.1–r0087.12.
Ohler, U., Shomron, N., and Burge, C. B. (2005) Recognition of unknown conserved alternatively spliced exons. PLoS Computational Biology 1(2):e15.
Oliver, J. L., Carpena, P., Hackenberg, M., and Bernaola-Galván, P. (2004) IsoFinder: computational prediction of isochores in genome sequences. Nucleic Acids Research 32:W287–W292.
Osuna, E., Freund, R., and Girosi, F. (1997) An improved training algorithm for support vector machines. In Proceedings of the IEEE Workshop on Neural Networks for Signal Processing, pp. 276–285.
Pachter, L., Batzoglou, S., Spitkovsky, V. I., Banks, E., Lander, E. S., Kleitman, D. J., and Berger, B. (1999) A dictionary based approach for gene annotation. Journal of Computational Biology 6:419–430.
Pachter, L., Alexandersson, M., and Cawley, S. (2002) Applications of generalized pair hidden Markov models to alignment and gene finding problems. Journal of Computational Biology 9:389–399.
Parra, G., Agarwal, P., Abril, J. F., Wiehe, T., Fickett, J. W., and Guigó, R. (2003) Comparative gene prediction in human and mouse. Genome Research 13:108–117.
Patterson, D., Yasuhara, K., and Ruzzo, W. L. (2002) Pre-mRNA secondary structure prediction aids splice site prediction. Pacific Symposium on Bioinformatics 7:223–234.
Pavesi, A., Iaco, B., Granero, M. I., and Porati, A. (1997) On the informational content of overlapping genes in prokaryotic and eukaryotic viruses. Journal of Molecular Evolution 44:625–631.
Pearl, J. (1991) Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, 2nd edn. Los Altos, CA: Morgan Kaufmann.
Pearson, W. R. and Wood, T. C. (2001) Statistical significance in biological sequence comparison. In Balding, D. J., Bishop, M., and Cannings, C. (eds.) Handbook of Statistical Genetics, pp. 39–65. New York: John Wiley.
Pedersen, J. S. and Hein, J. (2003) Gene finding with a hidden Markov model of genome structure and evolution. Bioinformatics 19:219–227.
Pertea, M. (2005) The GlimmerHMM Home Page. Available online at www.cbcb.umd.edu/software/GlimmerHMM
Pertea, M., Lin, X., and Salzberg, S. L. (2001) GeneSplicer: a new computational method for splice site prediction. Nucleic Acids Research 29:1185–1190.
Pertea, M. and Salzberg, S. L. (2002) Computational gene finding in plants. Plant Molecular Biology 48:48–49.
Platt, J. (1998) Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector Machines, Microsoft Research Technical Report MSR-TR-98-14. Redmond, WA: Microsoft Corporation.
Pontius, J. U., Wagner, L., and Schuler, G. D. (2003) UniGene: a unified view of the transcriptome. In McEntyre, J. and Ostell, J. (eds.) The NCBI Handbook, pp. 21-1–21-12. Bethesda, MD: National Center for Biotechnology Information.
Pop, M., Salzberg, S. L., and Shumway, M. (2002) Genome sequence assembly: algorithms and issues. IEEE Computer 35:47–54.
Potamianos, G. and Jelinek, F. (1998) A study of n-gram and decision tree letter language modeling methods. Speech Communication 24:171–192.
Powell, M. J. D. (1981) Nonlinear Optimization. New York: Academic Press.
Pozo, R. (1997) Template numerical toolkit for linear algebra: high performance programming with C++ and the Standard Template Library. International Journal of Supercomputer Applications and High Performance Computing 11:251–263.
Press, W. H., Flannery, B. P., Teukolsky, S. A., and Vetterling, W. T. (1992) Numerical Recipes in C: The Art of Scientific Computing, 2nd edn. Cambridge: Cambridge University Press.
Provost, F. J. and Hennessy, D. N. (1994) Distributed machine learning: scaling up with coarse-grained parallelism. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 340–347.
Pruitt, K. D., Tatusova, T., and Maglott, D. R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Research 33:D501–D504.
Quinlan, R. (1993) C4.5: Programs for Machine Learning. Los Altos, CA: Morgan Kaufmann.
Rabiner, L. R. (1989) A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE 77:257–286.
Rätsch, G., Sonnenburg, S., and Schölkopf, B. (2005) RASE: recognition of alternatively spliced exons in C. elegans. Bioinformatics 21(Suppl. 1):i369–i377.
Reese, M. and Eeckman, F. (1995) Novel neural network prediction systems for human promoters and splice sites. In Searls, G. S. D., Fickett, J., and Noordewier, M. (eds.) Proceedings of the Workshop on Gene-Finding and Gene Structure Prediction, Philadelphia, PA, pp. 311–324.
Reese, M. G., Eeckman, F. H., Kulp, D., and Haussler, D. (1997) Improved splice site detection in Genie. Journal of Computational Biology 4:311–323.
Reese, M. G., Hartzell, G., Harris, N. L., Ohler, U., and Lewis, S. E. (2000) Genome annotation assessment in Drosophila melanogaster. Genome Research 10:483–501.
Reichl, W. and Ruske, G. (1995) Discriminative training for continuous speech recognition. In Proceedings of the 4th European Conference on Speech Communication and Technology, pp. 537–540.
Rissanen, J. (1978) Modeling by shortest data description. Automatica 14:465–471.
Ristad, E. S. and Thomas, R. G. (1997) Hierarchical non-emitting Markov models. In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and 8th Conference of the European Chapter of the Association for Computational Linguistics.
Rivas, E. and Eddy, S. R. (1999) A dynamic programming algorithm for RNA structure prediction including pseudoknots. Journal of Molecular Biology 285:2053–2068.
Rivas, E. and Eddy, S. R. (2000) Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics 16:583–605.
Rivas, E. and Eddy, S. R. (2001) Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics 2:8.
Rombauts, S., Florquin, K., Lescot, M., Marchal, K., Rouzé, P., and Van de Peer, Y. (2003) Computational approaches to identify promoters and cis-regulatory elements in plant genomes. Plant Physiology 132:1162–1176.
Rosenblatt, F. (1958) The Perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65:386–408.
Roth, V. and Steinhage, V. (1999) Nonlinear discriminant analysis using kernel functions. In Proceedings of the 12th International Conference on Advances in Neural Information Processing Systems, pp. 568–574.
Saetrom, P., Sneve, R., Kristiansen, K. I., Snøve, O. J., Grünfeld, T., Rognes, T., and Seeberg, E. (2005) Predicting non-coding RNA genes in Escherichia coli with boosted genetic programming. Nucleic Acids Research 33:3263–3270.
Saeys, Y. (2004) Feature selection for classification of nucleic acid sequences. Ph.D. thesis, University of Ghent, Belgium.
Saeys, Y., Degroeve, S., Aeyels, D., Rouzé, P., and Van de Peer, Y. (2004) Feature selection for splice site prediction: a new method using EDA-based feature ranking. BMC Bioinformatics 5:64.
Saitou, N. and Nei, M. (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Molecular Biology and Evolution 4:406–425.
Sakai, M., Yoneda, M., and Hase, H. (1998) A new robust quadratic discriminant function. In Proceedings of the 14th International Conference on Pattern Recognition, pp. 99–102.
Salzberg, S. L. (1999) On comparing classifiers: a critique of current research and methods. Data Mining and Knowledge Discovery 1:1–12.
Salzberg, S. L., Delcher, A. L., Kasif, S., and White, O. (1998a) Microbial gene identification using interpolated Markov models. Nucleic Acids Research 26:544–548.
Salzberg, S. L., Pertea, M., Delcher, A. L., Gardner, M. J., and Tettelin, H. (1998b) Interpolated Markov models for eukaryotic gene finding. Genomics 59:24–31.
Schadt, E. and Lange, K. (2002) Codon and rate variation models in molecular phylogeny. Molecular Biology and Evolution 19:1534–1549.
Schlüter, R., Macherey, W., Müller, B., and Ney, H. (2001) Comparison of discriminative training criteria and optimization methods for speech recognition. Speech Communication 34:287–310.
Schultz, J., Milpetz, F., Bork, P., and Ponting, C. P. (1998) SMART, a simple modular architecture research tool: identification of signaling domains. Proceedings of the National Academy of Sciences of the USA 95:5857–5864.
Schwartz, R. and Chow, Y.-L. (1990) The N-best algorithm: an efficient and exact procedure for finding the N most likely hypotheses. In Proceedings of the IEEE Conference on Acoustics, Speech, and Signal Processing, pp. 81–84.
Seneff, S., Wang, C., and Burge, C. B. (2004) Gene structure prediction using an orthologous gene of known exon–intron structure. Applied Bioinformatics 3:81–90.
Servant, F., Bru, C., Carre, S., Courcelle, E., Gouzy, J., Peyruc, D., and Kahn, D. (2002) ProDom: automated clustering of homologous domains. Briefings in Bioinformatics 3:246–251.
Shannon, C. E. (1948) A mathematical theory of communication. Bell System Technical Journal 27:379–423, 623–656.
Shmatkov, A. M., Melikyan, A. A., Chernousko, F. L., and Borodovsky, M. (1999) Finding prokaryotic genes by the “frame-by-frame” algorithm: targeting gene starts and overlapping genes. Bioinformatics 15:874–886.
Siepel, A. and Haussler, D. (2004a) Combining phylogenetic and hidden Markov models in biosequence analysis. Journal of Computational Biology 11:413–428.
Siepel, A. and Haussler, D. (2004b) Computational identification of evolutionarily conserved exons. In Research in Computational Molecular Biology (RECOMB'04), pp. 277–286.
Siepel, A. and Haussler, D. (2004c) Phylogenetic estimation of context-dependent substitution rates by maximum likelihood. Molecular Biology and Evolution 21:468–488.
Siepel, A. and Haussler, D. (2005) Phylogenetic hidden Markov models. In Nielsen, R. (ed.) Statistical Methods in Molecular Evolution, pp. 1034–1050. New York: Springer.
Simonoff, J. S. (1996) Smoothing Methods in Statistics. New York: Springer.
Sinha, S., van Nimwegen, E., and Siggia, E. D. (2003) A probabilistic method to detect regulatory modules. Bioinformatics 19:i292–i301.
Smit, A. F. A. and Green, P. (1996) RepeatMasker. Available online at http://ftp.genome.washington.edu/RM/RepeatMasker.html
Smith, T. F. and Waterman, M. S. (1981) Identification of common molecular subsequences. Journal of Molecular Biology 147:195–197.
Sneath, P. H. A. and Sokal, R. R. (1973) Numerical Taxonomy. San Francisco, CA: W. H. Freeman.
Snyder, E. E. (1994) Identification of protein coding regions in genomic DNA. Ph.D. thesis, University of Colorado, Boulder, CO.
Snyder, E. E. and Stormo, G. D. (1993) Identification of coding regions in genomic DNA sequences: an application of dynamic programming and neural networks. Nucleic Acids Research 21:607–613.
Sokal, R. R. and Rohlf, F. J. (1995) Biometry: The Principles and Practice of Statistics in Biological Research. New York: W. H. Freeman.
Solovyev, V. V. and Shahmuradov, I. A. (2003) PromH: promoters identification using orthologous genomic sequences. Nucleic Acids Research 31:3540–3545.
Solovyev, V. V., Salamov, A. A., and Lawrence, C. B. (1995) Identification of human gene structure using linear discriminant functions and dynamic programming. In Proceedings of the 3rd International Conference on Intelligent Systems for Molecular Biology, pp. 367–375.
Sonnenburg, S. (2002) New methods for splice site recognition. Diploma thesis, Humboldt University, Berlin, Germany.
Sonnenburg, S., Zien, A., and Rätsch, G. (2006) ARTS: accurate recognition of transcription starts in human. In Proceedings of the 14th International Conference on Intelligent Systems for Molecular Biology, pp. 472–480.
Sorek, R., Ast, G., and Graur, D. (2002) Alu-containing exons are alternatively spliced. Genome Research 12:1060–1067.
Staden, R. (1984) Computer methods to locate signals in nucleic acid sequences. Nucleic Acids Research 12:505–519.
Stajich, J. E., Block, D., Boulez, K., Brenner, S. E., Chervitz, S. A., Dagdigian, C., Fuellen, G., Gilbert, J. G., Korf, I., Lapp, H., Lehvaslaiho, H., Matsalla, C., Mungall, C. J., Osborne, B. I., Pocock, M. R., Schattner, P., Senger, M., Stein, L. D., Stupka, E., Wilkinson, M. D., and Birney, E. (2002) The Bioperl toolkit: perl modules for the life sciences. Genome Research 12:1611–1618.
Stanke, M. and Waack, S. (2003) Gene prediction with a hidden Markov model and a new intron submodel. Bioinformatics 19:II215–II225.
Stein, L. (2001) Genome annotation: from sequence to biology. Nature Reviews: Genetics 2:493–503.
Stormo, G. D. and Haussler, D. (1994) Optimally parsing a sequence into different classes based on multiple types of evidence. In Proceedings of the 2nd International Conference on Intelligent Systems for Molecular Biology, pp. 369–375.
Suzek, B. E., Ermolaeva, M. D., Schreiber, M., and Salzberg, S. L. (2001) A probabilistic method for identifying start codons in bacterial genomes. Bioinformatics 17:1123–1130.
Tikhonov, A. N. (1963) Solution of incorrectly formulated problems and the regularization method. Soviet Mathematics, Doklady 4:1035–1038.
Tipping, M. E. (2001) Sparse Bayesian learning and the relevance vector machine. Journal of Machine Learning Research 1:211–244.
Tong, S. and Koller, D. (2000) Restricted Bayes optimal classifiers. In Proceedings of the 17th National Conference on Artificial Intelligence, pp. 658–664.
Toutanova, K., Mitchell, M., and Manning, C. D. (2003) Optimizing local probability models for statistical parsing. In Proceedings of the 14th European Conference on Machine Learning, pp. 409–420.
Tveter, D. (1998) The Pattern Recognition Basis of Artificial Intelligence. Indianapolis, IN: Wiley-IEEE Computer Society Press.
Uberbacher, E. C. and Mural, R. J. (1991) Locating protein coding regions in human DNA sequences using a multiple-sensor neural network approach. Proceedings of the National Academy of Sciences of the USA 88:11261–11265.
Usuka, J., Zhu, W., and Brendel, V. (2000) Optimal spliced alignment of homologous cDNA to a genomic DNA template. Bioinformatics 16:203–224.
Vapnik, V. (1998) Statistical Learning Theory. New York: John Wiley.
Venter, J. C., Smith, H. O., and Hood, L. (1996) A new strategy for genome sequencing. Nature 381:364–366.
Venter, J. C., Adams, M. D., Myers, E. W., Li, P. W., Mural, R. J., Sutton, G. G., Smith, H. O., Yandell, M., Evans, C. A., Holt, R. A., Gocayne, J. D., Amanatides, P., Ballew, R. M., Huson, D. H., Wortman, J. R., Zhang, Q., Kodira, C. D., Zheng, X. H., Chen, L., Skupski, M., Subramanian, G., Thomas, P. D., Zhang, J., Gabor Miklos, G. L., Nelson, C., Broder, S., Clark, A. G., Nadeau, J., McKusick, V. A., Zinder, N., Levine, A. J., Roberts, R. J., Simon, M., Slayman, C., Hunkapiller, M., Bolanos, R., Delcher, A., Dew, I., Fasulo, D., Flanigan, M., Florea, L., Halpern, A., Hannenhalli, S., Kravitz, S., Levy, S., Mobarry, C., Reinert, K., Remington, K., Abu-Threideh, J., Beasley, E., Biddick, K., Bonazzi, V., Brandon, R., Cargill, M., Chandramouliswaran, I., Charlab, R., Chaturvedi, K., Deng, Z., Di Francesco, V., Dunn, P., Eilbeck, K., Evangelista, C., Gabrielian, A. E., Gan, W., Ge, W., Gong, F., Gu, Z., Guan, P., Heiman, T. J., Higgins, M. E., Ji, R. R., Ke, Z., Ketchum, K. A., Lai, Z., Lei, Y., Li, J., Li, Z., Liang, Y., Lin, X., Lu, F., Merkulov, G. V., Milshina, N., Moore, H. M., Naik, A. K., Narayan, V. A., Neelam, B., Nusskern, D., Rusch, D. B., Salzberg, S., Shao, W., Shue, B., Sun, J., Wang, Z., Wang, A., Wang, X., Wang, J., Wei, M., Wides, R., Xiao, C., Yan, C., Yao, A., Ye, J., Zhan, M., Zhang, W., Zhang, H., Zhao, Q., Zheng, L., Zhong, F., Zhong, W., Zhu, S., Zhao, S., Gilbert, D., Baumhueter, S., Spier, G., Carter, C., Cravchik, A., Woodage, T., Ali, F., An, H., Awe, A., Baldwin, D., Baden, H., Barnstead, M., Barrow, I., Beeson, K., Busam, D., Carver, A., Center, A., Cheng, M. L., Curry, L., Danaher, S., Davenport, L., Desilets, R., Dietz, S., Dodson, K., Doup, L., Ferriera, S., Garg, N., Gluecksmann, A., Hart, B., Haynes, J., Haynes, C., Heiner, C., Hladun, S., Hostin, D., Houck, J., Howland, T., Ibegwam, C., Johnson, J., Kalush, F., Kline, L., Koduru, S., Love, A., Mann, F., May, D., McCawley, S., McIntosh, T., McMullen, I., Moy, L., Moy, M., Murphy, B., Nelson, K., Pfannkoch, C., Pratts, E., Puri, V., Qureshi, H., Reardon, M., Rodriguez, R., Rogers, Y. H., Romblad, D., Ruhfel, B., Scott, R., Sitter, C., Smallwood, M., Stewart, E., Strong, R., Suh, E., Thomas, R., Tint, N. N., Tse, S., Vech, C., Wang, G., Wetter, J., Williams, M., Williams, S., Windsor, S., Winn-Deen, E., Wolfe, K., Zaveri, J., Zaveri, K., Abril, J. F., Guigó, R., Campbell, M. J., Sjolander, K. V., Karlak, B., Kejariwal, A., Mi, H., Lazareva, B., Hatton, T., Narechania, A., Diemer, K., Muruganujan, A., Guo, N., Sato, S., Bafna, V., Istrail, S., Lippert, R., Schwartz, R., Walenz, B., Yooseph, S., Allen, D., Basu, A., Baxendale, J., Blick, L., Caminha, M., Carnes-Stine, J., Caulk, P., Chiang, Y. H., Coyne, M., Dahlke, C., Mays, A., Dombroski, M., Donnelly, M., Ely, D., Esparham, S., Fosler, C., Gire, H., Glanowski, S., Glasser, K., Glodek, A., Gorokhov, M., Graham, K., Gropman, B., Harris, M., Heil, J., Henderson, S., Hoover, J., Jennings, D., Jordan, C., Jordan, J., Kasha, J., Kagan, L., Kraft, C., Levitsky, A., Lewis, M., Liu, X., Lopez, J., Ma, D., Majoros, W. H., McDaniel, J., Murphy, S., Newman, M., Nguyen, N., Nguyen, T., Nodell, M., Pan, S., Peck, J., Peterson, M., Rowe, W., Sanders, R., Scott, J., Simpson, M., Smith, T., Sprague, A., Stockwell, T., Turner, R., Venter, E., Wang, M., Wen, M., Wu, D., Wu, M., Xia, A., Zandieh, A., and Zhu, X. (2001) The sequence of the human genome. Science 291:1304–1351.
Vinson, J., DeCaprio, D., Luoma, S., and Galagan, J. E. (2006) Gene prediction using conditional random fields. In: The Biology of Genomes, Cold Spring Harbor Laboratory, New York, May 10–14, 2006 (abstract).
Viterbi, A. (1967) Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory 13:260–269.
von Hippel, P. T. (2005) Mean, median, and skew: correcting a textbook rule. Journal of Statistics Education 13.
Voorhees, E. M. (1986) Implementing agglomerative hierarchical clustering algorithms for use in document retrieval. Information Processing and Management 22:465–476.
Wain, H. M., Lovering, R. C., Bruford, E. A., Lush, M. J., Wright, M. W., and Povey, S. (2002) Guidelines for human gene nomenclature. Genomics 79:464–470.
Watson, J. D. and Crick, F. H. C. (1953) Molecular structure of nucleic acids. Nature 171:737–738.
Wheelan, S. J., Church, D. M., and Ostell, J. M. (2001) Spidey: a tool for mRNA-to-genomic alignments. Genome Research 11:1952–1957.
Wheeler, D. L., Barrett, T., Benson, D. A., Bryant, S. H., Canese, K., Church, D. M., DiCuccio, M., Edgar, R., Federhen, S., Helmberg, W., Kenton, D. L., Khovayko, O., Lipman, D. J., Madden, T. L., Maglott, D. R., Ostell, J., Pontius, J. U., Pruitt, K. D., Schuler, G. D., Schriml, L. M., Sequeira, E., Sherry, S. T., Sirotkin, K., Starchenko, G., Suzek, T. O., Tatusov, R., Tatusova, T. A., Wagner, L., and Yaschenko, E. (2005) Database resources of the National Center for Biotechnology Information. Nucleic Acids Research 33:D39–D45.
Wiehe, T., Gebauer-Jung, S., Mitchell-Olds, T., and Guigó, R. (2001) SGP-1: prediction and validation of homologous genes based on sequence alignments. Genome Research 11:1574–1583.
Wingender, E., Kel, A. E., Kel, O. V., Karas, H., Heinemeyer, T., Dietze, P., Knuppel, R., Romaschenko, A. G., and Kolchanov, N. A. (1997) TRANSFAC, TRRD and COMPEL: towards a federated database system on transcriptional regulation. Nucleic Acids Research 25:265–268.
Wojtowicz, W. M., Flanagan, J. J., Millard, S. S., Zipursky, S. L., and Clemens, J. C. (2004) Alternative splicing of Drosophila Dscam generates axon guidance receptors that exhibit isoform-specific homophilic binding. Cell 118:619–633.
Wortman, J. R., Haas, B. J., Hannick, L. I., Smith, R. K., Maiti, R., Ronning, C. M., Chan, A. P., Yu, C., Ayele, M., Whitelaw, C. A., White, O. R., and Town, C. D. (2003) Annotation of the Arabidopsis genome. Plant Physiology 132:461–468.
Wu, C. H., Yeh, L.-S. L., Huang, H., Arminski, L., Castro-Alvear, J., Chen, Y., Hu, Z.-Z., Ledley, R. S., Kourtesis, P., Suzek, B. E., Vinayaka, C. R., Zhang, J., and Barker, W. C. (2003) The Protein Information Resource. Nucleic Acids Research 31:345–347.
Xu, Y. and Uberbacher, E. C. (1996) Gene prediction by pattern recognition and homology search. In Proceedings of the 4th International Conference on Intelligent Systems for Molecular Biology, pp. 242–256.
Yan, J. and Marr, T. G. (2005) Computational analysis of 3′-ends of ESTs shows four classes of alternative polyadenylation in human, mouse, and rat. Genome Research 15:369–375.
Yang, Z. (1994) Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39:306–314.
Yeh, R.-F., Lim, L. P., and Burge, C. B. (2001) Computational inference of homologous gene structures in the human genome. Genome Research 11:803–809.
Yeo, G. and Burge, C. B. (2004) Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. Journal of Computational Biology 11:377–394.
Yeo, G. W., Van Nostrand, E., Holste, D., Poggio, T., and Burge, C. B. (2005) Identification and analysis of alternative splicing events conserved in human and mouse. Proceedings of the National Academy of Sciences of the USA 102:2850–2855.
Younger, D. H. (1967) Recognition and parsing of context-free languages in time n³. Information and Control 10:189–208.
Yu, P., Ma, D., and Xu, M. (2005) Nested genes in the human genome. Genomics 86:414–422.
Zar, J. H. (1996) Biostatistical Analysis, 3rd edn. Englewood Cliffs, NJ: Prentice-Hall.
Zhang, M. Q. (1997) Identification of protein coding regions in the human genome by quadratic discriminant analysis. Proceedings of the National Academy of Sciences of the USA 94:565–568.
Zhang, M. Q. (2003) Prediction, annotation, and analysis of human promoters. Cold Spring Harbor Symposia on Quantitative Biology 68:217–225.
Zhang, M. Q. and Marr, T. G. (1993) A weight array method for splicing signal analysis. Computer Applications in the Biosciences 9:499–509.
Zhang, H., Hu, J., Recce, M., and Tian, B. (2005) PolyA_DB: a database for mammalian mRNA polyadenylation. Nucleic Acids Research 33:D116–D120.
Zhang, L., Pavlovic, V., Cantor, C. R., and Kasif, S. (2003) Human–mouse gene identification by comparative evidence integration and evolutionary analysis. Genome Research 13:1190–1202.
Zhao, J., Hyman, L., and Moore, C. (1999) Formation of mRNA 3′ ends in eukaryotes: mechanism, regulation, and interrelationships with other steps in mRNA synthesis. Microbiology and Molecular Biology Reviews 63:405–445.
Zien, A., Rätsch, G., Mika, S., Schölkopf, B., Lengauer, T., and Müller, K.-R. (2000) Engineering support vector machine kernels that recognize translation initiation sites. Bioinformatics 16:799–807.
Zuker, M., Mathews, D. H., and Turner, D. H. (1999) Algorithms and thermodynamics for RNA secondary structure prediction: a practical guide. In Barciszewski, J. and Clark, B. F. C. (eds.) RNA Biochemistry and Biotechnology, pp. 11–43. New York: Kluwer.

  • References
  • William H. Majoros, Duke University, North Carolina
  • Book: Methods for Computational Gene Prediction
  • Online publication: 05 June 2012
  • Chapter DOI: https://doi.org/10.1017/CBO9780511811135.016