Data mining parasite genomes

Published online by Cambridge University Press: 12 May 2005

M. BERRIMAN
Affiliation:
Wellcome Trust Sanger Institute, The Wellcome Trust Genome Campus, Hinxton, CB10 1SA, UK

Abstract

The term ‘data mining’ can be used to describe any process where useful information is extracted from data with a large background of ‘noise’. In the context of a genome project, several stages involve data mining. Amongst the sequence data, ‘signals’ need to be detected that indicate the presence of interesting features. Often this involves differentiating between transcribed and non-transcribed bases to predict coding regions. After detection, defining the roles of these sequences involves sifting through multiple lines of evidence. If these roles are accurately reflected in genome annotation, they can be used by researchers to frame queries and interrogate the data further.
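One common way to separate coding from non-coding sequence is to ask which of two statistical models explains a stretch of bases better. The sketch below shows that idea with two fixed second-order Markov models and a log-odds score, then slides a window along a sequence to flag candidate coding "signals". It is a minimal illustration under stated assumptions, not the method used in this article: the function names (`train_markov`, `log_odds`, `scan`), the model order, the Laplace pseudocount and the tiny training sequences are all hypothetical choices made for the example. Real gene finders use richer models, such as interpolated Markov models or hidden Markov models, trained on curated gene sets.

```python
import math
from collections import defaultdict
from itertools import product

ALPHABET = "ACGT"

# Hypothetical toy training data, invented for illustration only. The contrast
# (GC-rich codon repeats vs. AT-rich filler) is deliberately exaggerated.
CODING_TRAIN = ["ATGGCAGCAGCAGCAGCAGCATAA"]
NONCODING_TRAIN = ["ATATATTTAAATATTATATAAT"]


def train_markov(seqs, order=2, pseudocount=1.0):
    """Estimate P(base | previous `order` bases) with additive smoothing,
    so contexts never seen in training still get non-zero probabilities."""
    counts = defaultdict(lambda: defaultdict(float))
    for seq in seqs:
        seq = seq.upper()
        for i in range(order, len(seq)):
            context, base = seq[i - order:i], seq[i]
            # Skip positions involving ambiguous bases such as N.
            if base in ALPHABET and all(c in ALPHABET for c in context):
                counts[context][base] += 1
    model = {}
    for ctx_tuple in product(ALPHABET, repeat=order):
        ctx = "".join(ctx_tuple)
        total = sum(counts[ctx].values()) + pseudocount * len(ALPHABET)
        model[ctx] = {b: (counts[ctx][b] + pseudocount) / total for b in ALPHABET}
    return model


def log_odds(seq, coding, noncoding, order=2):
    """Sum of log2(P_coding / P_noncoding) over positions; > 0 favours coding."""
    seq = seq.upper()
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        if context in coding and base in ALPHABET:
            score += math.log2(coding[context][base] / noncoding[context][base])
    return score


def scan(seq, coding, noncoding, window=12, step=3, order=2):
    """Slide a fixed-size window along `seq`, returning (start, score) pairs."""
    return [
        (start, log_odds(seq[start:start + window], coding, noncoding, order))
        for start in range(0, len(seq) - window + 1, step)
    ]


if __name__ == "__main__":
    coding = train_markov(CODING_TRAIN)
    noncoding = train_markov(NONCODING_TRAIN)
    query = "TATATAAATTAT" + "GCAGCAGCAGCA"
    for start, score in scan(query, coding, noncoding, window=12, step=6):
        print(f"window at {start:2d}: log-odds {score:+.2f}")
```

Positive scores mark windows that fit the coding model better; scores near zero are ambiguous. A practical scanner would also need a threshold, a minimum run length and awareness of reading frame and splice signals, which is why dedicated gene finders model gene structure explicitly rather than scoring windows in isolation.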

Type
Research Article
Copyright
© 2004 Cambridge University Press

