Projects

Tandy Warnow

doi:10.1017/9781316882313.017

Introduction

There are three types of projects in this collection: short projects, long projects, and projects that involve the development of novel methods. Each project requires data analysis, on either real or simulated data, as well as writing. Therefore, even the short projects will require about a week to complete.

The main purpose of the short projects is to familiarize the student with the process of computing and interpreting alignments and trees on datasets. Because the data analysis part of these projects should be fast to complete, they are focused on relatively small nucleotide datasets. If the student has access to sufficient computational resources, then analyses of larger datasets or amino acid datasets are possible. Each short project also asks the student to explore the impact of method choice (i.e., alignment method or tree estimation method) or dataset on the resultant tree, typically using visualization tools.

The long projects build on the short projects but explore in more depth the impact of method choice (for alignment estimation or tree estimation) or dataset on phylogeny estimation. Some of these projects examine the scalability of methods to large datasets, and so will require substantial computational resources. As the student will learn, the degree to which method selection impacts the final phylogeny can depend on properties of the data, such as the number of sequences, the number of sites (i.e., sequence length), the rate of evolution, and the percentage of missing data. The use of both biological and simulated data will help the student evaluate the impact of these factors on the final outcomes.
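Some of the dataset properties listed above can be read directly off an alignment before any tree is estimated. The following is a minimal sketch in Python, not part of the original text, that reports the number of sequences, the number of sites, and the percentage of missing data for a small FASTA-formatted alignment. The function names `parse_fasta` and `alignment_summary`, the toy sequences, and the choice to count `-`, `?`, and `N` as missing data are all assumptions made for illustration; the rate of evolution cannot be read off in this way and is not covered.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a dict mapping name -> sequence."""
    chunks = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the name.
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {n: "".join(parts) for n, parts in chunks.items()}


def alignment_summary(seqs, missing_chars="-?N"):
    """Summarize an alignment: sequence count, site count, percent missing.

    Assumption: gaps ('-'), unknowns ('?'), and 'N' all count as missing data.
    """
    lengths = {len(s) for s in seqs.values()}
    if len(lengths) != 1:
        raise ValueError("an alignment needs at least one sequence, all of equal length")
    n_sites = lengths.pop()
    n_seqs = len(seqs)
    missing = sum(s.upper().count(c) for s in seqs.values() for c in missing_chars)
    total = n_seqs * n_sites
    return {
        "num_sequences": n_seqs,
        "num_sites": n_sites,
        "percent_missing": 100.0 * missing / total if total else 0.0,
    }


# Toy alignment, invented for illustration only.
example = """>taxonA
ACGT-ACGTA
>taxonB
ACGTTACG?A
>taxonC
AC--TACGTA
"""

summary = alignment_summary(parse_fasta(example))
print(summary)
```

Recording such a summary for each dataset makes it easier to relate differences between estimated trees to properties of the input rather than to the methods alone.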

The projects aimed at novel method development are likely to be the most difficult, and success in these projects will probably require substantial effort beyond the period of the course. However, a student who wishes to do a novel method development project is usually best served by starting with a long project to identify the competing methods and select datasets that are best able to differentiate between methods.

Final projects for the course are typically long projects rather than novel method development projects, and are focused on comparisons of leading computational methods on simulated or biological datasets, with an eye toward assessing the relative performance of these methods, and gaining insight into the conditions that impact each method.


Appendix D - Projects
