Book contents
- Frontmatter
- Contents
- Preface
- Acknowledgements
- 1 Prologue
- 2 A beginners’ guide
- 3 Python basics
- 4 Program control and logic
- 5 Functions
- 6 Files
- 7 Object orientation
- 8 Object data modelling
- 9 Mathematics
- 10 Coding tips
- 11 Biological sequences
- 12 Pairwise sequence alignments
- 13 Multiple-sequence alignments
- 14 Sequence variation and evolution
- 15 Macromolecular structures
- 16 Array data
- 17 High-throughput sequence analyses
- 18 Images
- 19 Signal processing
- 20 Databases
- 21 Probability
- 22 Statistics
- 23 Clustering and discrimination
- 24 Machine learning
- 25 Hard problems
- 26 Graphical interfaces
- 27 Improving speed
- Appendices
- Glossary
- Index
- Plate section
22 - Statistics
Published online by Cambridge University Press: 05 February 2015
Summary
Statistical analyses
In this chapter we look at the mathematical analysis and interpretation of collections of data. To follow the basics of statistics we assume some familiarity with the basics of probability, as discussed in Chapter 21.
Generally, when we gather numerical measurements we don’t get identical results; instead we get a spread of values. The underlying reason for this variation could be natural variation in what we are measuring, error in the way we make the measurements or, as is almost always the case, a combination of both. Statistics helps us to make sense of variation in numerical data, and commonly the question we ask is whether what we measure is statistically significant with respect to some prior hypothesis. The result then naturally drives further investigation, based on whether we believe the hypothesis to be true or untrue. Statistics is a vast subject, so in this chapter we can only cover a few of the more important aspects: those that we refer to elsewhere in this book or that are otherwise commonly used in biology.
Samples and significance
One of the key principles underpinning most statistical analyses is the idea that the data we collect consists of a limited number of samples from some underlying probability distribution. This probability distribution can be thought of as the mechanism by which the data values are generated. Naturally, though, the data arises from some physical process, and in ascribing a probability distribution we are merely forming a mathematical model, often significantly simplified, to approximate the data-generation process.
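The idea of data as a limited number of samples from an underlying distribution can be illustrated with a minimal sketch. This is not taken from the chapter itself: the function name `draw_samples`, the choice of a normal (Gaussian) distribution and the parameter values are all assumptions made purely for illustration, using only Python's standard `random` and `statistics` modules.

```python
import random
import statistics

def draw_samples(n, mean=10.0, sd=2.0, seed=0):
    """Draw n values from a normal distribution (an assumed model).

    The mean and sd parameters stand in for the 'true' underlying
    distribution, which in a real experiment we would not know.
    A fixed seed makes the example reproducible.
    """
    rng = random.Random(seed)
    return [rng.gauss(mean, sd) for _ in range(n)]

# Compare sample estimates for small and large collections of data
for n in (10, 100, 10000):
    data = draw_samples(n)
    sample_mean = statistics.mean(data)
    sample_sd = statistics.stdev(data)  # Sample standard deviation
    print(n, round(sample_mean, 3), round(sample_sd, 3))
```

As more samples are drawn, the sample mean and sample standard deviation would be expected to settle closer to the parameters of the model distribution, which is the sense in which a limited data set only approximates the process that generated it.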
- Type: Chapter
- Information: Python Programming for Biology: Bioinformatics and Beyond, pp. 454-485. Publisher: Cambridge University Press. Print publication year: 2015