  • Print publication year: 2014
  • Online publication date: December 2014

14 - Machine learning and natural language processing

Summary

… one naturally wonders if the problem of translation could conceivably be treated as a problem in cryptography. When I look at an article in Russian, I say “This is really written in English, but it has been coded in some strange symbols.”

Warren Weaver

Ideas of probability: The frequentists and the Bayesians

We are all familiar with the idea that a fair coin has an equal chance of coming down heads or tails when tossed. Mathematicians say that the coin has a probability of 0.5 of landing heads and 0.5 of landing tails. Because heads and tails are the only possible outcomes, their probabilities must add up to one. A coin toss is an example of physical probability, that is, probability arising in a physical process such as rolling a pair of dice or the decay of a radioactive atom. Physical probability means that in such systems any given event, such as the dice landing on snake eyes, tends to occur at a persistent rate, or relative frequency, in a long run of trials. We are also familiar with the idea of probabilities arising from repeated experiments or measurements. When we make repeated measurements of some quantity, we do not get the same answer each time, because each measurement may carry a small random error. Classical, or frequentist, statisticians have developed a powerful collection of statistical tools that, given a set of measurements, estimate the most probable value of the quantity and indicate its likely error.
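The two frequentist ideas in this paragraph, relative frequency over a long run of trials and estimating a value and its likely error from repeated measurements, can be illustrated with a minimal Python sketch. The function names, the fixed random seed, and the sample measurement values are all hypothetical choices made for illustration, and the standard error of the mean is used here as just one common way to express a "likely error".

```python
import random
import statistics


def relative_frequency(num_tosses, seed=0):
    """Fraction of heads in num_tosses simulated tosses of a fair coin.

    The seed is an illustrative assumption so that runs are repeatable.
    """
    rng = random.Random(seed)
    # Each toss comes up heads with probability 0.5.
    heads = sum(rng.random() < 0.5 for _ in range(num_tosses))
    return heads / num_tosses


def estimate_with_error(measurements):
    """Return (sample mean, standard error of the mean)."""
    mean = statistics.mean(measurements)
    # Sample standard deviation divided by sqrt(n) gives the standard error.
    std_err = statistics.stdev(measurements) / len(measurements) ** 0.5
    return mean, std_err


if __name__ == "__main__":
    # The relative frequency of heads settles near 0.5 as the run gets longer.
    for n in (10, 1_000, 100_000):
        print(n, relative_frequency(n))

    # Made-up repeated measurements of one quantity, each with a small random error.
    readings = [9.8, 10.1, 10.0, 9.9, 10.2]
    mean, std_err = estimate_with_error(readings)
    print(f"estimate = {mean:.2f} +/- {std_err:.2f}")
```

Short runs can stray well away from 0.5, which is why the frequentist definition speaks of a long run of trials; likewise, the standard error shrinks as more measurements are added.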