In his Open Secrets article for The New Yorker, Malcolm Gladwell (2007) reiterates the US national security expert Gregory Treverton's distinction between a puzzle and a mystery. A puzzle requires additional information to be solved, whereas with a mystery, part of the problem is too much information. The primary focus of Gladwell's article is the Enron scandal of 2001, in which Enron's assets were found to have been greatly inflated. Much of the relevant information was in the public domain, but with so much information available, the disparate parts had not been pieced together. The vast quantity of information now available is widely recognized. As Eric Schmidt stated in August 2010: ‘There was 5 exabytes of information created between the dawn of civilization through 2003, but that much information is now created every two days and the pace is increasing’ (Kirkpatrick, 2010). It was also widely reported in 2010 that the world would pass an estimated 1.2 zettabytes of information, the equivalent of 75 billion iPads (Blake, 2010). Such numbers are barely comprehensible, but what is clear is that if this information is to be used to its full potential, we must find ways of reducing the need for human input when bringing together disparate information in useful ways.
This chapter looks at the methods that have evolved to deal with the increasing amount of information available online, at both the document and the data level. Over the last two decades we have moved first from directories to search engines, and more recently to the crowd-sourced solutions of tagging and tapping into the streams of our online ‘friends’. Such methods have helped to deal with much of the unstructured content available today, but there remains a significant gap between what we can currently achieve online and what could be achieved if the meaning of more of this information could be understood without the need for human interpretation.