Published online by Cambridge University Press: 20 May 2010
In Chapter 1, Dickinson analyzes the complex history of theoretical and computational vision. With some exceptions, the trend in recent decades is away from explicit structural representation and toward direct mapping of image features to semantic categories based on machine learning. The best-known formulations of the older, structural paradigm are those of Marr (Marr and Nishihara 1978) and Biederman (1987), although the central idea that objects are represented as configurations of parts has a long history (Barlow 1972; Binford 1971; Dickinson, Pentland, and Rosenfeld 1992; Hoffman and Richards 1984; Hubel and Wiesel 1959, 1968; Milner 1974; Palmer 1975; Selfridge 1959; Sutherland 1968). A configural representation would be carried by ensembles of processing units or neurons, each encoding the shape and relative position of a constituent part. This coding format is appealing because it solves three major problems in object vision. The first problem is the enormous dimensionality (on the order of 10⁶) of retinal activity patterns. A signal of this complexity is too unwieldy to communicate between brain regions (owing to wiring constraints) or store in memory (owing to limited information capacity of synaptic weight patterns). Compression of this signal into a list of part specifications on the order of 10¹ to 10² would make communication and storage more practical. The second problem is the extremely variable mapping between retinal images and object identity. The same object can produce an infinity of very different retinal images depending on its position, orientation, lighting, partial occlusion, and other factors.