The recognition of object categories has a rich history in computer vision. In the 1970s, generic object recognition systems sought to model and recognize objects based on their coarse, prototypical shape. These early systems employed complex 3-D models, which offered invariance to viewpoint (including image translation, rotation, and scale), articulation, occlusion, and minor within-class shape deformation. Despite their powerful modeling paradigms, however, these early systems lacked the low- and intermediate-level segmentation, grouping, and abstraction machinery needed to recover prototypical shapes from real images of real objects. Over the next two decades, the recognition community backed away from this “holy grail” of recognition, bringing new models closer to the image in an effort to reduce the representational gap between extractable image features and model features. During this time, the community migrated from the CAD-based vision era, in which exact 3-D geometry was specified, to the appearance-based vision era, in which exact 2-D photometry was specified (either globally, or locally at interest points). Approaches to biological vision have followed a roughly parallel path: a migration from CAD-inspired structural models composed of 3-D parts, to image-based models preserving much of an object's input appearance, to, most recently, hybrid fragment-based models that rely on hierarchies of more localized image features.
Over this period, the recognition problem was often reformulated from generic object recognition to exemplar recognition. For the first time, real object exemplars, with full texture and complex shape, could be recognized.