Published online by Cambridge University Press: 20 May 2010
Object representations for categorization tasks should be applicable to a wide range of objects, scalable to large numbers of object classes, and at the same time learnable from a few training samples. While such a scalable representation remains elusive, it has been argued that it should have at least the following properties: it should enable sharing of features (Torralba et al. 2007), it should combine generative models with discriminative models (Fritz et al. 2005; Jaakkola and Haussler 1999), and it should combine local and global as well as appearance- and shape-based features (Leibe et al. 2005). Additionally, we argue that such object representations should be applicable both to unsupervised learning (e.g., visual object discovery) and to supervised training (e.g., object detection). Therefore, we extend our previous efforts in hybrid modeling (Fritz et al. 2005) with ideas from unsupervised learning of generative decompositions to obtain an approach that integrates different paradigms of modeling, representing, and learning visual categories.
We present a novel method for the discovery and detection of visual object categories based on decompositions using topic models. The approach is capable of learning a compact, low-dimensional representation for multiple visual categories from multiple viewpoints without labeling of training instances. The learnt object components range from local structures through line segments to global silhouette-like descriptions. This representation can be used to discover object categories in a fully unsupervised fashion. Furthermore, we employ the representation as the basis for a supervised multicategory detection system that makes efficient use of training examples and outperforms purely feature-based representations.