
8 - Perception and computer vision

Published online by Cambridge University Press: 05 July 2014

Markus Vincze, Technische Universität Wien
Sven Wachsmuth, Bielefeld University
Gerhard Sagerer, University of Bielefeld
Keith Frankish, The Open University, Milton Keynes
William M. Ramsey, University of Nevada, Las Vegas

Summary

The wish to build artificial, intelligent systems leads to the expectation that they will operate in our everyday environments; hence the expectations placed on their perceptual capabilities are high. Perception refers to the process of becoming aware of the elements of the environment through physical sensation, which can include sensory input from the eyes, ears, nose, tongue, or skin. In this chapter we focus on visual perception, the dominant sense in humans and one exploited since the first days of building artificial machines. Two early examples are Shakey, a mobile robot that used a range finder and camera to reason about its actions in a room with a few objects (Nilsson 1969), and FREDDY, a fixed robot with a binocular vision system controlling a two-finger hand (e.g., Barrow and Salter 1969).

The goal of computer vision is to understand the scene or features in images of the real world (Ballard and Brown 1982; Forsyth and Ponce 2011). Important means to this end are the techniques of image processing and pattern recognition (Duda and Hart 1973; Gonzalez and Woods 2002). The analysis of images is complicated by the fact that one and the same object may present many different appearances to the camera, depending on the illumination cast onto it, the angle from which it is viewed, the shadows it casts, the specific camera used, whether parts of it are occluded, and so forth. Nevertheless, computer vision today is sufficiently advanced to detect specific objects and object categories under a variety of conditions, to enable an autonomous vehicle to drive at moderate speeds on open roads, to steer a mobile robot through a suite of offices, and to observe and understand human activities.
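One classical answer to the illumination part of this difficulty can be illustrated with a toy sketch (not from the chapter; the function names and data are illustrative): normalized cross-correlation compares a template with each image patch after subtracting the mean and rescaling, so a match survives a global brightness or contrast change.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation: invariant to affine intensity changes."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def find_template(image, template):
    """Slide the template over the image; return the best-matching corner."""
    th, tw = template.shape
    ih, iw = image.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            s = ncc(image[y:y + th, x:x + tw], template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best

# A small textured "object" embedded at (3, 4) in an otherwise flat scene:
scene = np.zeros((8, 8))
scene[3:5, 4:6] = [[0.2, 0.9], [0.9, 0.2]]
template = np.array([[0.2, 0.9], [0.9, 0.2]])

# Darken and offset the whole scene; NCC still locates the object exactly.
pos, score = find_template(0.5 * scene + 0.1, template)
print(pos)  # (3, 4)
```

The sketch scores a perfect 1.0 at the true location despite the global intensity change, but it deliberately ignores the harder variations the paragraph lists: viewpoint, shadows, and occlusion defeat simple template matching, which is why the chapter's later methods (features, geometry, models) are needed.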

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2014


Further reading

Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Englewood Cliffs, NJ: Prentice Hall. The basic book on methods in computer vision.
Dickinson, S. J., Leonardis, A., Schiele, B., and Tarr, M. J. (2009). Object Categorization: Computer and Human Vision Perspectives. Cambridge University Press. Excellent overview of approaches to object recognition, including a historical perspective. A must to get started in this direction.
Forsyth, D. A. and Ponce, J. (2011). Computer Vision: A Modern Approach (2nd edn.). Upper Saddle River, NJ: Prentice Hall. A broad collection of computer vision techniques that is a very good reference for the advanced study of computer vision.
Hartley, R. and Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd edn.). Cambridge University Press. Provides deep coverage of geometrical aspects in computer vision for the advanced reader.
Kragic, D. and Vincze, M. (2009). Vision for robotics, Foundations and Trends in Robotics, 1: 1–78. An overview of the specific demands robotics places on computer vision methods, plus a survey of applications.
Szeliski, R. (2010). Computer Vision: Algorithms and Applications. London: Springer. An excellent textbook for the introduction and more in-depth study of computer vision. It has an emphasis on techniques that combine computer vision and graphics, but also covers modern techniques for object recognition, segmentation, and motion estimation.
Finally, two great open-source collections of vision methods are OpenCV and the Point Cloud Library.

References
Aloimonos, Y. (1993). Active Perception. Hillsdale, NJ: Lawrence Erlbaum.
Asfour, T., Azad, P., Vahrenkamp, N., et al. (2008). Toward humanoid manipulation in human-centred environments, Robotics and Autonomous Systems, 56: 54–65.
Bajcsy, R. (1988). Active perception, Proceedings of the IEEE, 76: 996–1005.
Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition, 13: 111–22.
Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Englewood Cliffs, NJ: Prentice Hall.
Barrow, H. G. and Salter, S. H. (1969). Design of low-cost equipment for cognitive robot research, in Meltzer, B. and Michie, D. (eds.), Machine Intelligence 5 (pp. 555–66). Edinburgh University Press.
Binford, T. (1971). Visual perception by a computer, in Proceedings of the IEEE Conference on Systems and Control (pp. 116–23). IEEE.
Bobick, A. F., Intille, S. S., Davis, J. W., et al. (1999). The KidsRoom: A perceptually-based interactive and immersive story environment, PRESENCE: Teleoperators and Virtual Environments, 8: 369–93.
Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface, ACM SIGGRAPH Computer Graphics, 14: 262–70.
Breazeal, C. and Scassellati, B. (2000). Infant-like social interactions between a robot and a human caregiver, Adaptive Behavior, 8: 49–74.
Brooks, R. (1983). Model-based 3D interpretation of 2D images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 5: 140–50.
Buxton, H. (2003). Learning and understanding dynamic scene activity: A review, Image and Vision Computing, 21: 125–36.
Chaumette, F. and Hutchinson, S. (2006). Visual servo control I: Basic approaches, IEEE Robotics and Automation Magazine, 13(4): 82–90.
Crowley, J. L. and Christensen, H. I. (eds.) (1995). Vision as Process: Basic Research on Computer Vision Systems. Berlin: Springer.
Crowley, J. L., Coutaz, J., and Bérard, F. (2000). Perceptual user interfaces: Things that see, Communications of the ACM, 43(3): 54–64.
Crowley, J. L., Coutaz, J., Rey, G., and Reignier, P. (2002). Perceptual components for context aware computing, in Borriello, G. and Holmquist, L. E. (eds.), UbiComp 2002: Ubiquitous Computing (Lecture Notes in Computer Science 2498) (pp. 117–34). Berlin: Springer.
Cupillard, F., Bremond, F., and Thonnat, M. (2003). Behaviour recognition for individuals, groups of people and crowds, IEE Symposium on Intelligent Distributed Surveillance Systems, 7: 1–5.
Dickmanns, E. D. (2007). Dynamic Vision for Perception and Control of Motion. London: Springer.
Duda, R. and Hart, P. (1973). Pattern Classification and Scene Analysis. New York: Wiley.
Forsyth, D. A. and Ponce, J. (2011). Computer Vision: A Modern Approach (2nd edn.). Upper Saddle River, NJ: Prentice Hall.
Gonzalez, R. C. and Woods, R. E. (2002). Digital Image Processing (2nd edn.). Upper Saddle River, NJ: Prentice Hall.
Hartley, R. and Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd edn.). Cambridge University Press.
Hoiem, D., Efros, A. A., and Hebert, M. (2006). Putting objects in perspective, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2137–44). IEEE.
Huttenlocher, D. P. and Ullman, S. (1990). Recognizing solid objects by alignment with an image, International Journal of Computer Vision, 5: 195–212.
Jaklic, A., Leonardis, A., and Solina, F. (2000). Segmentation and Recovery of Superquadrics. Dordrecht: Kluwer Academic Publishers.
Kisacanin, B., Pavlovic, V., and Huang, T. S. (2005). Real-Time Vision for Human–Computer Interaction. New York: Springer.
Koenderink, J. J. (1987). An internal representation for solid shape based on the topological properties of the apparent contour, in Richards, W. and Ullman, S. (eds.), Image Understanding 1985–86 (pp. 257–85). Norwood, NJ: Ablex.
Kragic, D. and Vincze, M. (2009). Vision for robotics, Foundations and Trends in Robotics, 1: 1–78.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60: 91–110.
Marr, D. (1982). Vision. San Francisco, CA: W. H. Freeman.
Moore, D. J., Essa, I. A., and Hayes, M. H. (1999). Exploiting human actions and object context for recognition tasks, in Proceedings of the IEEE International Conference on Computer Vision (pp. 80–86). Corfu, Greece: IEEE.
Mörwald, T., Prankl, J., Richtsfeld, A., Zillich, M., and Vincze, M. (2010). BLORT – The blocks world robotic vision toolbox, in Proceedings of the ICRA 2010 Workshop on Best Practice in 3D Perception and Modeling for Mobile Manipulation.
Nilsson, N. J. (1969). Mobile automaton: An application of artificial intelligence techniques, Technical Note 40, AI Center, SRI International; also in Proceedings of the First International Joint Conference on Artificial Intelligence (pp. 509–20).
Pavlovic, V., Sharma, R., and Huang, T. S. (1997). Visual interpretation of hand gestures for human–computer interaction: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19: 677–95.
Piccardi, M. (2004). Background subtraction techniques: A review, IEEE International Conference on Systems, Man and Cybernetics, 4: 3099–104.
Sage, K. H., Howell, A. J., and Buxton, H. (2005). Recognition of action, activity and behaviour in the ActIPret project, KI, 19(2): 36–39.
Schilit, B., Adams, N., and Want, R. (1994). Context aware computing applications, in Proceedings of the First International Workshop on Mobile Computing Systems and Applications (pp. 85–90).
Shipley, T. and Kellman, P. J. (eds.) (2001). From Fragments to Objects: Segmentation and Grouping in Vision. Amsterdam: Elsevier.
Solina, F. and Bajcsy, R. (1990). Recovery of parametric models from range images: The case for superquadrics with global deformations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12: 131–47.
Strat, T. M. and Fischler, M. A. (1991). Context-based vision: Recognizing objects using information from both 2D and 3D imagery, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13: 1050–65.
Thirde, D., Borg, M., Ferryman, J., et al. (2006). A real-time scene understanding system for airport apron monitoring, in Fourth IEEE International Conference on Computer Vision Systems (ICVS’06) (p. 26).
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics. Cambridge, MA: MIT Press.
Valera, M. and Velastin, S. A. (2005). Intelligent distributed surveillance systems: A review, IEE Proceedings: Vision, Image and Signal Processing, 152: 192–204.
Vincze, M. (2005). On the design and structure of artificial eyes for tracking tasks, Journal of Advanced Computational Intelligence and Intelligent Informatics, 9: 353–60.
Wachsmuth, S., Wrede, S., and Hanheide, M. (2007). Coordinating interactive vision behaviors for cognitive assistance, Computer Vision and Image Understanding, 108: 135–49.
Wohlkinger, W. and Vincze, M. (2010). 3D object classification for mobile robots in home-environments using web-data, in IEEE 19th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD) (pp. 247–52).
Xiang, T. and Gong, S. G. (2006). Beyond tracking: Modelling activity and understanding behaviour, International Journal of Computer Vision, 67: 21–51.
