
8 - Perception and computer vision

Published online by Cambridge University Press: 05 July 2014

Markus Vincze, Technische Universität Wien
Sven Wachsmuth, Bielefeld University
Gerhard Sagerer, University of Bielefeld
Keith Frankish, The Open University, Milton Keynes
William M. Ramsey, University of Nevada, Las Vegas

Summary

The wish to build artificial, intelligent systems leads to the expectation that they will operate in our everyday environments; hence the expectations placed on their perceptual capabilities are high. Perception refers to the process of becoming aware of the elements of the environment through physical sensation, which can include sensory input from the eyes, ears, nose, tongue, or skin. In this chapter we focus on visual perception, the dominant sense in humans and one exploited since the first days of building artificial machines. Two early examples are Shakey, a mobile robot that used a range finder and camera to reason about its actions in a room with a few objects (Nilsson 1969), and FREDDY, a fixed robot with a binocular vision system controlling a two-finger hand (e.g., Barrow and Salter 1969).

The goal of computer vision is to understand the scene or features in images of the real world (Ballard and Brown 1982; Forsyth and Ponce 2011). Important means to this end are the techniques of image processing and pattern recognition (Duda and Hart 1973; Gonzalez and Woods 2002). The analysis of images is complicated by the fact that one and the same object may present many different appearances to the camera, depending on the illumination cast onto it, the angle from which it is viewed, the shadows it casts, the specific camera used, whether parts of it are occluded, and so forth. Nevertheless, computer vision today is sufficiently advanced to detect specific objects and object categories under a variety of conditions, to enable an autonomous vehicle to drive at moderate speeds on open roads, to steer a mobile robot through a suite of offices, and to observe and understand human activities.
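One classical answer to the illumination part of this difficulty can be illustrated with a toy sketch (not from the chapter; the function names and data are illustrative): normalized cross-correlation compares a template with each image patch after subtracting the mean and rescaling, so a match survives a global brightness or contrast change.

```python
import numpy as np

def ncc(patch, template):
    """Normalized cross-correlation: invariant to affine intensity changes."""
    p = patch - patch.mean()
    t = template - template.mean()
    denom = np.sqrt((p ** 2).sum() * (t ** 2).sum())
    return float((p * t).sum() / denom) if denom > 0 else 0.0

def find_template(image, template):
    """Slide the template over the image; return the best-matching corner."""
    th, tw = template.shape
    ih, iw = image.shape
    best, best_pos = -np.inf, (0, 0)
    for y in range(ih - th + 1):
        for x in range(iw - tw + 1):
            s = ncc(image[y:y + th, x:x + tw], template)
            if s > best:
                best, best_pos = s, (y, x)
    return best_pos, best

# A small textured "object" embedded at (3, 4) in an otherwise flat scene:
scene = np.zeros((8, 8))
scene[3:5, 4:6] = [[0.2, 0.9], [0.9, 0.2]]
template = np.array([[0.2, 0.9], [0.9, 0.2]])

# Darken and offset the whole scene; NCC still locates the object exactly.
pos, score = find_template(0.5 * scene + 0.1, template)
print(pos)  # (3, 4)
```

The sketch scores a perfect 1.0 at the true location despite the global intensity change, but it deliberately ignores the harder variations the paragraph lists: viewpoint, shadows, and occlusion defeat simple template matching, which is why the chapter's later methods (features, geometry, models) are needed.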

Type: Chapter
Publisher: Cambridge University Press
Print publication year: 2014


Further reading

Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Englewood Cliffs, NJ: Prentice Hall. The basic book on methods in computer vision.
Dickinson, S. J., Leonardis, A., Schiele, B., and Tarr, M. J. (2009). Object Categorization: Computer and Human Vision Perspectives. Cambridge University Press. Excellent overview of approaches to object recognition, including a historical perspective. A must to get started in this direction.
Forsyth, D. A. and Ponce, J. (2011). Computer Vision: A Modern Approach (2nd edn.). Upper Saddle River, NJ: Prentice Hall. A broad collection of computer vision techniques that is a very good reference for the advanced study of computer vision.
Hartley, R. and Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd edn.). Cambridge University Press. Provides deep coverage of geometrical aspects in computer vision for the advanced reader.
Kragic, D. and Vincze, M. (2009). Vision for robotics, Foundations and Trends in Robotics, 1: 1–78. An overview of the specific demands robotics places on computer vision methods, plus a survey of applications.
Szeliski, R. (2010). Computer Vision: Algorithms and Applications. London: Springer. An excellent textbook for the introduction and more in-depth study of computer vision. It has an emphasis on techniques that combine computer vision and graphics, but also covers modern techniques for object recognition, segmentation, and motion estimation.
Finally, two great open-source collections of vision methods are OpenCV and the Point Cloud Library.

References
Aloimonos, Y. (1993). Active Perception. Hillsdale, NJ: Lawrence Erlbaum.
Asfour, T., Azad, P., Vahrenkamp, N., et al. (2008). Toward humanoid manipulation in human-centred environments, Robotics and Autonomous Systems, 56: 54–65.
Bajcsy, R. (1988). Active perception, Proceedings of the IEEE, 76: 996–1005.
Ballard, D. H. (1981). Generalizing the Hough transform to detect arbitrary shapes, Pattern Recognition, 13: 111–22.
Ballard, D. H. and Brown, C. M. (1982). Computer Vision. Englewood Cliffs, NJ: Prentice Hall.
Barrow, H. G. and Salter, S. H. (1969). Design of low-cost equipment for cognitive robot research, in Meltzer, B. and Michie, D. (eds.), Machine Intelligence 5 (pp. 555–66). Edinburgh University Press.
Binford, T. (1971). Visual perception by a computer, in Proceedings of the IEEE Conference on Systems and Control (pp. 116–23). IEEE.
Bobick, A. F., Intille, S. S., Davis, J. W., et al. (1999). The KidsRoom: A perceptually-based interactive and immersive story environment, PRESENCE: Teleoperators and Virtual Environments, 8: 369–93.
Bolt, R. A. (1980). “Put-that-there”: Voice and gesture at the graphics interface, ACM SIGGRAPH Computer Graphics, 14: 262–70.
Breazeal, C. and Scassellati, B. (2000). Infant-like social interactions between a robot and a human caregiver, Adaptive Behavior, 8: 49–74.
Brooks, R. (1983). Model-based 3D interpretation of 2D images, IEEE Transactions on Pattern Analysis and Machine Intelligence, 5: 140–50.
Buxton, H. (2003). Learning and understanding dynamic scene activity: A review, Image and Vision Computing, 21: 125–36.
Chaumette, F. and Hutchinson, S. (2006). Visual servo control I: Basic approaches, IEEE Robotics and Automation Magazine, 13(4): 82–90.
Crowley, J. L. and Christensen, H. I. (eds.) (1995). Vision as Process: Basic Research on Computer Vision Systems. Berlin: Springer.
Crowley, J. L., Coutaz, J., and Bérard, F. (2000). Perceptual user interfaces: Things that see, Communications of the ACM, 43(3): 54–64.
Crowley, J. L., Coutaz, J., Rey, G., and Reignier, P. (2002). Perceptual components for context aware computing, in Borriello, G. and Holmquist, L. E. (eds.), UbiComp 2002: Ubiquitous Computing (Lecture Notes in Computer Science 2498) (pp. 117–34). Berlin: Springer.
Cupillard, F., Bremond, F., and Thonnat, M. (2003). Behaviour recognition for individuals, groups of people and crowds, IEE Symposium on Intelligent Distributed Surveillance Systems, 7: 1–5.
Dickmanns, E. D. (2007). Dynamic Vision for Perception and Control of Motion. London: Springer.
Duda, R. and Hart, P. (1973). Pattern Classification and Scene Analysis. New York: Wiley.
Forsyth, D. A. and Ponce, J. (2011). Computer Vision: A Modern Approach (2nd edn.). Upper Saddle River, NJ: Prentice Hall.
Gonzalez, R. C. and Woods, R. E. (2002). Digital Image Processing (2nd edn.). Upper Saddle River, NJ: Prentice Hall.
Hartley, R. and Zisserman, A. (2003). Multiple View Geometry in Computer Vision (2nd edn.). Cambridge University Press.
Hoiem, D., Efros, A. A., and Hebert, M. (2006). Putting objects in perspective, in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (pp. 2137–44). IEEE.
Huttenlocher, D. P. and Ullman, S. (1990). Recognizing solid objects by alignment with an image, International Journal of Computer Vision, 5: 195–212.
Jaklic, A., Leonardis, A., and Solina, F. (2000). Segmentation and Recovery of Superquadrics. Dordrecht: Kluwer Academic Publishers.
Kisacanin, B., Pavlovic, V., and Huang, T. S. (2005). Real-Time Vision for Human–Computer Interaction. New York: Springer.
Koenderink, J. J. (1987). An internal representation for solid shape based on the topological properties of the apparent contour, in Richards, W. and Ullman, S. (eds.), Image Understanding 1985–86 (pp. 257–85). Norwood, NJ: Ablex.
Kragic, D. and Vincze, M. (2009). Vision for robotics, Foundations and Trends in Robotics, 1: 1–78.
Lowe, D. G. (2004). Distinctive image features from scale-invariant keypoints, International Journal of Computer Vision, 60: 91–110.
Marr, D. (1982). Vision. San Francisco, CA: W. H. Freeman.
Moore, D. J., Essa, I. A., and Hayes, M. H. (1999). Exploiting human actions and object context for recognition tasks, in Proceedings of the IEEE International Conference on Computer Vision (pp. 80–86). Corfu, Greece: IEEE.
Mörwald, T., Prankl, J., Richtsfeld, A., Zillich, M., and Vincze, M. (2010). BLORT – The blocks world robotic vision toolbox, in Proceedings of the ICRA 2010 Workshop on Best Practice in 3D Perception and Modeling for Mobile Manipulation.
Nilsson, N. J. (1969). Mobile automaton: An application of artificial intelligence techniques, Technical Note 40, AI Center, SRI International; also in Proceedings of the First International Joint Conference on Artificial Intelligence (pp. 509–20).
Pavlovic, V., Sharma, R., and Huang, T. S. (1997). Visual interpretation of hand gestures for human–computer interaction: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 19: 677–95.
Piccardi, M. (2004). Background subtraction techniques: A review, IEEE International Conference on Systems, Man and Cybernetics, 4: 3099–104.
Sage, K. H., Howell, A. J., and Buxton, H. (2005). Recognition of action, activity and behaviour in the ActIPret project, KI, 19(2): 36–39.
Schilit, B., Adams, N., and Want, R. (1994). Context aware computing applications, in Proceedings of the First International Workshop on Mobile Computing Systems and Applications (pp. 85–90).
Shipley, T. and Kellman, P. J. (eds.) (2001). From Fragments to Objects: Segmentation and Grouping in Vision. Amsterdam: Elsevier.
Solina, F. and Bajcsy, R. (1990). Recovery of parametric models from range images: The case for superquadrics with global deformations, IEEE Transactions on Pattern Analysis and Machine Intelligence, 12: 131–47.
Strat, T. M. and Fischler, M. A. (1991). Context-based vision: Recognizing objects using information from both 2D and 3D imagery, IEEE Transactions on Pattern Analysis and Machine Intelligence, 13: 1050–65.
Thirde, D., Borg, M., Ferryman, J., et al. (2006). A real-time scene understanding system for airport apron monitoring, in Fourth IEEE International Conference on Computer Vision Systems (ICVS’06) (p. 26).
Thrun, S., Burgard, W., and Fox, D. (2005). Probabilistic Robotics. Cambridge, MA: MIT Press.
Valera, M. and Velastin, S. A. (2005). Intelligent distributed surveillance systems: A review, IEE Proceedings: Vision, Image and Signal Processing, 152: 192–204.
Vincze, M. (2005). On the design and structure of artificial eyes for tracking tasks, Journal of Advanced Computational Intelligence and Intelligent Informatics, 9: 353–60.
Wachsmuth, S., Wrede, S., and Hanheide, M. (2007). Coordinating interactive vision behaviors for cognitive assistance, Computer Vision and Image Understanding, 108: 135–49.
Wohlkinger, W. and Vincze, M. (2010). 3D object classification for mobile robots in home-environments using web-data, in IEEE 19th International Workshop on Robotics in Alpe-Adria-Danube Region (RAAD) (pp. 247–52).
Xiang, T. and Gong, S. G. (2006). Beyond tracking: Modelling activity and understanding behaviour, International Journal of Computer Vision, 67: 21–51.
