Selection of trajectory parameters for dynamic pouring tasks based on exploitation-driven updates of local metamodels

Published online by Cambridge University Press: 08 May 2017

Joshua D. Langsfeld
Affiliation:
Maryland Robotics Center, Institute for Systems Research, University of Maryland, College Park, MD, USA. E-mail: jdlangs@umd.edu
Krishnanand N. Kaipa
Affiliation:
Department of Mechanical and Aerospace Engineering, Old Dominion University, Norfolk, VA, USA. E-mail: kkaipa@odu.edu
Satyandra K. Gupta*
Affiliation:
Center for Advanced Manufacturing, Department of Aerospace and Mechanical Engineering, University of Southern California, Los Angeles, CA, USA
*Corresponding author. E-mail: guptask@usc.edu

Summary

We present an approach that allows a robot to generate trajectories for performing a set of task instances using only a few physical trials. Specifically, we address manipulation tasks that are highly challenging to simulate due to complex dynamics. Our approach allows a robot to create a model from initial exploratory experiments and subsequently improve it to find trajectory parameters that successfully perform a given task instance. First, in a model generation phase, local models are constructed in the vicinity of previously conducted experiments; these models capture both the task function behavior and the estimated divergence of the generated model from the true model when moving within the neighborhood of each experiment. Second, in an exploitation-driven updating phase, the generated models are used to guide parameter selection for a desired task outcome, and the models are updated based on the actual outcome of each task execution. Because the local models are built within adaptively chosen neighborhoods, the algorithm can capture arbitrarily complex function landscapes. We first validate our approach on a synthetic non-linear function approximation problem, where we also analyze the benefit of the core approach features. We then present results with a physical robot performing a dynamic fluid pouring task. The real robot results show that the correct pouring parameters for a new pour volume can be learned rapidly, with a limited number of exploratory experiments.
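To make the two-phase procedure concrete, the following is a minimal sketch of how such a learning loop could be structured. All concrete choices here (locally weighted linear fits as the local models, distance to the nearest past trial as the divergence estimate, a fixed neighborhood radius, and a grid-based candidate search) are illustrative assumptions for the sketch, not the implementation reported in the paper.

```python
# Minimal sketch of the two-phase loop described in the summary.
# The modeling choices below are illustrative stand-ins, not the
# authors' method: locally weighted linear fits, a nearest-trial
# distance as the divergence estimate, and a fixed neighborhood
# radius (the paper adapts the neighborhoods).
import numpy as np

class LocalMetamodel:
    """Local model around past experiments: parameters x -> task outcome y."""

    def __init__(self, radius=0.3):
        self.radius = radius          # neighborhood size (fixed here for simplicity)
        self.X, self.Y = [], []       # past trajectory parameters and outcomes

    def add_experiment(self, x, y):
        self.X.append(np.atleast_1d(x))
        self.Y.append(float(y))

    def predict(self, x):
        """Weighted linear fit over neighbors, plus a crude divergence
        estimate that grows with distance from known data."""
        X, Y = np.array(self.X), np.array(self.Y)
        d = np.linalg.norm(X - x, axis=1)
        w = np.exp(-(d / self.radius) ** 2)       # Gaussian kernel weights
        A = np.hstack([X, np.ones((len(X), 1))])  # affine design matrix
        sw = np.sqrt(w)                           # weighted least squares
        coef, *_ = np.linalg.lstsq(A * sw[:, None], Y * sw, rcond=None)
        y_hat = np.append(x, 1.0) @ coef
        divergence = d.min()                      # distance to nearest trial
        return y_hat, divergence

    def select_parameters(self, y_target, candidates):
        """Exploitation step: pick the candidate whose predicted outcome is
        closest to the target, penalizing poorly modeled regions."""
        def cost(x):
            y_hat, div = self.predict(x)
            return abs(y_hat - y_target) + 0.5 * div  # 0.5: assumed trade-off
        return min(candidates, key=cost)

# Example loop: learn pour parameters for a new target volume.
def run_task(x):                         # stand-in for a physical pouring trial
    return 2.0 * x[0] - 0.5 * x[0] ** 2  # hidden "true" outcome function

model = LocalMetamodel()
for x0 in [0.2, 0.5, 0.8]:               # initial exploratory experiments
    model.add_experiment([x0], run_task(np.array([x0])))

target = 1.3
candidates = [np.array([v]) for v in np.linspace(0.0, 1.0, 101)]
for trial in range(5):                   # exploitation-driven updates
    x = model.select_parameters(target, candidates)
    y = run_task(x)                      # execute; in practice, a robot pour
    model.add_experiment(x, y)           # refine the local model with the result
    if abs(y - target) < 0.02:
        break
```

The divergence term in the selection cost plays the role described above: it biases parameter selection toward regions backed by real trials, so the exploitation step does not over-trust predictions extrapolated far from past experiments.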

Type: Articles
Copyright: © Cambridge University Press 2017

