Learn to flap: foil non-parametric path planning via deep reinforcement learning

Z.P. Wang; R.J. Lin; Z.Y. Zhao; X. Chen; P.M. Guo; N. Yang; Z.C. Wang; D.X. Fan

doi:10.1017/jfm.2023.1096

Published online by Cambridge University Press: 27 March 2024

Z.P. Wang: School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China; School of Mechanical and Material Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
R.J. Lin: University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
Z.Y. Zhao: University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
X. Chen: Taihu Laboratory of Deepsea Technological Science, Wuxi, Jiangsu 214000, PR China
P.M. Guo†: School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China; School of Mechanical and Material Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
N. Yang†: University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
Z.C. Wang: Laboratory of Ocean Energy Utilization of Ministry of Education, Dalian University of Technology, Dalian 116024, PR China; School of Energy and Power Engineering, Dalian University of Technology, Dalian 116024, PR China
D.X. Fan†: School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China

†Email addresses for correspondence: guopengming@westlake.edu.cn, ning.yang@ia.ac.cn, fandixia@westlake.edu.cn

Abstract

To optimize flapping foil performance, in the current study we apply deep reinforcement learning (DRL) to plan non-parametric foil motion, since traditional control techniques and simplified motions cannot fully model the nonlinear, unsteady and high-dimensional foil–vortex interactions. To this end, we propose a DRL training framework based on the proximal policy optimization algorithm and the transformer architecture, in which the policy is initialized from a sinusoidal expert demonstration. We first demonstrate the effectiveness of the proposed DRL training framework, which learns a coherent foil flapping motion that generates thrust. Furthermore, by adjusting the reward functions and action thresholds, DRL-optimized foil trajectories achieve significant improvements in both thrust and efficiency compared with sinusoidal motion. Finally, visualization of the wake morphology and instantaneous pressure distributions shows that the DRL-optimized foil can adaptively adjust the phase between its motion and the shed vortices to improve hydrodynamic performance. Our results suggest how complex fluid manipulation problems can be approached with DRL.
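The framework is built on proximal policy optimization (PPO). As a minimal sketch, assuming the standard clipped surrogate objective of Schulman et al. (2017), PPO maximizes the batch mean of min(r_t A_t, clip(r_t, 1 − ε, 1 + ε) A_t), where r_t = π_new(a_t | s_t) / π_old(a_t | s_t) is the probability ratio and A_t the advantage estimate. The code below evaluates only that objective from precomputed log-probabilities and advantages; the transformer policy, the reward design and the hyperparameters used in the study are not reproduced, and the function name `ppo_clip_objective` and the default clip range ε = 0.2 are illustrative assumptions.

```python
import math
from typing import Sequence


def ppo_clip_objective(
    logp_new: Sequence[float],
    logp_old: Sequence[float],
    advantages: Sequence[float],
    clip_eps: float = 0.2,  # assumed clip range, not taken from the paper
) -> float:
    """Mean clipped surrogate objective of PPO (to be maximized).

    Hypothetical helper: inputs are per-sample log-probabilities of the
    taken actions under the new and old policies, plus advantage estimates.
    """
    if not (len(logp_new) == len(logp_old) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - clip_eps), 1.0 + clip_eps)
        # Pessimistic bound: take the smaller of the unclipped and clipped terms.
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


if __name__ == "__main__":
    # Ratio 1.5 with a positive advantage: the gain is capped at 1 + eps.
    print(ppo_clip_objective([math.log(1.5)], [0.0], [1.0]))
```

Clipping removes the incentive to move the probability ratio outside [1 − ε, 1 + ε], which keeps each policy update close to the policy that collected the data.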

JFM classification

Vortex Flows: Vortex interactions; Flow Control: Control theory

Type: JFM Papers
Information: Journal of Fluid Mechanics, Volume 984, 10 April 2024, A9

DOI: https://doi.org/10.1017/jfm.2023.1096
Copyright: © The Author(s), 2024. Published by Cambridge University Press

Footnotes

‡ These authors contributed equally to this work.

