
Learn to flap: foil non-parametric path planning via deep reinforcement learning

Published online by Cambridge University Press: 27 March 2024

Z.P. Wang
School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China; School of Mechanical and Material Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
R.J. Lin
University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
Z.Y. Zhao
University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
X. Chen
Taihu Laboratory of Deepsea Technological Science, Wuxi, Jiangsu 214000, PR China
P.M. Guo*
School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China; School of Mechanical and Material Engineering, Queen's University, Kingston, ON K7L 3N6, Canada
N. Yang*
University of Chinese Academy of Sciences, Beijing 100049, PR China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, PR China
Z.C. Wang
Laboratory of Ocean Energy Utilization of Ministry of Education, Dalian University of Technology, Dalian 116024, PR China; School of Energy and Power Engineering, Dalian University of Technology, Dalian 116024, PR China
D.X. Fan*
School of Engineering, Westlake University, Hangzhou, Zhejiang 310024, PR China


To optimize flapping-foil performance, we apply deep reinforcement learning (DRL) to plan non-parametric foil motion, because traditional control techniques and simplified motions cannot fully model the nonlinear, unsteady and high-dimensional foil–vortex interactions. We propose a DRL training framework based on the proximal policy optimization algorithm and the transformer architecture, in which the policy is initialized from the sinusoidal expert display. We first demonstrate the effectiveness of the proposed framework, which learns a coherent foil flapping motion that generates thrust. Furthermore, by adjusting the reward functions and action thresholds, DRL-optimized foil trajectories achieve significant gains in both thrust and efficiency compared with sinusoidal motion. Finally, visualization of the wake morphology and instantaneous pressure distributions shows that the DRL-optimized foil adaptively adjusts the phase between its motion and the shed vortices to improve hydrodynamic performance. Our results suggest how the DRL method can be used to solve complex fluid manipulation problems.
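The three ingredients named in the abstract (a sinusoidal reference motion, an action threshold, and a reward that balances thrust against efficiency) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the weighted-sum form of the reward, and the default weights are all assumptions chosen for illustration, since the abstract does not state the actual reward definition or threshold values.

```python
import math


def sinusoidal_pitch(t, amplitude, frequency, phase=0.0):
    """Pitch angle (rad) of a sinusoidal reference motion at time t.

    Stands in for the sinusoidal expert motion; the parameters are
    hypothetical, not values from the paper.
    """
    return amplitude * math.sin(2.0 * math.pi * frequency * t + phase)


def clip_action(action, threshold):
    """Limit a policy action to the interval [-threshold, threshold]."""
    return max(-threshold, min(threshold, action))


def weighted_reward(thrust_coeff, efficiency, w_thrust=0.5, w_eff=0.5):
    """Weighted-sum reward trading thrust against efficiency.

    The weighted-sum form and the default weights are assumptions.
    """
    return w_thrust * thrust_coeff + w_eff * efficiency


if __name__ == "__main__":
    # Reference pitch a quarter of the way through a 1 Hz cycle.
    print(sinusoidal_pitch(0.25, amplitude=1.0, frequency=1.0))
    # An action that exceeds the threshold is saturated.
    print(clip_action(2.0, threshold=1.0))
    # Equal weighting of thrust coefficient and efficiency.
    print(weighted_reward(thrust_coeff=1.0, efficiency=0.5))
```

In a sketch like this, shifting weight from `w_thrust` to `w_eff` would steer the learned trajectory toward efficiency, and tightening `threshold` would restrict how far each action can depart from zero.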

JFM Papers
© The Author(s), 2024. Published by Cambridge University Press



These authors contributed equally to this work.

