## 1. Introduction

Internal waves are often considered to be the primary pathway through which energy is transferred from large scales associated with wind and tidal forcing to small scales and turbulence in the ocean interior (MacKinnon *et al.* Reference MacKinnon2017). On vertical scales larger than $O({10}\ \textrm {m})$, the distribution of energy in internal waves is well described by the empirical spectrum of Garrett & Munk (Reference Garrett and Munk1972), and energy transfers occur through weakly nonlinear wave–wave interactions (Müller *et al.* Reference Müller, Holloway, Henyey and Pomphrey1986; Polzin & Lvov Reference Polzin and Lvov2011). (Exceptions to this paradigm are internal solitary waves, which can propagate over long distances without interacting with the ambient wave field as in Ramp *et al.* (Reference Ramp, Tang, Duda, Lynch, Liu, Chiu, Bahr, Kim and Yang2004).) At smaller scales, the flow becomes highly nonlinear and the form of the energy spectrum changes to the power law scaling ${E(m) \sim N^{2} m^{-3}}$, where $N^{2}:= (-g/\rho _0)\,\textrm {d}\bar {\rho }/\textrm {d}z$ is the squared buoyancy frequency (determined by gravitational acceleration $g$, mean density $\rho_0$, and the vertical density gradient ${\rm d}\bar{\rho}/{\rm d}z$) and $m$ is the vertical wavenumber (as observed e.g. by Gargett *et al.* Reference Gargett, Hendricks, Sanford, Osborn and Williams1981). Although the energy spectra are consistent across measurements (see also Gregg, Winkel & Sanford Reference Gregg, Winkel and Sanford1993), they sample flow fields that are highly intermittent, as highlighted for example by Baker & Gibson (Reference Baker and Gibson1987). Away from boundaries, such intermittency suggests that the turbulence may be sustained by a collection of localised, transient ‘wave breaking’ events that transfer energy downscale from the internal wave field.

Further evidence of turbulence arising from wave breaking processes can be found in the thermocline observations of Alford & Pinkel (Reference Alford and Pinkel2000), henceforth denoted $\text{{AP}}$. Intermittent metre-scale overturns, where the vertical profile of density becomes statically unstable, are used to indicate the presence of turbulence. In the observations, these overturns favourably sample regions with high ‘vertical strain’. Strain in this context refers to local changes in $(N^{2}(z))^{-1}$ due to vertical convergence or divergence of the flow, and regions with low local stratification relative to the mean are associated with high strain. Significant fluctuations in local stratification (and therefore strain) are suggestive of large-amplitude internal waves. There are, however, a range of possible mechanisms by which the waves can overturn and break, and it is unclear how different types of wave breaking may affect the mean rates of diapycnal mixing. Larger-scale vertical shear in the observations of $\text{{AP}}$ is often colocated with the internal wave field, and this shear is likely to play an important role in the breaking process.

For example, figure 11 of $\text{{AP}}$ highlights three ‘overturning events’ with seemingly different characteristics in terms of the roles of internal waves and shear. One of the overturns is associated with persistently low values of the gradient Richardson number $Ri_g=N^{2}/|\partial \boldsymbol {u}/\partial z|^{2}$ (where the velocity $\boldsymbol{u}$ is measured at 6.4 m resolution), suggesting that shear instabilities are primarily triggering the turbulence. Overturning events are also highlighted where large-amplitude internal waves strongly distort the density field. These ‘high strain’ overturns are observed where $Ri_g$ is reduced, but still large enough for instability of the large-scale shear to be unlikely. The vertical extent of the overturns in $\text{{AP}}$ is typically comparable to the scale of the strain features associated with internal gravity waves. This suggests that the overturns may be attributed to the breakdown of large-amplitude internal waves.

The stability of finite-amplitude internal gravity waves was first studied by Mied (Reference Mied1976) and Drazin (Reference Drazin1977) using linear stability analysis in a two-dimensional (2-D) plane. Klostermeyer (Reference Klostermeyer1991) later extended this work to consider 3-D perturbations. Finite-amplitude internal gravity waves were found to be generally unstable to linear perturbations, although the nature of the instability depended on the wave amplitude and propagation angle.

Lombard & Riley (Reference Lombard and Riley1996) and Sonmor & Klaassen (Reference Sonmor and Klaassen1997) expanded upon this work with more comprehensive linear stability studies. They found that as the propagation angle $\phi$ of the wave increases, the fastest growing perturbations become three-dimensional and resonant processes become less significant. This is important in the context of the above thermocline observations, where $\text{{AP}}$ estimated a propagation angle of $\phi \approx {85}^{\circ }$ for the waves associated with high strain. Although the condition of wave steepness $s>1$ is commonly used to determine whether a wave breaks through convective instability (e.g. Thorpe Reference Thorpe2018), the linear stability analysis suggested that there is no qualitative change in the breakdown of an internal wave across this threshold.

To our knowledge, relatively few studies have investigated the fully nonlinear breakdown of internal gravity waves through direct numerical simulation (DNS). Bouruet-Aubertot *et al.* (Reference Bouruet-Aubertot, Koudella, Staquet and Winters2001) performed 2- and 3-D DNS (with a grid size of $256^{3}$) of a plane wave propagating at $\phi =45^{\circ }$. Consistent with the earlier linear stability analysis, the primary instability of the wave occurred due to resonance. Fritts *et al.* (Reference Fritts, Wang, Werne, Lund and Wan2009*a*,Reference Fritts, Wang, Werne, Lund and Wan*b*) later used high resolution DNS (with a grid size of $2400\times 1600\times 800$) to consider the breakdown of a large-amplitude internal wave at $\phi =72^{\circ }$. They found that the breakdown was inherently three-dimensional, and that $s=1$ did not act as a significant threshold for the nature of the breakdown, consistent with the linear analysis discussed above.

Wave breaking processes can, however, be significantly impacted by the presence of a background shear flow. This was first highlighted by Bretherton (Reference Bretherton1966) and Booker & Bretherton (Reference Booker and Bretherton1967), who revealed the possible emergence of critical levels, where the horizontal phase speed of the waves matches the velocity of the shear flow. Vertical propagation of the waves is halted at these critical levels, causing the waves to break as their local amplitudes increase. This phenomenon was subsequently confirmed by the experiments of Koop & McGee (Reference Koop and McGee1986).

Winters & D'Asaro (Reference Winters and D'Asaro1994) performed 3-D hyper-diffusive simulations of internal wave packets approaching a critical level in a shear flow. These simulations were run on a very small grid of size $32^{2}\times 200$. As waves approached the critical level, convective rolls formed in the spanwise plane, and these rolls were in turn strongly affected by the enhanced shear of the refracted wave. These results were consistent with the linear stability analysis of Winters & Riley (Reference Winters and Riley1992), who modelled a critically refracted wave as a statically unstable parallel shear flow. Higher resolution studies (with grids up to $3456\times 864\times 1728$) of sheared internal waves were performed by Fritts, Wang & Werne (Reference Fritts, Wang and Werne2013) and Fritts & Wang (Reference Fritts and Wang2013), although their approach was rather different. They considered the effect of ‘fine-scale’ shear on a single, large-scale internal gravity wave of steepness $s=0.5$. The superposition of small-scale shear and the internal wave produced an initial condition locally susceptible to shear instabilities. Fritts *et al.* (Reference Fritts, Wang and Werne2013) also considered the case where the shear is not aligned with the internal wave, but found that wave–shear interactions in such cases are weak and do not lead to a breakdown of the wave.

We shall consider a similar problem to that of Fritts *et al.* (Reference Fritts, Wang and Werne2013) in this study, using DNS to investigate the flow arising from a superposition of a plane internal gravity wave and a sinusoidal shear flow in a triply periodic domain. Motivated by the observations of $\text{{AP}}$, we prescribe the shear flow to vary on a larger vertical scale than the wavelength of the internal wave. We are primarily interested in understanding the key mechanisms involved in the interaction of the wave and the shear, as well as the properties of the turbulence generated from the breakdown of the wave, in particular the associated irreversible mixing and wave–mean flow interaction. In this idealised study, we do not specify the source of the internal gravity wave, but simply choose appropriate parameters to remain consistent with the observations. We acknowledge that for many oceanographic applications, it is useful to quantify mixing associated with specific generation mechanisms, such as oceanic lee waves (Legg Reference Legg2021).

The remainder of the manuscript is organised as follows. Section 2 describes the set-up of the numerical simulations, and also presents the results of some elementary linear ray tracing calculations to provide a link between our nonlinear flow and linear predictions of critical levels from wave–mean flow analysis. Section 3 presents the results of our DNS, focusing on the nature of the wave breaking, the mixing achieved by turbulence, and the effect of the breaking wave on the mean flow. Our findings are summarised in § 4, and their implications are then discussed in the context of internal wave driven mixing in the ocean.

## 2. Numerical simulations

### 2.1. Nonlinear 3-D simulations: domain and initial conditions

We use Diablo (Taylor Reference Taylor2008) to perform DNS of the Navier–Stokes equations subject to the Boussinesq approximation and an imposed, constant mean stratification. The numerical solver implements parallelised pseudospectral methods for spatial derivatives, and time evolution is achieved using a third-order Runge–Kutta scheme. Dealiasing by a $2/3$ rule is applied to the calculation of the nonlinear terms, and periodic boundary conditions are used in all directions. The governing equations read

Here, $p$ is the kinematic pressure and $\theta$ is the dimensionless buoyancy perturbation to a uniform background stratification. The total dimensionless buoyancy is therefore given by

which is related to the full, dimensional density profile by

where $\rho _0$ is a typical scale for the mean density and $\Delta \rho$ is a typical scale for the density fluctuations. The dimensionless parameters in (2.1)–(2.3) are the Reynolds number, the Prandtl number, and the Richardson number, defined as

*a*–

*c*)\begin{equation} Re = \frac{L_0 U_0}{\nu}, \quad Pr = \frac{\nu}{\kappa}, \quad Ri_0 = \frac{g\Delta\rho L_0}{\rho_{0} U_0^{2}}=\frac{N_0^{2}L_0^{2}}{U_0^{2}}, \end{equation}

where $N_0$ is the buoyancy frequency of the uniform background stratification; $U_0$ and $L_0$ are typical velocity and length scales associated with a background shear flow, and $\nu $ and $\kappa $ are the diffusivities of momentum and heat respectively. In all of our simulations, the bulk Richardson number $Ri_0$ is set equal to one so that the inertial time scale $L_0/U_0$ is equal to the buoyancy time scale ${N_0}^{-1}$. The Prandtl number $Pr$ is also set to one in every simulation to enable, subject to the constraint of the computational resources available to us, adequate resolution of the small-scale dynamics at (what we believe to be) sufficiently high Reynolds number $Re$.

We note that the Prandtl number appropriate for seawater at $20\,^{\circ }\textrm {C}$ is $Pr=7$, and for flows stratified by salinity, $Pr$ (or more precisely the Schmidt number) takes values of $O(1000)$. Previous studies have highlighted significant $Pr$-dependence of the mixing properties (Smyth, Moum & Caldwell Reference Smyth, Moum and Caldwell2001), interface evolution (Xu, Stastna & Deepwell Reference Xu, Stastna and Deepwell2019) and secondary instabilities (Salehipour, Peltier & Mashayek Reference Salehipour, Peltier and Mashayek2015) in simulations of stratified flows. Although we cannot capture these effects at $Pr=1$, the flow we consider requires high values of $Re$, making DNS at high $Pr$ currently infeasible.

As discussed in the introduction, we are inspired and motivated by the observations of Alford & Pinkel (Reference Alford and Pinkel2000, $\text{{AP}}$) of wave breaking in the thermocline, and consider the flow developing from the superposition of a plane internal gravity wave and a sinusoidal shear flow. $\text{{AP}}$ estimated the vertical wavenumber of large-amplitude internal waves associated with overturning events to be approximately ${m\approx 2{\rm \pi} /({12}\ \textrm {m})}$. By inspecting vertical profiles of the effective strain rate $\partial w/\partial z$ and accounting for Doppler shifts by horizontal currents, they also estimated a typical horizontal wavenumber of the waves as $\kappa \approx 2{\rm \pi} /({180}\ \textrm {m})$. These estimates coincide with measurements of vertical shear that vary on a length scale of $O({30}\ \textrm {m})$. It is not possible to resolve centimetre-scale dissipation adequately using DNS while also resolving the dynamics associated with lengths $O({100}\ \textrm {m})$. We therefore perform a ‘miniaturised’ simulation of the shear and internal wave interaction by reducing the Reynolds number to a computationally tractable value.

In a periodic domain of dimensionless height $2{\rm \pi}$, we set $\bar {u}(z) = \sin z$ as the base shear flow. The minimum gradient Richardson number of this flow ${Ri_m = \min (Ri_g)}$ is equal to the bulk Richardson number $Ri_0=1$, with $Ri_g$ taking this value at the edge of the domain ($z=0, 2{\rm \pi}$) as well as at the mid-height $z={\rm \pi}$. This ensures that the background shear profile is linearly stable, as shown by Balmforth & Young (Reference Balmforth and Young2002). We superimpose this shear flow and a plane internal gravity wave with dimensionless wave vector $\boldsymbol {k} = (k,l,m) = (1/4, 0, 3)$. Compared to the observational estimates of $\text{{AP}}$ the wave has a similar propagation angle, and the ratio between the vertical wavenumber of the shear ($m=1$) and the vertical wavenumber of the wave ($m=3$) also provides a good match to the observations. Preliminary simulations showed that waves oriented perpendicular to the shear flow (with $k=0, l\neq 0$) produce insignificant interactions even at large amplitude, consistent with the findings of Fritts *et al.* (Reference Fritts, Wang and Werne2013). We therefore focus only on the case where the planes of the wave and shear are aligned.

We perform simulations at Reynolds numbers of 5000 and 8000. The dimensionless domain size is chosen to fit one horizontal wavelength of the internal wave and one wavelength of the shear. Preliminary runs showed that the scale of spanwise motion that develops is small, so we choose a narrow domain of size $8{\rm \pi} \times {\rm \pi}/2\times 2{\rm \pi}$. Setting the kinematic viscosity to $\nu =1\times 10^{-6}\ \textrm {m}^{2}\ \textrm {s}^{-1}$, typical of water, and choosing a typical buoyancy frequency of ${N_0=5\times 10^{-3}\ \textrm {s}^{-1}}$, we can deduce typical velocity and length scales from our choices of $Re$ and $Ri_0$. For the highest value of $Re=8000$ this gives $L_0={1.26}\ \textrm {m}$ and $U_0=6.3\ \textrm {mm}\ \textrm {s}^{-1}$, and hence an effective domain size of approximately $32\ \textrm {m}\times 2\ \textrm {m} \times 8\ \textrm {m}$.

In the dimensionless Boussinesq system (2.1)–(2.3), internal gravity waves in the $xz$-plane are given by the real parts of the polarisation relations

*a*–

*c*)\begin{equation} \left.\begin{gathered} \theta = \frac{s}{m} \exp({\textrm{i}(\boldsymbol{k}\boldsymbol{\cdot}\boldsymbol{x} - \omega t + \varphi)}), \quad u = \frac{-\textrm{i}s\omega}{k} \exp({\textrm{i}(\boldsymbol{k}\boldsymbol{\cdot}\boldsymbol{x} - \omega t + \varphi)}), \\ w = \frac{\textrm{i}s\omega}{m} \exp({\textrm{i}(\boldsymbol{k}\boldsymbol{\cdot}\boldsymbol{x} - \omega t + \varphi)}), \end{gathered}\right\}\end{equation}

where $\varphi$ is an arbitrary constant phase and $s>0$ is the wave steepness, representing a dimensionless amplitude that satisfies $s=1$ when buoyancy contours first become vertical somewhere in the domain. For $Ri_0=1$, the dimensionless wave frequency $\omega$ satisfies the dispersion relation

To construct the initial condition for our simulations, we take the positive root of (2.8), set $k=1/4$ and $m=3$, and (without loss of generality) choose $\varphi =0$. Superposed with the shear flow, this gives the initial condition

The values of wave steepness $s$ used in the simulations are outlined with all other relevant parameters in table 1.

The initial conditions for the buoyancy and velocity fields are displayed in figure 1 for two values of $s$ used in the simulations. Small-amplitude, 3-D noise is added to the velocity field to allow the development of spanwise motion from the 2-D initial condition of (2.9) and (2.10). All simulations begin on a uniformly spaced grid at the ‘initial resolution’ specified in table 1. This resolution corresponds to a grid spacing of $\Delta x = {\rm \pi}/256 \approx 1.2\times 10^{-2}$. As the simulations develop, this spacing is compared to the minimum Kolmogorov length scale calculated from the horizontally averaged turbulent dissipation rate

*a*,

*b*)\begin{equation} L_K(t) = \min_z\left(\varepsilon_H(z,t) {Re}^{3}\right)^{{-}1/4} , \quad \varepsilon_H(z,t) = \frac{1}{Re} \,\overline{\frac{\partial u_i^{\prime}}{\partial x_j} \frac{\partial u_i^{\prime}}{\partial x_j}} . \end{equation}

Here, an overbar denotes a horizontal average, and a prime denotes the deviation from that horizontal average. Once $L_K$ becomes smaller than the initial $\Delta x$, the flow is upscaled to a higher resolution grid with a grid spacing of ${\Delta x={\rm \pi} /512\approx 6.1\times 10^{-3}}$. The upscaling is achieved through performing an inverse fast Fourier transform onto the higher resolution grid to preserve the exact spectral form of the flow fields. At late times in the simulations, $L_K$ once again rises above the initial grid resolution as the turbulence decays. Once this happens, the extra Fourier modes associated with the higher resolution are truncated and we return to simulating the flow on the initial grid. After either upscaling or downscaling, time series of the turbulence statistics remain consistent and exhibit no sudden jumps or deviations.

### 2.2. Qualitative insight from linear ray theory: critical levels

In the absence of any mean flow, the internal gravity wave (2.7*a*–*c*) propagates (in terms of its energy) at the group velocity

where we have taken the positive root of the dispersion relation (2.8) to match the initial condition (2.9)–(2.11). For $k, m>0$ the wave therefore propagates down and to the right. In a constant mean flow $\boldsymbol {U}$, the frequency of the internal gravity wave appears to change as the wave is Doppler shifted. The frequency seen by a stationary observer, which we shall refer to as the extrinsic frequency, is given by

where $\hat {\omega }$ is the frequency arising from the dispersion relation (2.8), which we shall refer to as the intrinsic frequency. This intrinsic frequency may equivalently be defined as the frequency observed when travelling with the mean flow. The terminology regarding Doppler shifts can often be unclear from the literature, with the monographs of Sutherland (Reference Sutherland2010) and Bühler (Reference Bühler2014) disagreeing on the extrinsic/intrinsic distinction. In defining the extrinsic frequency $\omega$ as that seen by a stationary observer, we follow the notation and terminology of Bühler (Reference Bühler2014).

We consider the propagation of an internal gravity wave through the 1-D mean shear flow $\bar {u}(z)$ as originally considered by Booker & Bretherton (Reference Booker and Bretherton1967). Assuming that this shear flow varies ‘slowly’ in $z$, the extrinsic frequency defined in (2.14) becomes

The wave will then propagate along a ‘ray’ in the direction of the extrinsic group velocity

*a*,

*b*)\begin{equation} \frac{\textrm{d} x}{\textrm{d}t} = \frac{\partial \omega}{\partial k} = \bar{u} + \hat{c}_{g,x}, \quad \frac{\textrm{d}z}{\textrm{d}t} = \frac{\partial \omega}{\partial m} = \hat{c}_{g,z} , \end{equation}

where $\boldsymbol {\hat {c}_g}$ is the intrinsic group velocity detailed in (2.13). Since the mean flow is independent of time, the extrinsic frequency will be conserved along the ray, that is $\textrm {d}\omega /\textrm {d}t=0$. The wave vector $\boldsymbol {k}=(k,m)$ must therefore vary along the ray such that

*a*,

*b*)\begin{equation} \frac{\textrm{d}k}{\textrm{d}t} ={-}\frac{\partial \omega}{\partial x} = 0, \quad \frac{\textrm{d}m}{\textrm{d}t} ={-}\frac{\partial \omega}{\partial z} ={-} k \frac{\textrm{d}\bar{u}}{\textrm{d}z} . \end{equation}

The horizontal wavenumber $k=k_0$ is conserved along the ray, whereas the vertical wavenumber $m$ will change according to the mean shear.

In the simple case of a constant mean shear $\partial \bar {u}/\partial z=S_0$, the vertical wavenumber satisfies $\textrm {d}m/\textrm {d}t = -kS_0$. For positive $k$, the vertical wavenumber therefore decreases in the presence of positive shear, and increases in the presence of negative shear. As $m$ increases, with $k$ kept constant, the intrinsic frequency $\hat {\omega }$ decreases and the group velocity vector becomes closer to horizontal (as can be inferred from (2.13) for large $m$). Conservation of $\omega \equiv \omega _0$ combined with the form of (2.15) can predict the existence of a critical level where the intrinsic frequency $\hat {\omega }$ drops to zero and $m$ becomes infinite. Setting $\hat {\omega }=0$ in (2.15) implicitly defines the height of a critical level as

As waves propagate towards a critical level, they typically grow in amplitude until they ‘break’ through instabilities.

These ray-tracing calculations are commonly used to investigate the propagation of a localised wavepacket through a large-scale (i.e. slowly varying compared with characteristic length scales of the wavepacket) mean flow, where they can be formally derived using a classical WKBJ (Wentzel–Kramers–Brillouin–Jeffreys) asymptotic approximation argument. Our set-up of a relatively large-amplitude plane wave superposed on a shear flow throughout the entirety of our computational domain is quite different, and in particular, the required scale separation underlying the validity of the derivation of the ray-tracing equations does not occur. Nevertheless, as we demonstrate below, solutions to these equations still provide valuable qualitative insight into the behaviour of the full nonlinear (and relatively rapidly spatially varying) flow. We attempt to model this system by considering the paths of wavepackets (traced using these linear ray equations) with the same properties as the plane wave, from different initial positions. All wavepackets have the initial wave vector $(1/4, 3)$, and hence also have the same initial intrinsic frequency. However, the extrinsic frequencies, that are conserved along each ray, depend on the initial height $z_0$.

Figure 2 displays the results of numerically solving (2.16*a*,*b*) for the mean flow $\bar {u}=\sin z$. The vertical propagation of 17 wavepackets, equally spaced out at time 0, is shown in figure 2(*a*). The majority of the rays end up in the centre of the domain where the background shear is negative, and their vertical propagation decreases. This is consistent with our earlier discussion of wave propagation through a uniform shear. Since each initial wavepacket height has a different extrinsic frequency $\omega _0$, (2.18) can predict critical levels at multiple heights. For the flow considered, (2.18) gives the predicted critical levels through

Figure 2(*b*) plots the critical levels (if they exist) associated with each of the rays in figure 2(*a*). The above equation predicts critical levels for approximately 75 % of possible initial heights $z_0$. The upper and lower bounds on critical levels are also shown in figure 2(*b*).

Before moving on to analyse the results of the direct numerical simulations, we must emphasise that we do not expect the above linear analysis to describe the development of the flow quantitatively. We instead believe that the analysis illustrates qualitatively some key phenomena that occur in the flow and provides some physical insight into its behaviour. In particular, we expect energy to build up in the region of negative shear due to wave refraction and the appearance of critical levels. A subsequent breakdown to turbulence is then likely through small-scale instabilities and nonlinearities, although this may be affected by diffusion if the instabilities develop on a sufficiently slow time scale.

## 3. Results

### 3.1. Flow phenomenology and wave breakdown

We now describe the results of the (inherently nonlinear) 3-D direct numerical simulations outlined in § 2.1. We begin by outlining key features of the flow arising from the initial condition with wave steepness $s=1$, and later compare these results to those with less energetic initial conditions. Figure 3 presents vertical plane snapshots of the total buoyancy field $b=z+\theta$ at various times of simulation R8s1 up to $t=32$. Figure 4 shows the vorticity field associated with the same vertical planes, with the streamwise vorticity $\zeta _x = \partial _y w - \partial _z v$ plotted in the $yz$-planes and the spanwise vorticity $\zeta _y = \partial _z u - \partial _x w$ in the $xz$-planes.

By time $t=8$, shown in panel (*d*), the tilted structure of the internal gravity wave has been distorted by the shear flow. As predicted by the ray tracing calculations in § 2.2, vertical length scales associated with the wave decrease where the mean shear is negative, at mid-heights in the domain. The effect of this wave refraction on the buoyancy field can be seen in figure 3(*d*). In the centre of the domain, regions with statically unstable buoyancy profiles emerge, flanked by ‘sheets’ of strong stratification where buoyancy contours are pushed close together. This is consistent with the predictions of figure 12 for the local wave steepness to increase near $z={\rm \pi}$, and points to a local buildup of available potential energy. In contrast, the buoyancy contours closer to the top and bottom of the domain flatten and relax towards the mean uniform stratification.

Panel (*e*) is the first to highlight 3-D motion in the flow at time $t=16$. Coherent normal mode-like disturbances emerge in the streamwise vorticity of figure 4(*e*) with a spanwise wavenumber of $l\approx 20$. These vorticity structures are generated in the regions where the buoyancy field is statically unstable, which suggests that they are generated through a convective instability. Indeed, the mushroom-like plumes in figure 3(*e*) further suggest that the structures can be classified as convective rolls aligned on the streamwise axis. Preliminary simulations at lower resolution showed that the wavenumber $l$ associated with the rolls is independent of the width of the domain in the $y$ direction. We are therefore confident that the narrow domain still captures sufficient three-dimensionality in the flow, particularly since the rolls subsequently break down into smaller scale turbulence as they are advected by the flow.

At the same time as the appearance of the convective rolls, spanwise vorticity intensifies locally in the $xz$-plane. The dark green regions in figure 4(*f*) highlight strong negative vertical shears that emerge in the centre of the domain. In a canonical stratified shear layer, the stability of such a region would be determined by the gradient Richardson number, but in this case such a number is difficult to quantify. Firstly the shear layer depth $\lambda _{SL}$, for which an estimate is shown on figure 4(*f*), varies in both space and time. Secondly, the maximum shear is offset compared with the peak in stratification. In fact the shear layer spans regions where the buoyancy field transitions between static instability and strong stratification. The strong local shears nevertheless present a potential route for further instabilities to develop.

By time $t=24$, shown in panels (*g*,*h*), the small-scale convective disturbances have interacted with the strong shears in the centre of the domain, generating a turbulent flow characterised by relatively intense small-scale vortices. Comparing the vorticity field in figure 4(*h*) with the buoyancy field in figure 3(*h*), we find that the turbulence emerges in a region of highly variable local stratification. This can have a significant impact on local irreversible mixing of the buoyancy field, as we investigate further in Howland, Taylor & Caulfield (Reference Howland, Taylor and Caulfield2021).

The final snapshots presented in figures 3 and 4 highlight a striking organisation of the turbulence into large structures. Undulations in the isopycnals in figure 3(*j*) are closely reflected by intense patches in the vorticity field of figure 4(*j*). These patches are somewhat reminiscent of the ‘billows’ that arise from the development of Kelvin–Helmholtz instability (KHI). The emergence and evolution of these flow structures can be seen in supplementary movies 1 and 2. Although it is plausible that these billows are essentially finite-amplitude manifestations of a linear shear instability, we must add a number of caveats to this interpretation. As mentioned above, the shear layer that develops is not steady and its depth and velocity jump both vary in space and time. Further work is needed to understand better the nature of instabilities in temporally varying stratified flows. Kaminski, Caulfield & Taylor (Reference Kaminski, Caulfield and Taylor2017) and Kaminski & Smyth (Reference Kaminski and Smyth2019) have also shown that finite-amplitude perturbations and pre-existing turbulence can significantly impact the development of shear-driven billows in a stratified shear layer. The disturbances introduced by the convective rolls therefore make it difficult to estimate the size of the billows from the initial wave set-up. An alternative hypothesis is that small-scale vortices, formed through shearing of the convective disturbances, undergo a form of inverse cascade in the presence of the mean shear.

By the time $t=32$ of the final snapshots, the turbulent dissipation rate $\varepsilon ^{\prime }$ has already peaked and the subsequent flow is that of a turbulent decay. Figure 5 highlights the change in various horizontally averaged quantities between the initial condition and the flow state at the late time $t=80$. As the turbulence decays, the buoyancy contours flatten in the middle of the domain and leave alternating regions of relatively weak and strong stratification. This variation is clear in the mean vertical profiles of $N^{2}$ shown in figure 5(*c*), with three strong peaks in the middle of the domain associated with a 30 %–50 % increase in the local buoyancy gradient. The mean shear shows similar vertical variation in figure 5. At mid-heights in the domain, local extrema in $S^{2}$ appear offset from local extrema in $N^{2}$, akin to the form of an internal wave. Despite this offset, regions with significant shear exhibit a gradient Richardson number of $Ri_g\approx 1$, similar to the initial profile. The most intense mean shears lead to a minimum Richardson number of $Ri_m\approx 1/2$, significantly above the value of $1/4$ that ensures linear stability. The simulations are continued up until $t\approx 150$, although the remaining dynamics after $t=80$ in case R8s1 could primarily be characterised as relaminarisation, with the smaller-scale variations seen in figure 5 being smeared out by diffusion.

### 3.2. Energetics

With a basic understanding of how the flow develops in simulation R8s1, we now investigate how the Reynolds number $Re$ and initial wave steepness $s$ modify the dynamics. We begin by further investigating the emergence of 3-D motion associated with the convective rolls in figures 3(*e*) and 4(*e*). Time series for each component of the kinetic energy

and the potential energy $\mathcal {P}=Ri_0 \langle \theta ^{2}\rangle /2$ are plotted in figure 6, where $\langle \cdot \rangle$ denotes a volume average. The time series are plotted on a logarithmic scale, and in every simulation we see a period where the energy of the spanwise velocity $\mathcal {K}_v$ increases with an approximately linear slope, indicating exponential growth. This growth in $\mathcal {K}_v$ is less steep for the two cases with initial wave steepness, and occurs significantly later for simulation R8s0, where $s=0.5$.

To demonstrate further evidence that the mechanism driving this growth is convective, we calculate a Rayleigh number, defined in our dimensionless framework as

where $\Delta b$ and $\Delta z$ are calculated as follows. For every horizontal position $(x,y)$, we consider the vertical profile of buoyancy $b(z)$. In this profile we identify the largest continuous region with $\partial b/\partial z < 0$ and denote its size by $\Delta z$. We then take $\Delta b$ as the buoyancy difference across this region to compute the Rayleigh number through (3.2). Taking the maximum Rayleigh number across all horizontal positions then provides us with some information on whether convection is likely to be occurring somewhere in the domain. Classical linear stability results predict the onset of convection above a Rayleigh number of $O(1000)$, with the critical value varying depending on the boundary conditions considered (see, for e.g. Drazin & Reid Reference Drazin and Reid2004). In figure 6 we additionally plot the time at which the maximum value of $Ra$ in the domain first exceeds 2000 for each simulation. Every case shows that the growth in $\mathcal {K}_v$ only occurs after statically unstable regions form and the Rayleigh number gets sufficiently large. This, together with the quasi-exponential energy growth, provides strong evidence that 3-D motion is brought about through a convective linear instability. To be clear, this result only informs us of the first source of small-scale disturbances in the flow, and it cannot be used to determine how energy is supplied to turbulence for mixing at later times.

For simulation R8s0, with the smallest initial wave steepness $s=0.5$, the peak in the energy of the spanwise velocity $\mathcal {K}_v$ is significantly lower than in any of the other cases. The fact that the energy growth occurs later and more slowly than in other cases may allow diffusive effects to impact the saturation of the convective instability. To investigate this, we present a simple extension to add diffusion to the linear ray-tracing theory in Appendix A. From this analysis, it is plausible that diffusive effects are impacting the development of the wave for the cases with $s<1$, but quantitative predictions cannot be drawn from the linear theory.

The ray theory analysis introduced in § 2.2 of course relies on a number of bold assumptions that are not even well satisfied by the initial conditions. It is therefore remarkable how well ray theory can provide useful intuition for certain aspects of the flow, such as in figure 7, where we plot the horizontally averaged buoyancy flux $\mathcal {J}=Ri_0 \overline {w\theta }$ for each simulation. Recall that positive values of $\mathcal {J}$ describe a transfer of potential energy to kinetic energy. The net buoyancy flux associated with the plane wave initial condition is zero, but as the wave is distorted by the mean flow, large and reversible exchanges between the kinetic and potential energies occur. Figure 7 highlights that these exchanges are qualitatively similar at early times for all of the simulations. In the top half of the domain, alternating patches of high and low buoyancy flux appear to propagate downwards over time. Particularly for the cases with lower initial wave steepness, shown in panels (*c*,*d*), this propagation is qualitatively reminiscent of the wave refraction seen in the ray-tracing results. At late times in panel (*c*), significant wave activity, inferred from the buoyancy flux, is only present at heights similar to the critical level locations specified in figure 2(*b*).

In some cases turbulence is intensified when the internal wave rays converge at the critical level. This is reflected in the horizontally averaged dissipation rate of turbulent kinetic energy (TKE) $\varepsilon ^{\prime }$, shown in figure 8, where

As before, an overbar denotes a horizontal average and a prime denotes the perturbation from the horizontal average. In the simulations with $s=0.75$ and $s=1$, a patch of large $\varepsilon ^{\prime }$ emerges in the middle of the domain in the range $t=15 - 30$. This is consistent with internal wave energy converging near the middle of the domain, transitioning to turbulence through small-scale shear and/or convective instabilities and generating localised turbulence and energy dissipation. Despite the chaotic, small-scale turbulence present in the $s=1$ simulations, panels (*a*) and (*b*) are remarkably similar. Raising $Re$ from 5000 to 8000 results in minimal changes to the flow structure, which reassures us that the simulations are at sufficiently high $Re$ to resolve oceanographically relevant turbulent mixing. By contrast $\varepsilon ^{\prime }$ does not exhibit a strong burst when $s=0.5$ in panel (*c*). Here, downward propagating structures, most likely associated with the refracted internal wave, are most prominent. The relationship between TKE production and dissipation will be explored in more detail using the perturbation potential and kinetic energy budgets in the next section.

### 3.3. Turbulence and mixing

For the simulations with higher initial wave steepness, the turbulent wave breaking event, identified by heightened dissipation of kinetic energy in figure 8, leads to high-frequency, small-scale features in the buoyancy flux of figure 7. However, the large-scale pattern in the buoyancy flux $\mathcal {J}$ remains present during the burst of turbulent activity for times ${20< t<40}$, with patches of alternating sign overlaying the small-scale details associated with turbulence. This is significant in the context of irreversible mixing, where $\mathcal {J}$ is often used to infer a diapycnal mixing rate when appropriately averaged.

Consider (as in e.g. Howland, Taylor & Caulfield Reference Howland, Taylor and Caulfield2020) decomposing the kinetic and potential energies into contributions from the horizontally averaged fields $\bar {\boldsymbol {u}}$, $\bar {\theta }$ and their perturbations $\boldsymbol {u}^{\prime }$, $\theta ^{\prime }$. For example the volume-averaged potential energy can be decomposed as ${\mathcal {P}=\bar {\mathcal {P}} + \mathcal {P}^{\prime }}$, where

*a*,

*b*)\begin{equation} \bar{\mathcal{P}} = \frac{Ri_0}{2} \left\langle \bar{\theta}^{\,2} \right\rangle, \quad \mathcal{P}^{\prime} = \frac{Ri_0}{2} \left\langle {\theta^{\prime}}^{2} \right\rangle , \end{equation}

and $\langle \cdot \rangle$ denotes the volume average. Performing a similar decomposition for the kinetic energy leads to the following evolution equations for the energy components:

*a*,

*b*)$$\begin{gather} \frac{\textrm{d}\bar{\mathcal{K}}}{\textrm{d}t} ={-}S_p - \bar{\varepsilon} ,\quad \frac{\textrm{d}\mathcal{K}^{\prime}}{\textrm{d}t} = S_p + \mathcal{J} - \varepsilon^{\prime} , \end{gather}$$

*a*,

*b*)$$\begin{gather}\frac{\textrm{d}\bar{\mathcal{P}}}{\textrm{d}t} ={-}N_p - \bar{\chi} ,\quad \frac{\textrm{d}\mathcal{P}^{\prime}}{\textrm{d}t} = N_p - \mathcal{J} - \chi^{\prime} . \end{gather}$$

The mean-perturbation exchange terms are defined as

*a*,

*b*)\begin{equation} S_p ={-}\left\langle \overline{w^{\prime} \boldsymbol{u}^{\prime}} \boldsymbol{\cdot} \frac{\partial \bar{\boldsymbol{u}}}{\partial z} \right\rangle , \quad N_p ={-}\left\langle \overline{w^{\prime} \theta^{\prime}} \frac{\partial \bar{\theta}}{\partial z} \right\rangle , \end{equation}

and the dissipation rates of the perturbation energies are given by

*a*,

*b*)\begin{equation} \varepsilon^{\prime} = \frac{1}{Re}\left\langle \frac{\partial u_i^{\prime}}{\partial x_j}\frac{\partial u_i^{\prime}}{\partial x_j} \right\rangle , \quad \chi^{\prime} = \frac{Ri_0}{RePr} \left\langle \frac{\partial \theta^{\prime}}{\partial x_j}\frac{\partial \theta^{\prime}}{\partial x_j} \right\rangle . \end{equation}

Dissipation rates associated with the mean quantities are defined as $\bar {\varepsilon }=\langle |\partial \bar {\boldsymbol {u}}/\partial z|^{2} \rangle /Re$ and $\bar {\chi } = Ri_0\langle (\partial \bar {\theta }/\partial z)^{2} \rangle /RePr$. Note that the above decomposition cannot distinguish between energy in internal waves and energy in turbulence since both contribute to the perturbation energy quantities. Here, we also assume that the available potential energy of the system can be approximated by ${\mathcal {P}=Ri_0\langle \theta ^{2}\rangle /2}$, and therefore that $\chi$ is an appropriate measure of irreversible diapycnal mixing. The validity of this approximation is revisited in Howland *et al.* (Reference Howland, Taylor and Caulfield2021), where we find only small discrepancies between $\chi$ and the ‘true’ rate of diapycnal mixing $\mathcal {M}$ for the flows considered here.

In a statistically steady state where energy is supplied from the mean flow through the shear production $S_p$, we expect the buoyancy variance destruction rate $\chi ^{\prime }$ to balance $-\mathcal {J}$. This in turn implies that $\mathcal {J}<0$ and the buoyancy flux represents a mean transfer of kinetic energy to potential energy. In our simulations, however, turbulence is most intense in regions where the larger-scale wave–mean flow interaction leads to a positive buoyancy flux (for example, see $z\approx {\rm \pi}$ in panel $(a)$ for $20< t<40$). In fact the total mean buoyancy flux (integrated over the domain and in time) is positive in all of the simulations, indicating a net transfer of potential energy to kinetic energy. The magnitude of this transfer varies significantly between the simulations, taking values between 24 % and 40 % of the initial perturbation potential energy. The classic shear-driven steady state assumption, as used by Osborn (Reference Osborn1980), clearly does not apply in this case. Indeed this assumption does not even apply to the canonical evolution of a stratified shear layer (Mashayek & Peltier Reference Mashayek and Peltier2013). Despite the emergence of flow structures related to vertical shear (as seen in figure 4), the turbulence in the flow primarily draws energy from the wave rather than the mean shear flow. We therefore use the volume-averaged dissipation rates defined in (3.8*a*,*b*) to investigate mixing properties and the evolution of turbulence in the simulations.

Time series of the decomposed dissipation rates are plotted for each simulation in figure 9. Comparing the time series with the vorticity snapshots in figure 4, we unsurprisingly see that the dissipation rates peak when intense small-scale turbulence spans the domain at mid-heights. In figures 9(*a*) and 9(*b*), the fact that $\chi ^{\prime }$ peaks at the same time as $\varepsilon ^{\prime }$ suggests that, although the convective rolls seen in figures 3(*e*) and 4(*e*) are the first small-scale structures to emerge, their contribution to mixing is small. Indeed the overall shape of the time series curves for $s=1$ in figure 9 are reminiscent of those for the development of KHI in a stratified shear layer (see e.g. Salehipour *et al.* Reference Salehipour, Peltier and Mashayek2015). Particular features that stand out include a sharp, early rise in $\varepsilon ^{\prime }$, a short-lived ‘fully turbulent’ stage where the dissipation rates are approximately constant, and a fast decay from this regime. This is in contrast to some other canonical flows, such as the development of Holmboe instability, which lead to more long-lived turbulent activity (Salehipour, Caulfield & Peltier Reference Salehipour, Caulfield and Peltier2016).

In the simulations with $s=1$, the dominant contribution to mixing (quantified by the time integral of $\chi$) comes from the ‘fully turbulent’ period $25\lesssim t \lesssim 35$. The instantaneous mixing efficiency during this period is $\eta = \chi /(\chi +\varepsilon ) \approx 0.24$, which matches the KHI simulations of Salehipour *et al.* (Reference Salehipour, Peltier and Mashayek2015) for $Pr=1$. Together with the temporal evolution of $\varepsilon ^{\prime }$ and the development of ‘billow’ structures in figure 4(*j*), this makes a strong argument that mixing in these flows is primarily the result of turbulence driven by local shear instabilities, despite the presence of localised convection. Indeed the similarity in mixing efficiency is remarkable given the highly irregular buoyancy field on which the billows develop in our simulations.

In both of the simulations with lower initial wave steepness, the peaks in dissipation rates are far smaller than for the more energetic initial conditions. For simulation R8s0, where $s=0.5$, the maximum value of $\varepsilon ^{\prime }$ never even exceeds the dissipation rate associated with laminar diffusion of the mean flow $\bar {\varepsilon }$. To visualise how these flows differ from the higher dissipation cases, vorticity snapshots are plotted in figure 10 for the times at which $\varepsilon ^{\prime }$ is at its maximum. Panels (*a*,*b*) show that no large billow structures develop in case R8s0, and the maximum TKE dissipation rate is instead achieved when the convective rolls saturate in the spanwise plane. Although the buoyancy field is sufficiently distorted by the shear to drive local convection, the local amplification of shear in the $xz$-plane is reduced compared with figure 4. Treating the dynamics as that of a refracted wave, we can think of the wave only achieving high values of steepness once its vertical wavenumber $m$ has also increased significantly. The smaller scales associated with high values of $m$ are more susceptible to viscous effects, and it is possible that locally intense shear is smeared out by diffusion before instabilities can grow significantly. As seen in figures 10(*c*) and 10(*d*), turbulent structures emerge from regions of high shear at slightly higher initial wave steepness (${s=0.75}$). The local shear layers are not as thin as for $s=0.5$, consistent with the idea that instabilities are more likely to develop when viscous effects are reduced. At larger values of $Re$, it may be possible for turbulent billows to grow from $s=0.5$ and lead to significant local dissipation and mixing. The turbulence would remain far more localised due to the thinner shear layers, but it is possible that the combination of convective and shear mechanisms seen in the cases where $s=1$ would remain relevant.

### 3.4. Mean flow interactions

For the simulations with the largest wave steepness, we have deduced that the majority of turbulent dissipation and mixing can be associated with turbulence arising from shear instabilities. As mentioned above, turbulent shear flows are often associated with a transfer of energy from the mean flow to the turbulence through the local shear production

Positive values of $S_p$ represent an extraction of energy from the mean flow, as highlighted by the TKE evolution equation in (3.5*a*,*b*).

From another perspective, internal waves breaking at a critical level typically provide momentum to the mean flow as shown in the classical experiments of Koop & McGee (Reference Koop and McGee1986). This momentum transfer is a vital part of the mechanism discussed by Plumb (Reference Plumb1977) to describe the atmospheric quasi-biennial oscillation. In our simulations, we appear to observe shear instabilities developing near critical levels, and therefore expect the development of the mean flow to rely on a combination of these effects.

To investigate how the wave breaking affects the mean flow, we plot the shear production $S_p$ from simulation R8s1 as a function of $z$ and $t$ in figure 11. The time series of volume-averaged shear production, shown in figure 11(*a*), is dominated by large, reversible changes at early stages of the simulation. Indeed the mean value of $S_p$, averaged over both space and time for 80 time units, is only $O(10^{-5})$, indicating a small net transfer of energy between the mean flow and its perturbation (relative to the energy changes due to turbulent dissipation). This contrasts with the evolution of KHI in a stratified shear layer (Salehipour & Peltier Reference Salehipour and Peltier2015). Although large, reversible changes are also seen at early times in that set-up, the lack of initial perturbation energy requires a significant net transfer of energy from the mean flow over the course of a turbulent event.

The small net energy transfer does not, however, mean that the mean flow is unaffected by its interaction with the breaking wave. Figure 11(*c*) plots the time-averaged shear production as a function of height, showing that $S_p<0$ in the centre of the domain, whereas $S_p>0$ near the edges. This suggests that, although the turbulence produced at mid-heights in the domain is reminiscent of that triggered by shear instabilities, any local extraction of energy from the mean flow in this region is dominated by the earlier wave–mean flow interaction. This is emphasised in the space–time plot of figure 11(*b*), where a strong patch of negative shear production persists at mid-heights even as the turbulence develops at $t\approx 25$. As hinted at earlier in figure 4, we can therefore interpret the billows as arising from instabilities of the wave's shear rather than the mean flow. The evolution of the mean flow appears primarily governed by its interaction with the coherent internal wave, and is only slightly modified by the subsequent turbulence.

This interpretation of a wave–mean flow interaction is also consistent with the shift in mean streamwise velocity shown earlier in figure 5(*a*). Since the wavenumbers of the internal wave $k$ and $m$ are both positive, we expect the wave to propagate to the right and downwards (in the positive $x$ and negative $z$ directions) even as it is refracted by the shear flow. If the wave then deposits its momentum as it approaches the predicted critical levels, we would expect a positive shift in the streamwise velocity in that region, since the wave is propagating to the right. This is precisely what we see in figure 5(*a*), where $\bar {u}$ increases over the region $3{\rm \pi} /4\lesssim z \lesssim 3{\rm \pi} /2$.

## 4. Discussion and conclusions

We have investigated the flow arising from the superposition of a large amplitude plane internal gravity wave and a mean shear flow. This initial condition is inspired and motivated by observations of high internal wave strain in the presence of variable shear in regions of the thermocline by Alford & Pinkel (Reference Alford and Pinkel2000, $\text{{AP}}$). In our simulations, some aspects of the dynamics at early times can be reasonably described by ray-tracing analysis, despite a lack of the necessary, assumed scale separation between the base flow and the wave field. The propagation of wave energy quantities towards the centre of the domain shows qualitative agreement between the simulations and the linear theory, as seen in figure 7. This analysis suggests that critical levels, whose locations are highlighted in figure 2, exist in this region where the mean shear is negative. Ray tracing predicts an increase in the vertical wavenumber $m$ as waves approach the critical levels.

The DNS is consistent with this picture, (even though the underlying assumptions of the ray theory are clearly not satisfied) as seen in the snapshots of figures 3 and 4. Vertical length scales are reduced in the centre of the domain, and regions of statically unstable buoyancy emerge as the wave field is distorted by the shear. Streamwise-aligned convective rolls, best highlighted by figures 3(*e*) and 4(*e*), emerge from the regions of static instability in all of the simulations, regardless of their initial wave steepness. Quasi-exponential growth in the energy of the spanwise velocity is observed in figure 6 once the maximum local Rayleigh number in the domain becomes large. We deduce that the roll structures in the spanwise plane are simply driven by a linear convective instability.

The accumulation of wave energy in the centre of the domain also leads to an intensification of local shear in the $xz$-plane. Flows arising from the more energetic initial condition (where $s=1$) subsequently become turbulent and exhibit large-scale organisation in the form of elliptical billow structures. These billows, visualised in figure 4(*j*), are reminiscent of those arising due to KHI in a stratified shear layer. Furthermore the time series of dissipation rates in figure 9 show that wave breaking is characterised by a ‘burst’ or ‘flare’ of turbulence, rather than a sustained event. This bursting nature is again reminiscent of turbulence initiated through KHI.

When turbulence persists throughout the domain at mid-heights, the mixing efficiency is also largely similar to that found in previous studies of KHI at $Pr=1$. The buoyancy field surrounding the local shear layers in our simulations is complex, with regions of strong, stable stratification, and static instability present either side of the shear layer. It is therefore somewhat surprising that the mixing results are consistent with a typical stratified shear layer, particularly given the results of Mashayek, Caulfield & Peltier (Reference Mashayek, Caulfield and Peltier2013) highlighting strong Richardson number dependence within that simple set-up. A future study of shear-induced mixing for a wider range of background buoyancy profiles would be useful in pinpointing the key parameters governing variations in mixing efficiency. Nevertheless in the simulations with larger initial wave steepness, mixing appears predominantly shear-driven despite the prior emergence of convective rolls in the breakdown of the wave. To be clear, by ‘shear driven’ we mean that energy is supplied to turbulence primarily through shear instabilities, and in this case the unstable shear is that in the velocity field of the refracted internal wave.

As seen in figure 10, the less energetic initial conditions do not lead to as much turbulent activity. The waves are still refracted towards the centre of the domain and reach sufficient steepness to drive local convection, but we do not observe as intense shear amplification in the $xz$-plane in these cases. We suggest that viscous effects are damping the wave before strong shears can be generated. Although high wave steepness values occur at later times, high local wavenumbers are still produced earlier by the wave refraction, and as time progresses these gradients will be smeared out by diffusion. A simple model for this wave damping is provided in Appendix A, although its inherently linear formulation prevents us from drawing quantitative comparisons with the simulations presented here.

Even at the high resolution of our simulations, we cannot consider Reynolds numbers that match our motivating oceanographic observations, suggesting that viscous effects are overemphasised in our flows. It is therefore possible that the mechanisms driving turbulence and mixing in our more energetic simulations may be relevant for flows arising from smaller initial wave steepness. In these cases unstable shear layers would be produced at higher wavenumbers, potentially limiting the size of the billows and the extent of the turbulence. Nevertheless this wave breaking may be representative of a process leading to intense mixing from internal waves in the ocean.

Shear-driven turbulence is commonly associated with an extraction of energy from the mean shear flow, characterised by positive values of shear production $S_p>0$. However, even in our most energetic simulations, we find on average that $S_p<0$ in the region of most intense turbulence, as shown in figure 11. This instead suggests that the primary effect on the mean flow comes from a wave–mean flow interaction, where the wave transfers its momentum into the mean flow as it breaks. The change in the mean streamwise velocity shown in figure 5(*a*) supports this interpretation. Indeed, since the strong local shears are associated with the wave rather than the mean flow, it may be expected that simple energetics arguments regarding the interaction of turbulence and a mean flow do not apply here.

Although this counterexample to the traditional picture of shear-driven turbulence is specific to our set-up, it highlights a generic difficulty in analysing turbulent stratified flows. The effects of internal gravity waves and turbulence are often considered in isolation, although their interplay is vital at the scales associated with wave breaking that are of interest to us. Waves break to produce turbulence, turbulence itself can emit internal waves, and the evolution of a turbulent patch in a stratified fluid is affected at leading order by the presence of internal waves (as reviewed e.g. by Davidson Reference Davidson2013). Continuous energy transfer between waves and turbulence can lead to great difficulties in interpreting their respective roles in the dynamics.

In our simulations, the internal wave appears to drive both the generation of turbulence and the modification of the mean flow. However, our set-up of an initial value problem superimposing a wave and shear is not typical of how such an interaction would arise in the ocean. Internal waves in the ocean continuously propagate away from generation sites such as topographic features where waves are generated through tidal flows (Sarkar & Scotti Reference Sarkar and Scotti2017). Future studies could extend the relevant set-up of Lamb & Dunphy (Reference Lamb and Dunphy2018), who consider the interaction of a tidal flow over a ridge with a mean shear, but only in two dimensions. It is unclear what behaviour could be expected over a longer time scale as more waves propagate towards the breaking event through the shear. If a critical level were responsible for the breakdown, one might expect a continuous supply of energy to maintain the turbulence as waves propagate towards it.

Our simulation of an isolated ‘burst’ of turbulence arising from a large amplitude internal wave is, however, more consistent with the time scales of overturning events observed by $\text{{AP}}$. Taking the dimensionless duration of the wave breaking event in simulation R8s1 as $N_0 t = 50$ and the background buoyancy frequency as $N_0=5\times 10^{-3}\ \textrm {s}^{-1}$, we deduce an event duration of $T= 1\times 10^{4}\ \textrm {s}\approx 0.116\ \textrm {d}$, consistent with the time scales shown in figure 11 of $\text{{AP}}$. Of course those observations rely on individual vertical profiles, and it is possible for longer lasting turbulent patches to simply be advected away from the profiler.

Although not present in the observations of $\text{{AP}}$, the existence of coherent ‘staircases’ in density is common in many regions of the ocean. The propagation and instability of internal waves in such regions, where the background stratification varies strongly, is far different to the case of uniform stratification (Sutherland Reference Sutherland2016). Nevertheless, the fundamental mechanism of shear refracting small-scale internal waves seems relevant at sharp density interfaces, at least in situations with large internal solitary waves as considered by Xu & Stastna (Reference Xu and Stastna2018). Understanding how generic the mixing properties of this shear–wave interaction are for arbitrary $N^{2}(z)$ is vital for the general application of our results.

In the context of the ocean thermocline, we have also neglected the effect of the Earth's rotation in our simulations. For the field site of $\text{{AP}}$, buoyancy effects are important on much faster time scales than rotation, as evidenced by the typical ratio $f/N=1.6\times 10^{-3}$. The slowly varying shear may, however, be intrinsically modified by rotation, and it is most likely associated with a slowly propagating near-inertial wave. Although the observations of $\text{{AP}}$ tell us the strength of the vertical shear, they do not report on the orientation of the mean flow or how it changes. This orientation may have significant consequences on the nature of the wave breakdown. For example Fritts *et al.* (Reference Fritts, Wang and Werne2013) find that a spiralling fine-scale shear flow weakens the spanwise convective instability relative to the case of a shear flow aligned with the internal wave. Broutman *et al.* (Reference Broutman, Macaskill, McIntyre and Rottman1997) also add the time-dependent nature of propagating near-inertial shear to their ray-tracing analysis and find that this can reduce the proportion of short internal waves that end up dissipated in critical layers. Determining whether these types of interaction could impact our results on mixing and mean flow acceleration would be useful in understanding how specific the results are to our set-up.

In regions away from the thermocline, $f/N$ typically takes larger values and rotation can be expected to play more of an important role, although similar wave breaking mechanisms may still be relevant. For example the deep ocean measurements of Waterman, Naveira Garabato & Polzin (Reference Waterman, Naveira Garabato and Polzin2012) highlight a local peak in turbulent dissipation and internal wave energy approximately 1 km above the ocean floor, where stratification remains relatively weak. From corresponding measurements of the mean shear flow, they attribute this peak to waves breaking at critical levels. Waterman *et al.* (Reference Waterman, Naveira Garabato and Polzin2012) also find a mismatch in this region between dissipation rates measured from microstructure and those inferred from the internal wave energy. One explanation for this is that, like in our simulations, wave energy is split between the mean flow and turbulence as the waves break. Investigating how incoming wave energy is distributed between mean flow acceleration, turbulent dissipation and mixing in a fully turbulent critical layer would be useful for improving parameterisations for such scenarios. Such parameterisations could depend strongly on the properties of the incoming waves, and therefore require a fundamental understanding of the various sources of internal waves in the ocean. A key open question remains of how much mixing can be attributed to each of these sources, such as tidal beams (Dauxois *et al.* Reference Dauxois, Joubaud, Odier and Venaille2018), lee waves (Legg Reference Legg2021) and near-inertial waves (Alford *et al.* Reference Alford, MacKinnon, Simmons and Nash2016).

## Supplementary material and movies

Supplementary material and movies are available at https://doi.org/10.1017/jfm.2021.506.

## Acknowledgements

We thank M. Stastna for a detailed and constructive review of this paper. We are also grateful for the comments of A. Mashayek and P. Haynes, who examined the thesis chapter on which this manuscript is based.

## Fundig

This work was supported by the Natural Environment Research Council through the Cambridge Earth System Science DTP (grant number NE/L002507/1). This work was performed using resources provided by the Cambridge Service for Data Driven Discovery (CSD3) operated by the University of Cambridge Research Computing Service (www.csd3.cam.ac.uk), provided by Dell EMC and Intel using Tier-2 funding from the Engineering and Physical Sciences Research Council (capital grant EP/P020259/1), and DiRAC funding from the Science and Technology Facilities Council (www.dirac.ac.uk).

## Declaration of interests

The authors report no conflict of interest.

## Appendix A. Wave action and laminar diffusion for linear internal waves

As noted in the main text, we have utilised linear ray tracing to gain some qualitative insight into the interaction between the (finite-amplitude) internal gravity waves and the background shear flow we have simulated. It is always important to remember that a key assumption in this analysis is that the mean flow varies on a much larger scale than the wave. In our set-up of (2.9)–(2.11) the vertical wavelength of the mean shear is only three times that of the internal wave, so the analysis presented in § 2.2 and here cannot (of course) be expected to describe the dynamics quantitatively. Furthermore the large values of wave steepness we consider ($s\geqslant 0.5$) break the underlying linear assumption at the heart of the theory. Nevertheless, perhaps surprisingly, valuable qualitative insight can still be gained from a linear ray-tracing analysis. In this Appendix we further extend this analysis by calculating the (predicted) modification of a linear (i.e. infinitesimal-amplitude) internal gravity wave in a ‘slowly varying’ shear flow due to laminar diffusion.

When subjected to a mean flow, internal waves do not conserve energy as they propagate along a ray. In the linear framework considered in § 2.2, another quantity known as wave action instead satisfies a conservation equation. We define wave action as

*a*,

*b*)\begin{equation} \mathscr{A}=\frac{\bar{E}}{\hat{\omega}}, \quad E = \frac{1}{2}|\boldsymbol{u}^{\prime}|^{2} + \frac{Ri_0}{2}|\theta|^{2} , \end{equation}

where $\bar {E}$ is the horizontally averaged energy of the wave. The conservation equation for wave action can be derived from the linearised momentum equation (as first shown by Bretherton & Garrett Reference Bretherton and Garrett1968) and takes the form

We can now combine this conservation equation with the ray equations of (2.16*a*,*b*) to give a system of three ordinary differential equations that describe the evolution of the path and amplitude of the internal wave. Recall that in (2.16*a*,*b*), the time derivative is defined as $\textrm {d}/\textrm {d}t=\partial /\partial t + \boldsymbol {c_g}\boldsymbol {\cdot }\boldsymbol {\nabla }$, so we can rewrite (A2) as

An analytic expression for the vertical derivative of the group velocity can also be obtained by expressing $c_{g,z}$ as a function of $m$ and using the chain rule, namely

Here, $m(z)$ can be inferred from the dispersion relation (2.15) as

We now have a closed system to solve numerically for initial values of $x$, $z$, $k$, $m$ and $\mathscr {A}$.

To investigate how the waves behave as they are refracted towards the middle of the domain, we now also consider the evolution of the wave action along the rays. As described previously, the vertical wavenumber $m$ increases as a wave approaches a critical level. This means that molecular diffusion, thus far neglected in the analysis, may become important, particularly for the Reynolds numbers of our direct numerical simulations. We therefore propose a simple modification to the ray tracing equations that incorporates diffusive effects below.

Consistent with the assumption that $m$ is larger than the vertical wavenumber of the shear, we only consider diffusion associated with the internal wave, and assume that the mean shear flow $\bar {u}(z)$ is constant in time. Defining the wave energy density $E$ as in (A1*a*,*b*), diffusive effects will appear in the energy equation as a dissipation rate $\mathscr {D}$

For $Pr=1$, if we substitute the internal gravity form of (2.7*a*–*c*) (where $\omega$ in the velocity pre-factors should be replaced with the intrinsic frequency $\hat {\omega }$) then the dissipation term simplifies to $\mathscr {D}=2(k^{2}+m^{2})E/Re$. By doing this, we assume that the polarisation of the velocity and buoyancy field in (2.7*a*–*c*) is maintained even as the vertical wavenumber varies due to refraction. Dividing $\mathcal {D}$ by $\hat {\omega }$ then gives the corresponding dissipation rate to add to the wave action equation, which becomes

This equation can be solved in conjunction with the ray-tracing equations of (2.16*a*,*b*) to provide an estimate of the energy buildup in the centre of the domain.

Although now straightforward to calculate, wave action can be difficult to interpret intuitively. In particular, it is not clear what a specific value of $\mathscr {A}$ can tell us about how susceptible a wave is to different instabilities. Stability analyses of finite-amplitude internal waves have shown that the local wave steepness $s$ is a key parameter in determining the nature of wave breakdown (e.g. Lombard & Riley Reference Lombard and Riley1996). We therefore convert wave action to wave steepness by assuming the wave locally maintains the polarisation given in (2.7*a*–*c*), even as the local wave vector is modified by the Doppler shifting. In this form, the energy density of the wave is simply given by $E= s^{2}/2m$. Wave steepness and wave action can then be exchanged through the equations

*a*,

*b*)\begin{equation} \mathscr{A}(z) = \frac{{s(z)}^{2} \sqrt{{k_0}^{2} + {m(z)}^{2}}}{2{k_0}{m(z)}^{2}}, \quad s(z) = \sqrt{\frac{2\mathscr{A}(z) {k_0}{m(z)}^{2}}{\sqrt{{k_0}^{2} + {m(z)}^{2}}}}. \end{equation}

Figure 12 presents the results of solving (A7) in terms of the wave steepness obtained through (A8*a*,*b*) for a range of initial wavepacket heights $z_0$. The initial wave steepness is set at $s=0.5$, and we compare the results for the inviscid limit in figure 12(*a*) with the results for $Re=5000$ in figure 12(*b*). In the inviscid case, $s$ increases consistently over time for those rays that approach a critical level. The high values of $s$ seen in figure 12(*a*) predict the development of highly unstable convective regions in the centre of the domain. However, once diffusion is taken into account, wave steepness is shown to peak on a time scale of $O(50)$ and then decrease as the critical levels are approached. This time scale is comparable with the time at which spanwise perturbations peak in the simulations with $s<1$, shown in figure 6. It is plausible that the wave breakdown in these cases may be affected by diffusive effects. This diffusion may also lead to the lower growth rates seen in figure 6 for $s<1$, since reduced values of local steepness produce smaller negative buoyancy gradients to drive convective instabilities.