
On the derivation of the wave kinetic equation for NLS

Published online by Cambridge University Press:  23 July 2021

Yu Deng
Affiliation:
Department of Mathematics, University of Southern California, Los Angeles, CA 90089, USA; E-mail: yudeng@usc.edu
Zaher Hani
Affiliation:
Department of Mathematics, University of Michigan, Ann Arbor, MI 48109, USA; E-mail: zhani@umich.edu

Abstract

A fundamental question in wave turbulence theory is to understand how the wave kinetic equation describes the long-time dynamics of its associated nonlinear dispersive equation. Formal derivations in the physics literature, dating back to the work of Peierls in 1928, suggest that such a kinetic description should hold (for well-prepared random data) at a large kinetic time scale $T_{\mathrm {kin}} \gg 1$ and in a limiting regime where the size L of the domain goes to infinity and the strength $\alpha $ of the nonlinearity goes to $0$ (weak nonlinearity). For the cubic nonlinear Schrödinger equation, $T_{\mathrm {kin}}=O\left (\alpha ^{-2}\right )$ and $\alpha $ is related to the conserved mass $\lambda $ of the solution via $\alpha =\lambda ^2 L^{-d}$ .

In this paper, we study the rigorous justification of this monumental statement and show that the answer seems to depend on the particular scaling law in which the $(\alpha , L)$ limit is taken, in a spirit similar to how the Boltzmann–Grad scaling law is imposed in the derivation of Boltzmann’s equation. In particular, there appear to be two favourable scaling laws: when $\alpha $ approaches $0$ like $L^{-\varepsilon +}$ or like $L^{-1-\frac {\varepsilon }{2}+}$ (for arbitrary small $\varepsilon $ ), we exhibit the wave kinetic equation up to time scales $O(T_{\mathrm {kin}}L^{-\varepsilon })$ , by showing that the relevant Feynman-diagram expansions converge absolutely (as a sum over paired trees). For the other scaling laws, we justify the onset of the kinetic description at time scales $T_*\ll T_{\mathrm {kin}}$ and identify specific interactions that become very large for times beyond $T_*$ . In particular, the relevant tree expansion diverges absolutely there. In light of those interactions, extending the kinetic description beyond $T_*$ toward $T_{\mathrm {kin}}$ for such scaling laws seems to require new methods and ideas.

Type
Analysis
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted re-use, distribution, and reproduction in any medium, provided the original work is properly cited.
Copyright
© The Author(s), 2021. Published by Cambridge University Press

1 Introduction

The kinetic framework is a general paradigm that aims to extend Boltzmann’s kinetic theory for dilute gases to other types of microscopic interacting systems. This approach has been highly informative, and has become a cornerstone of the theory of nonequilibrium statistical mechanics for a large body of systems [43, 44]. In the context of nonlinear dispersive waves, this framework was initiated in the first half of the past century [41] and developed into what is now called wave turbulence theory [51, 39]. There, waves of different frequencies interact nonlinearly at the microscopic level, and the goal is to extract an effective macroscopic picture of how the energy densities of the system evolve.

The description of such an effective evolution comes via the wave kinetic equation (WKE), which is the analogue of Boltzmann’s equation for nonlinear wave systems [46]. Such kinetic equations have been derived at a formal level for many systems of physical interest (nonlinear Schrödinger (NLS) and nonlinear wave (NLW) equations, water waves, plasma models, lattice crystal dynamics, etc.; compare [39] for a textbook treatment) and are used extensively in applications (thermal conductivity in crystals [45], ocean forecasting [31, 49], and more). This kinetic description is conjectured to appear in the limit where the number of (locally interacting) waves goes to infinity and an appropriate measure of the interaction strength goes to zero (weak nonlinearity; see Footnote 1). In such kinetic limits, the total energy of the whole system often diverges.

The fundamental mathematical question here, which also has direct consequences for the physical theory, is to provide a rigorous justification of such wave kinetic equations starting from the microscopic dynamics given by the nonlinear dispersive model at hand. The importance of such an endeavour stems from the fact that it allows an understanding of the exact regimes and the limitations of the kinetic theory, which has long been a matter of scientific interest (see [20, 1]). A few mathematical investigations have recently been devoted to studying problems in this spirit [23, 7, 35], yielding some partial results and useful insights.

This manuscript continues the investigation initiated in [7], aimed at providing a rigorous justification of the wave kinetic equation corresponding to the nonlinear Schrödinger equation,

$$ \begin{align*} i \partial_t v - \Delta v + \left\lvert v\right\rvert^{2} v=0. \end{align*} $$

As we shall explain later, the sign of the nonlinearity has no effect on the kinetic description, so we choose the defocussing sign for concreteness. The natural setup for the problem is to start with a spatial domain given by a torus ${\mathbb T}^d_L$ of size L, which approaches infinity in the thermodynamic limit we seek. This torus can be rational or irrational, which amounts to rescaling the Laplacian into

$$ \begin{align*} \Delta_{\beta} \mathrel{\mathop:}= \frac{1}{2\pi} \sum\limits_{i=1}^d \beta_i \partial_i^2, \qquad \beta \mathrel{\mathop:}= (\beta_1,\dots,\beta_d)\in [1,2]^d, \end{align*} $$

and taking the spatial domain to be the standard torus of size L, namely ${\mathbb T}^d_L=[0,L]^d$ with periodic boundary conditions. With this normalisation, an irrational torus would correspond to taking the $\beta _j$ to be rationally independent. Our results cover both cases, and in part of them $\beta $ is assumed to be generic – that is, avoiding a set of Lebesgue measure $0$ .

The strength of the nonlinearity is related to the characteristic size $\lambda $ of the initial data (say in the conserved $L^2$ space). Adopting the ansatz $v=\lambda u$ , we arrive at the following equation:

(NLS) $$ \begin{align} \begin{cases} i \partial_t u - \Delta_{\beta} u + \lambda^{2} \left\lvert u\right\rvert^{2} u=0, \quad x\in \mathbb{T}_L^d = [0,L]^d, \\ u(0,x) = u_{\text{in}}(x). \end{cases} \end{align} $$

The kinetic description of the long-time behaviour is akin to a law of large numbers, and therefore one has to start with a random distribution of the initial data. Heuristically, a randomly distributed, $L^{2}$ -normalised field would (with high probability) have a roughly uniform spatial distribution, and consequently an $L_x^{\infty }$ norm $\sim L^{-d/2}$ . This makes the strength of the nonlinearity in (NLS) comparable to $\lambda ^2 L^{-d}$ (at least initially; see Footnote 2), which motivates us to introduce the quantity

$$ \begin{align*} \alpha=\lambda^2L^{-d} \end{align*} $$

and phrase the results in terms of $\alpha $ instead of $\lambda $ . The kinetic conjecture states that at sufficiently long time scales, the effective dynamics of the Fourier-space mass density $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2 \left (k \in \mathbb Z^d_L=L^{-1}\mathbb Z^d\right )$ is well approximated – in the limit of large L and vanishing $\alpha $ – by an appropriately scaled solution $n(t, \xi )$ of the following WKE:

(WKE) $$ \begin{align} \partial_t n(t, \xi) &={\mathcal K}\left(n(t, \cdot)\right), \nonumber \\ {\mathcal K}(\phi)(\xi)&:= \int_{\substack{\left(\xi_1, \xi_2, \xi_3\right)\in {\mathbb R}^{3d} \\ \xi_1-\xi_2+\xi_3=\xi}} \phi \phi_1 \phi_2 \phi_3\left(\frac{1}{\phi}-\frac{1}{\phi_1}+\frac{1}{\phi_2}-\frac{1}{\phi_3}\right)\delta_{{\mathbb R}}\left(\left\lvert\xi_1\right\rvert_{\beta}^2-\left\lvert\xi_2\right\rvert_{\beta}^2+\left\lvert\xi_3\right\rvert_{\beta}^2-\left\lvert\xi\right\rvert_{\beta}^2\right) d\xi_1 d\xi_2 d\xi_3, \end{align} $$

where we used the shorthand notations $\phi _j:=\phi \left (\xi _j\right )$ and $\left \lvert \xi \right \rvert ^2_{\beta }=\sum _{j=1}^d \beta _j \left (\xi ^{\left (j\right )}\right )^2$ for $\xi =\left (\xi ^{(1)},\cdots ,\xi ^{(d)}\right )$ . More precisely, one expects this approximation to hold at the kinetic timescale $T_{\mathrm {kin}}\sim \alpha ^{-2}=\frac {L^{2d}}{\lambda ^4}$ , in the sense that

(1.1) $$ \begin{align} \mathbb E \left\lvert\widehat u(t, k)\right\rvert^2 \approx n\left(\frac{t}{T_{\mathrm{kin}}}, k\right) \quad \text{as } L\to \infty, \alpha \to 0. \end{align} $$

Of course, for such an approximation to hold at time $t=0$ , one has to start with a well-prepared initial distribution for $\widehat u_{\text {in}}(k)$ as follows: denoting by $n_{\text {in}}$ the initial data for (WKE), we assume

(1.2) $$ \begin{align} \widehat u_{\mathrm{in}}(k)=\sqrt{n_{\text{in}}(k)} \eta_{k}(\omega), \end{align} $$

where $\eta _{k}(\omega )$ are mean- $0$ complex random variables satisfying $\mathbb E \left \lvert \eta _k\right \rvert ^2=1$ . In what follows, $\eta _k(\omega )$ will be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either the normalised complex Gaussian or the uniform distribution on the unit circle $\lvert z\rvert =1$ .
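As a minimal sketch of how well-prepared data of the form (1.2) could be sampled numerically, the following Python snippet draws $\eta_k$ from either of the two laws above and multiplies by $\sqrt{n_{\text{in}}(k)}$. The function names, the Gaussian profile $e^{-\lvert k\rvert^2}$, the choice $d=2$, $L=8$ and the finite window of modes are all illustrative assumptions, not taken from the paper.

```python
import cmath
import math
import random


def sample_eta(law, rng):
    """Draw one eta_k with mean 0 and E|eta_k|^2 = 1."""
    if law == "gaussian":
        # Normalised complex Gaussian: real and imaginary parts are N(0, 1/2).
        s = math.sqrt(0.5)
        return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))
    if law == "circle":
        # Uniform distribution on the unit circle |z| = 1.
        return cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi))
    raise ValueError(f"unknown law: {law!r}")


def well_prepared_data(n_in, modes, law="gaussian", seed=0):
    """Return {k: sqrt(n_in(k)) * eta_k} for the given Fourier modes, as in (1.2)."""
    rng = random.Random(seed)
    return {k: math.sqrt(n_in(k)) * sample_eta(law, rng) for k in modes}


def gaussian_profile(k):
    # Hypothetical Schwartz profile n_in(k) = exp(-|k|^2), for illustration only.
    return math.exp(-sum(x * x for x in k))


# A finite window of the rescaled lattice Z_L^2 = L^{-1} Z^2 (assumed d = 2, L = 8).
L = 8
modes = [(i / L, j / L)
         for i in range(-2 * L, 2 * L + 1)
         for j in range(-2 * L, 2 * L + 1)]
u_hat = well_prepared_data(gaussian_profile, modes, law="circle")
```

With the unit-circle law, $\lvert \widehat u_{\mathrm{in}}(k)\rvert^2$ equals $n_{\text{in}}(k)$ for every sample; with the Gaussian law this holds only in expectation.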

Before stating our results, it is worth remarking on the regime of data and solutions covered by this kinetic picture in comparison to previously studied and well-understood regimes in the nonlinear dispersive literature. For this, let us look back at the (pre-ansatz) NLS solution v, whose conserved energy is given by

$$ \begin{align*} {\mathcal E}[v]:=\int_{{\mathbb T}^d_L} \frac{1}{2}\left\lvert\nabla v\right\rvert^2 +\frac{1}{4}\left\lvert v\right\rvert^4 \mathrm{d}x. \end{align*} $$

We are dealing with solutions having an $L^{\infty }$ -norm of $O\left (\sqrt \alpha \right )$ (with high probability) and whose total mass is $O\left (\alpha L^d\right )$ , in a regime where $\alpha $ is vanishingly small and L is asymptotically large. These bounds on the solutions are true initially, as we have already explained, and will be propagated in our proof. In particular, the mass and energy are very large and will diverge in this kinetic limit, as is common in taking thermodynamic limits [42, 37]. Moreover, the potential part of the energy is dominated by the kinetic part – the former of size $O\left (\alpha ^2 L^d\right )$ , since $\left \lvert v\right \rvert ^4=O\left (\alpha ^2\right )$ pointwise, and the latter of size $O\left (\alpha L^d\right )$ – which explains why there is no distinction between the defocussing and focussing nonlinearities in the kinetic limit. It would be interesting to see how the kinetic framework can be extended to regimes of solutions which are sensitive to the sign of the nonlinearity; this has been investigated in the physics literature (e.g., [22, 25, 50]).

1.1 Statement of the results

It is not a priori clear how the limits $L\to \infty $ and $\alpha \to 0$ need to be taken for formula (1.1) to hold or whether there is an additional scaling law (between $\alpha $ and L) that needs to be satisfied in the limit. In comparison, such scaling laws are imposed in the rigorous derivation of Boltzmann’s equation [34, 10, 26], which is derived in the so-called Boltzmann–Grad limit [27]: namely, the number N of particles goes to $\infty $ while their radius r goes to $0$ in such a way that $Nr^{d-1}\sim O(1)$ . To the best of our knowledge, this central point has not been adequately addressed in the wave-turbulence literature.

Our results seem to suggest some key differences depending on the chosen scaling law. Roughly speaking, we identify two special scaling laws for which we are able to justify the approximation (1.1) up to time scales $L^{-\varepsilon } T_{\text {kin}}$ for any arbitrarily small $\varepsilon>0$ . For other scaling laws, we identify significant absolute divergences in the power-series expansion for $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2$ at much earlier times. We can therefore only justify this approximation at such shorter times (which are still better than those in [7]). In these cases, whether or not formula (1.1) holds up to time scales $L^{-\varepsilon } T_{\text {kin}}$ depends on whether such series converge conditionally instead of absolutely, and thus would require new methods and ideas, as we explain later.

We start by identifying the two favourable scaling laws. We use the notation $\sigma +$ for any numerical constant $\sigma $ (e.g., $\sigma =-\varepsilon $ or $\sigma =-1-\frac {\varepsilon }{2}$ , where $\varepsilon $ is as in Theorem 1.1) to denote a constant that is strictly larger than and sufficiently close to $\sigma $ .

Theorem 1.1. Set $d\geq 2$ and let $\beta \in [1,2]^d$ be arbitrary. Suppose that $n_{\mathrm {in}} \in {\mathcal S}\left ({\mathbb R}^d \to [0, \infty )\right )$ is Schwartz (see Footnote 3) and $\eta _{k}(\omega )$ are independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either complex Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle $\lvert z\rvert =1$ . Assume well-prepared initial data $u_{\mathrm {in}}$ for (NLS) as in equation (1.2).

Fix $0<\varepsilon <1$ (in most interesting cases $\varepsilon $ will be small); recall that $\lambda $ and L are the parameters in (NLS) and let $\alpha =\lambda ^2L^{-d}$ be the characteristic strength of the nonlinearity. If $\alpha $ has the scaling law $\alpha \sim L^{(-\varepsilon )+}$ or $\alpha \sim L^{\left (-1-\frac {\varepsilon }{2}\right )+}$ , then we have

(1.3) $$ \begin{align} \mathbb E \left\lvert\widehat u(t, k)\right\rvert^2 =n_{\mathrm{in}}(k)+\frac{t}{T_{\mathrm{kin}}}{\mathcal K}(n_{\mathrm{in}})(k)+o_{\ell^{\infty}_k}\left(\frac{t}{T_{\mathrm {kin}}}\right)_{L \to \infty} \end{align} $$

for all $L^{0+} \leq t \leq L^{-\varepsilon } T_{\mathrm {kin}}$ , where $T_{\mathrm {kin}}=\alpha ^{-2}/2$ , ${\mathcal K}$ is defined in (WKE) and $o_{\ell ^{\infty }_k}\left (\frac {t}{T_{\mathrm {kin}}}\right )_{L \to \infty }$ is a quantity that is bounded in $\ell ^{\infty }_k$ by $L^{-\theta } \frac {t}{T_{\mathrm {kin}}}$ for some $\theta>0$ .

We remark that in the time interval of the approximation we have been discussing, the right-hand sides of formulas (1.1) and (1.3) are equivalent. Also note that any type of scaling law of the form $\alpha \sim L^{-s}$ gives an upper bound of $t\leq L^{-\varepsilon }T_{\mathrm {kin}}\sim L^{2s-\varepsilon }$ for the times considered. Consequently, for the two scaling laws in Theorem 1.1, the time t always satisfies $t\ll L^{2}$ , and it is for this reason that the rationality type of the torus is not relevant. As will be clear later, no similar results can hold for $t\gg L^2$ in the case of a rational torus (see Footnote 4), as this would require rational quadratic forms to be equidistributed on scales $\ll 1$ , which is impossible. However, if the aspect ratios $\beta $ are assumed to be generically irrational, then one can access equidistribution scales that are as small as $L^{-d+1}$ for the resulting irrational quadratic forms [4, 7]. This allows us to consider scaling laws for which $T_{\mathrm {kin}}$ can be as big as $L^{d-}$ on generically irrational tori.

Remark 1.2. Strictly speaking, in evaluating equation (1.3) one has to first ensure the existence of the solution u. This is guaranteed if $d\in \{2,3,4\}$ (when (NLS) is $H^1$ -critical or subcritical). When $d\geq 5$ we shall interpret equation (1.3) such that the expectation is taken only when the long-time smooth solution u exists. Moreover, from our proof it follows that the probability that this existence fails is at most $O\left (e^{-L^{\theta }}\right )$ , which quickly becomes negligible when $L\to \infty $ .

The following theorem covers general scaling laws, including the ones that can only be accessed for the generically irrational torus. By a simple calculation of exponents, we can see that it implies Theorem 1.1.

Theorem 1.3. With the same assumptions as in Theorem 1.1, we impose the following conditions on $(\alpha , L, T)$ for some $\delta>0$ :

(1.4) $$ \begin{align}T\leq \begin{cases}L^{2-\delta}&\text{if }\beta_i\text{ is arbitrary},\\ L^{d-\delta}&\text{if }\beta_i\text{ is generic}, \end{cases} \qquad\alpha\leq \begin{cases} L^{-\delta}T^{-1}&\text{if }T \leq L,\\ L^{-1-\delta}&\text{if }L\leq T\leq L^2,\\ L^{1-\delta}T^{-1}&\text{if } T\geq L^2. \end{cases} \end{align} $$

Then formula (1.3) holds for all $L^{\delta } \leq t \leq T$ .
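The exponent calculation showing that Theorem 1.3 covers the two scaling laws of Theorem 1.1 can be sketched as follows. Here the "$+$" in each scaling law is written as an explicit small parameter $\eta>0$ (a notation introduced only for this sketch), constants are ignored and $T=L^{-\varepsilon}\alpha^{-2}$ is the largest time in Theorem 1.1.

$$ \begin{align*} \text{(i)}\quad &\alpha=L^{-\varepsilon+\eta},\quad T=L^{\varepsilon-2\eta}\leq L: && \alpha\leq L^{-\delta}T^{-1}=L^{-\varepsilon+2\eta-\delta}\iff \delta\leq\eta,\\ \text{(ii)}\quad &\alpha=L^{-1-\frac{\varepsilon}{2}+\eta},\quad T=L^{2-2\eta}\in\left[L,L^{2}\right]: && \alpha\leq L^{-1-\delta}\iff \delta\leq\tfrac{\varepsilon}{2}-\eta,\qquad T\leq L^{2-\delta}\iff \delta\leq 2\eta. \end{align*} $$

So, under these assumptions, condition (1.4) holds with $\delta=\eta$ in case (i) and with $\delta=\min\left(2\eta,\frac{\varepsilon}{2}-\eta\right)$ in case (ii), provided $0<\eta<\frac{\varepsilon}{2}$ . In case (i) the bound $T\leq L^{2-\delta}$ is automatic because $\varepsilon<1$ .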

It is best to read this theorem in terms of the $\left (\log _L \left (\alpha ^{-1}\right ),\log _L T\right )$ plot in Figure 1. The kinetic conjecture corresponds to justifying the approximation in formula (1.1) up to time scales $T\lesssim T_{\mathrm {kin}}=\alpha ^{-2}$ . As we shall explain later, the time scale $T\sim T_{\mathrm {kin}}$ represents a critical scale for the problem from a probabilistic point of view. This is depicted by the red line in the figure, and the region below this line corresponds to a probabilistically subcritical regime (see Section 1.2.1). The shaded blue region corresponds to the $(\alpha , T)$ region in Theorem 1.3, neglecting $\delta $ losses. This region touches the line $T=\alpha ^{-2}$ at the two points corresponding to $\left (\alpha ^{-1}, T\right )=(1, 1)$ and $\left (L, L^2\right )$ , whereas the two scaling laws of Theorem 1.1, where $\left (\alpha ^{-1},T\right )\sim (L^{\varepsilon -},L^{\varepsilon -})$ and $\left (\alpha ^{-1},T\right )\sim \left (L^{1+\frac {\varepsilon }{2}-},L^{2-}\right )$ , approach these two points when $\varepsilon $ is small.

Figure 1 Admissible range for $(\alpha , L, T)$ in the $\left (\log _L \left (\alpha ^{-1}\right ),\log _L T\right )$ plot when $d\geq 3$ . The coloured region is the range of Theorem 1.3 (up to $\varepsilon $ endpoint accuracy). The red line denotes the case when $T=T_{\mathrm {kin}}=\alpha ^{-2}$ , which our coloured region touches at two points corresponding to $T\sim 1$ and $T\sim L^{2}$ .
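The shape of the admissible region can also be checked numerically. The sketch below works in the logarithmic coordinates of Figure 1, with $a=\log_L\left(\alpha^{-1}\right)$ and $\tau=\log_L T$ ; the function names are hypothetical and $\delta$ losses are neglected by default. It encodes only the three cases of the second condition in (1.4) and compares them with the kinetic line $\tau=2a$ .

```python
def min_alpha_exponent(tau, delta=0.0):
    """Smallest a = log_L(1/alpha) allowed by the second condition in (1.4) for T = L**tau."""
    if tau <= 1.0:
        return tau + delta        # alpha <= L^{-delta} T^{-1}   (T <= L)
    if tau <= 2.0:
        return 1.0 + delta        # alpha <= L^{-1-delta}        (L <= T <= L^2)
    return tau - 1.0 + delta      # alpha <= L^{1-delta} T^{-1}  (T >= L^2)


def max_time_exponent(d, generic_beta):
    """Upper bound on tau from the first condition in (1.4), neglecting delta."""
    return float(d) if generic_beta else 2.0


def gap_to_kinetic_line(tau):
    """Horizontal distance between the region boundary and the line T = alpha^{-2} (a = tau / 2)."""
    return min_alpha_exponent(tau) - tau / 2.0


# The gap vanishes only at tau = 0 and tau = 2, i.e. at (alpha^{-1}, T) = (1, 1) and (L, L^2).
touching = [tau / 10 for tau in range(0, 31) if gap_to_kinetic_line(tau / 10) == 0.0]
```

The gap is largest (equal to $\frac{1}{2}$ ) at $\tau=1$ and stays equal to $\frac{1}{2}$ for all $\tau\geq 3$ , which is only relevant for generic $\beta$ and $d\geq 3$ .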

These results rely on a diagrammatic expansion of the NLS solution in Feynman diagrams akin to a Taylor expansion. The shaded blue region depicting the result of Theorem 1.3 corresponds to the cases when such a diagrammatic expansion is absolutely convergent for very large L. In the complementary region between the blue region and the line $T=T_{\text {kin}}$ , we show that some (arbitrarily high-degree) terms of this expansion do not converge to $0$ as their degree goes to $\infty $ , which means that the diagrammatic expansion cannot converge absolutely in this region. Therefore, the only way for the kinetic conjecture to be true in the scaling regimes not included in Theorem 1.1 is for those terms to exhibit a highly nontrivial cancellation, which would make the series converge conditionally but not absolutely.

Finally, we remark on the restrictions in formula (1.4). The upper bounds on T on the left are necessary from number-theoretic considerations: indeed, if $T\gg L^2$ for a rational torus, or if $T\gg L^d$ for an irrational one, the exact resonances of the NLS equation dominate the quasi-resonant interactions that lead to the wave kinetic equation. One should therefore not expect the kinetic description to hold in those ranges of T (see Lemma 3.2 and Section 4). The second set of restrictions in formula (1.4) corresponds exactly to the requirement that the size of the Feynman diagrams of degree n can be bounded by $\rho ^n$ with some $\rho \ll 1$ . In fact, if one aims only at proving existence with high probability (not caring about the asymptotics of $\mathbb {E}\left \lvert \widehat {u}(t,k)\right \rvert ^2$ ), then the restrictions on the left of formula (1.4) will not be necessary, and one obtains control for longer times. See also the following remark:

Remark 1.4 Admissible scaling laws

The foregoing restrictions on T impose the limits of the admissible scaling laws, in which $\alpha \to 0$ and $L \to \infty $ , for which the kinetic description of the long-time dynamics can appear. Indeed, since $T_{\mathrm {kin}}=\alpha ^{-2}$ , the necessary (up to $L^{\delta }$ factors) restrictions $T\ll L^{2-\delta }$ (resp., $T\ll L^{d-\delta }$ ) on the rational (resp., irrational) torus already mentioned imply that one should only expect the previous kinetic description in the regime where $\alpha \gtrsim L^{-1}$ (resp., $\gtrsim L^{-d/2}$ ). In other words, the kinetic description requires the nonlinearity to be weak, but not too weak! In the complementary regime of very weak nonlinearity, the exact resonances of the equation dominate the quasi-resonances – a regime referred to as discrete wave turbulence (see [36, 32, 39]), in which different effective equations, like the (CR) equation in [24, 6], can arise.

1.2 Ideas of the proof

As Theorem 1.1 is a consequence of Theorem 1.3, we will focus on Theorem 1.3. The proof of Theorem 1.3 contains three components: ( $1$ ) a long-time well-posedness result, where we expand the solution to (NLS) into Feynman diagrams for sufficiently long time, up to a well-controlled error term; ( $2$ ) computation of $\mathbb E\left \lvert \widehat u_k(t)\right \rvert ^2 \left (k \in \mathbb Z^d_L\right )$ using this expansion, where we identify the leading terms and control the remainders; and ( $3$ ) a number-theoretic result that justifies the large box approximation, where we pass from the sums appearing in the expansion in the previous component to the integral appearing on the right-hand side of (WKE).

The main novelty of this work is in the first component, which is the hardest. The second component follows similar lines to those in [7]. Regarding the third component, the main novelty of this work is to complement the number-theoretic results in [7] (which dealt only with the generically irrational torus) by the cases of general tori (in the admissible range of time $T\ll L^2$ ). This provides an essentially full (up to $L^{\varepsilon }$ losses) understanding of the number-theoretic issues arising in wave-turbulence derivations for (NLS). Therefore, we will limit this introductory discussion to the first component.

1.2.1 The scheme and probabilistic criticality

Though technically involved, the basic idea of the long-time well-posedness argument is in fact quite simple. Starting from (NLS) with initial data of equation (1.2), we write the solution as

(1.5) $$ \begin{align} u=u^{(0)}+\cdots+u^{(N)}+\mathcal R_{N+1}, \end{align} $$

where $u^{(0)}=e^{-it\Delta _{\beta }}u_{\mathrm {in}}$ is the linear evolution, $u^{(n)}$ are iterated self-interactions of the linear solution $u^{(0)}$ that appear in a formal expansion of u and $\mathcal R_{N+1}$ is a sufficiently regular remainder term.

Since $u^{(0)}$ is a linear combination of independent random variables, and each $u^{(n)}$ is a multilinear combination, each of them will behave strictly better (both linearly and nonlinearly) than its deterministic analogue (i.e., with all $\eta _k=1$ ). This is due to the well-known large deviation estimates, which yield a ‘square root’ gain coming from randomness, akin to the central limit theorem (for instance, $\left \lVert u_{\mathrm {in}}\right \rVert _{L^{\infty }}$ is bounded by $L^{-d/2}\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ in the probabilistic setting, as opposed to $1\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ deterministically by Sobolev embedding, assuming compact Fourier support). This gain leads to a new notion of criticality for the problem, which can be defined (see Footnote 5) as the edge of the regime of $(\alpha , T)$ for which the iterate $u^{(1)}$ is better bounded than the iterate $u^{(0)}$ . It is not hard to see that $u^{(1)}$ can have size up to $O(\alpha\sqrt{T})$ (in appropriate norms), compared to the $O(1)$ size of $u^{(0)}$ (see, e.g., formula (2.25) for $n=1$ ). This justifies the notion that $T\sim T_{\mathrm {kin}}=\alpha ^{-2}$ corresponds to probabilistically critical scaling, whereas the time scales $T\ll T_{\mathrm {kin}}$ are subcritical (see Footnote 6).
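The ‘square root’ gain can be illustrated with a toy Monte Carlo experiment, given here as a hedged sketch rather than anything used in the proof. A sum of $N$ independent unit-circle phases has typical size $\sqrt N$ , whereas the deterministic analogue with all $\eta_k=1$ has size $N$ . The function names, the value $N=256$ and the number of trials are arbitrary choices for this example.

```python
import cmath
import math
import random


def random_phase_sum(n, rng):
    """Sum of n independent phases, each uniform on the unit circle."""
    return sum(cmath.exp(1j * rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n))


def mean_square_of_sum(n, trials, seed=0):
    """Monte Carlo estimate of E|eta_1 + ... + eta_n|^2, whose exact value is n."""
    rng = random.Random(seed)
    return sum(abs(random_phase_sum(n, rng)) ** 2 for _ in range(trials)) / trials


n = 256
# Root-mean-square size of the random sum: close to sqrt(n).
random_size = math.sqrt(mean_square_of_sum(n, trials=1000))
# Deterministic analogue (all eta_k = 1): exactly n.
deterministic_size = abs(sum(1.0 for _ in range(n)))
```

The exact identity $\mathbb E\left\lvert \sum_{k}\eta_k\right\rvert^2=N$ follows from independence and mean zero; the simulation only reproduces it up to sampling error.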

As it happens, a certain notion of criticality might not capture all the subtleties of the problem. As we shall see, some higher-order iterates $u^{(n)}$ will not be better bounded than $u^{(n-1)}$ in the full subcritical range $T\ll \alpha ^{-2}$ we have postulated, but instead only in a subregion thereof. This is what defines our admissible blue region in Figure 1.

We should mention that the idea of using the gain from randomness goes back to Bourgain [3] (in the random-data setting) and to Da Prato and Debussche [14] (later, in the stochastic PDE setting). They first noticed that the ansatz $u=u^{(0)}+\mathcal R$ allows one to put the remainder $\mathcal R$ in a higher regularity space than the linear term $u^{(0)}$ . This idea has since been applied to many different situations (see, e.g., [5, 8, 11, 15, 21, 33, 38]), though most of these works either involve only the first-order expansion (i.e., $N=0$ ) or involve higher-order expansions with only suboptimal bounds (e.g., [2]). To the best of our knowledge, the present paper is the first work where the sharp bounds for these $u^{(j)}$ terms are obtained to arbitrarily high order (at least in the dispersive setting).

Remark 1.5. There are two main reasons why the high-order expansion (1.5) gives the sharp time of control, in contrast to previous works. The first is that we are able to obtain sharp estimates for the terms $u^{(j)}$ with arbitrarily high order, which were not known previously due to the combinatorial complexity associated with trees (see Section 1.2.2).

The second reason is more intrinsic. In higher-order versions of the original Bourgain–Da Prato–Debussche approach, the regularity usually stops improving beyond a certain point, due to the presence of high-low interactions (heuristically, a gain in powers of the low frequency does not translate into a gain in regularity). This is a major difficulty in random-data theory, and in recent years a few methods have been developed to address it, including regularity structures [29], paracontrolled calculus [28] and random averaging operators [18]. Fortunately, in the current problem this issue is absent, since the well-prepared initial data (1.2) bound the high-frequency components (where $\lvert k\rvert \sim 1$ ) and low-frequency components (where $\left \lvert k\right \rvert \sim L^{-1}$ ) uniformly, so the high-low interaction is simply controlled in the same way as the high-high interaction, allowing one to gain regularity indefinitely as the order increases.

1.2.2 Sharp estimates of Feynman trees

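We start with the estimate for $u^{(n)}$ . As is standard with the cubic nonlinear Schrödinger equation, we first perform the Wick ordering by defining

$$ \begin{align*} w:=e^{-2i\lambda^{2}M_0t}\,u, \qquad M_0:=\frac{1}{L^{d}}\int_{\mathbb{T}_L^d}\left\lvert u\right\rvert^{2}\,\mathrm{d}x. \end{align*} $$

Note that $M_0$ is essentially the mass which is conserved. Now w satisfies the renormalised equation

(1.6) $$ \begin{align} i \partial_t w - \Delta_{\beta} w + \lambda^{2}\left(\left\lvert w\right\rvert^{2}-2M_0\right) w=0, \end{align} $$

and $\left \lvert \widehat {w_k}(t)\right \rvert ^2=\left \lvert \widehat {u_k}(t)\right \rvert ^2$ . This gets rid of the worst resonant term, which would otherwise lead to a suboptimal time scale.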

Let $w^{(n)}$ be the nth-order iteration of the nonlinearity in equation (1.6), corresponding to the $u^{(n)}$ in equation (1.5). Since this nonlinearity is cubic, by induction it is easy to see that $w^{(n)}$ can be written (say, in Fourier space) as a linear combination of terms $\mathcal J_{\mathcal{T}}$ (see Footnote 7), where $\mathcal{T}$ runs over all ternary trees with exactly n branching nodes (we will say it has scale $\mathfrak s(\mathcal{T})=n$ ). After some further reductions, the estimate for $\mathcal J_{\mathcal{T}}$ can be reduced to the estimate for terms of the form

(1.7) $$ \begin{align} \Sigma_k:=\sum_{\left(k_1,\ldots,k_{2n+1}\right)\in S}\eta_{k_1}^{\pm}\cdots \eta_{k_{2n+1}}^{\pm}, \qquad \left(\eta_k^+,\eta_k^{-}\right):=\left(\eta_k(\omega), \overline{\eta_k}(\omega)\right), \end{align} $$

where $\eta _k(\omega )$ is as in equation (1.2), $(k_1,\ldots ,k_{2n+1})\in \left (\mathbb {Z}_L^d\right )^{2n+1}$ , S is a suitable finite subset of $\left (\mathbb {Z}_L^d\right )^{2n+1}$ and the $(2n+1)$ subscripts correspond to the $(2n+1)$ leaves of $\mathcal{T}$ (see Definition 2.2 and Figure 2; see also Footnote 8).

Figure 2 On the left, a node $\mathfrak n$ with its three children $\mathfrak n_1, \mathfrak n_2, \mathfrak n_3$ , with signs $\iota _1=\iota _3=\iota =-\iota _2$ . On the right, a tree of scale $4$ $(\mathfrak s(\mathcal{T}\,\,)=4)$ with root $\mathfrak r$ , four branching nodes ( $\mathfrak r, \mathfrak n_1, \mathfrak n_2, \mathfrak n_3$ ) and $l=9$ leaves, along with their signatures.
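To make the combinatorics of ternary trees concrete, the following sketch counts the tree shapes of a given scale and confirms the leaf count $2n+1$ used in formula (1.7). It counts shapes only and ignores the signs attached to the nodes; the function names are hypothetical. The recursion splits the $n-1$ non-root branching nodes among the three ordered subtrees of the root, and the closed form $\binom{3n}{n}/(2n+1)$ is the standard Fuss–Catalan number.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_ternary_trees(n):
    """Number of ternary tree shapes with exactly n branching nodes (3 ordered children each)."""
    if n == 0:
        return 1  # the tree consisting of a single leaf
    total = 0
    # The root is a branching node; split the remaining n - 1 among its three subtrees.
    for a in range(n):
        for b in range(n - a):
            c = n - 1 - a - b
            total += (count_ternary_trees(a)
                      * count_ternary_trees(b)
                      * count_ternary_trees(c))
    return total


def leaf_count(n):
    """Each branching node replaces one leaf by three, adding two leaves: 1 + 2n in total."""
    return 2 * n + 1


counts = [count_ternary_trees(n) for n in range(5)]  # → [1, 1, 3, 12, 55]
```

Since $\binom{3n}{n}\leq 2^{3n}$ , the number of trees of scale $n$ grows at most exponentially in $n$ .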

To estimate $\Sigma _k$ defined in formula (1.7) we invoke the standard large deviation estimate (see Lemma 3.1), which essentially asserts that $\left \lvert \Sigma _k\right \rvert \lesssim (\#S)^{1/2}$ with overwhelming probability, provided that there is no pairing in $(k_1,\ldots ,k_{2n+1})$ , where a pairing $\left (k_i,k_j\right )$ means $k_i=k_j$ and the signs of $\eta _{k_i}$ and $\eta _{k_j}$ in formula (1.7) are opposites. Moreover, in the case of a pairing $\left (k_i,k_j\right )$ we can essentially replace $\eta _{k_i}^{\pm } \eta _{k_j}^{\pm }=\left \lvert \eta _{k_i}\right \rvert ^2\approx 1$ , so in general we can bound, with overwhelming probability,

$$ \begin{align*} \left\lvert\Sigma_k\right\rvert^2\lesssim\sum_{\left(\text{unpaired }k_i\right)}\left(\sum_{\substack{\left(\text{paired }k_i\right):\\ \left(k_1,\ldots,k_{2n+1}\right)\in S}}1\right)^2\lesssim\sum_{\left(k_1,\ldots,k_{2n+1}\right)\in S}1\cdot\sup_{\left(\text{unpaired }k_i\right)}\sum_{\substack{\left(\text{paired }k_i\right):\\ \left(k_1,\ldots,k_{2n+1}\right)\in S}}1. \end{align*} $$

It thus suffices to bound the number of choices for $(k_1,\ldots ,k_{2n+1})$ given the pairings, as well as the number of choices for the paired $k_j$ s given the unpaired $k_j$ s.

In the no-pairing case, such counting bounds are easy to prove, since the set S is well adapted to the tree structure of $\mathcal{T}\,$ ; what makes the counting nontrivial is the pairings, especially those between leaves that are far away or from different levels (see Figure 3, where a pairing is depicted by an extra link between the two leaves). Nevertheless, we have developed a counting algorithm that specifically deals with the given pairing structure of $\mathcal{T}\,$ and ultimately leads to sharp counting bounds and consequently sharp bounds for $\Sigma _k$ (see Proposition 3.5).

Figure 3 A paired tree with two pairings $(p=2)$ . The set ${\mathcal S}$ of single leaves is $\{\mathfrak l_1,\mathfrak l_4,\mathfrak l_6,\mathfrak l_7,\mathfrak l_9 \}$ . The subset $\mathcal R\subset \mathcal {S}\cup \{\mathfrak {r}\}$ of red-coloured vertices is $\{\mathfrak r, \mathfrak l_1,\mathfrak l_4,\mathfrak l_6\}$ . Here $(l, p, r)=(9, 2, 4)$ . A strongly admissible assignment with respect to this pairing, colouring and a certain fixed choice of the red modes $\left (k_{\mathfrak r},k_{\mathfrak l_4},k_{\mathfrak l_6}\right )$ corresponds to having the modes $k_{\mathfrak l_2}=k_{\mathfrak l_3}$ , $k_{\mathfrak l_5}=k_{\mathfrak l_8}$ and $\lvert k_{\mathfrak l}\rvert \leq L^{\theta }$ for all the uncoloured leaves. The rest of the modes are determined according to Definition 2.2.

1.2.3 An $\ell ^2$ operator norm bound

In contrast to the tree terms $\mathcal J_{\mathcal{T}\,}$ , the remainder term $\mathcal R_{N+1}$ has no explicit random structure. Indeed, the only way it feels the ‘chaos’ of the initial data is through the equation it satisfies, which in integral form and spatial Fourier variables looks like

$$ \begin{align*}\mathcal R_{N+1}=\mathcal J_{\sim N}+\mathcal L(\mathcal R_{N+1}) +\mathcal Q(\mathcal R_{N+1})+\mathcal C(\mathcal R_{N+1}), \end{align*} $$

where $\mathcal J_{\sim N}$ is a sum of Feynman trees $\mathcal J_{\mathcal{T}\,}$ (already described) of scale $\mathfrak s (\mathcal{T}\,\,)\sim N$ , and $\mathcal L$ , $\mathcal Q$ and $\mathcal C$ are, respectively, linear, bilinear and trilinear operators in $\mathcal R_{N+1}$ . The main point here is that one would like to propagate the estimates on $\mathcal J_{\sim N}$ to $\mathcal R_{N+1}$ itself; this is how we make rigorous the so-called ‘propagation of chaos or quasi-Gaussianity’ claims that are often adopted in formal derivations. In a different direction, qualitative results on propagation of quasi-Gaussianity, in the form of absolute continuity of measures, have been obtained in some cases (with different settings) by exploiting almost-conservation laws (e.g., [Reference Tzvetkov48]).

Since we are bootstrapping a smallness estimate on $\mathcal R_{N+1}$ , any quadratic and cubic form of $\mathcal R_{N+1}$ will be easily bounded. It therefore suffices to propagate the bound for the term $\mathcal L(\mathcal R_{N+1})$ , which reduces to bounding the $\ell ^2\to \ell ^2$ operator norm for the linear operator $\mathcal L$ . By definition, the operator $\mathcal L$ will have the form $v\mapsto \mathcal {IW}\left (\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}, v\right )$ , where $\mathcal {I}$ is the Duhamel operator, $\mathcal {W}$ is the trilinear form coming from the cubic nonlinearity and $\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}$ are trees of scale $\leq N$ ; thus in Fourier space it can be viewed as a matrix with random coefficients. The key to obtaining the sharp estimate for $\mathcal L$ is then to exploit the cancellation coming from this randomness, and the most efficient way to do this is via the $TT^*$ method.

In fact, the idea of applying the $TT^*$ method to random matrices has already been used by Bourgain [Reference Bourgain3]. In that paper one is still far above (probabilistic) criticality, so applying the $TT^*$ method once already gives adequate control. In the present case, however, we are aiming at obtaining sharp estimates, so applying $TT^*$ once will not be sufficient.

The solution is thus to apply $TT^*$ sufficiently many times (say, $D\gg 1$ ), which leads to the analysis of the kernel of the operator $(\mathcal L\mathcal L^*)^D$ . At first sight this kernel seems to be a complicated multilinear expression which is difficult to handle; nevertheless, we make one key observation, namely that this kernel can essentially be recast in the form of formula (1.7) for some large auxiliary tree $\mathcal{T}\,=\mathcal{T}\;^D$ , which is obtained from a single root node by attaching copies of the trees $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ successively a total of $2D$ times (see Figure 4). With this observation, the arguments in the previous section then lead to sharp bounds of the kernel of $(\mathcal L\mathcal L^*)^D$ , up to some loss that is a power of L independent of D; taking the $1/(2D)$ power and choosing D sufficiently large makes this power negligible and implies the sharp bound for the operator norm of $\mathcal L$ (see Section 3.3).
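The mechanism by which a high power of $TT^*$ removes a fixed loss can be seen in a finite-dimensional toy model. The sketch below is only an analogy, not the kernel estimate used in this paper: it works with real matrices, uses the Frobenius norm as a crude stand-in for a kernel bound, and relies on $\lVert A\rVert^{2D}=\lVert (AA^{T})^D\rVert \leq \lVert (AA^{T})^D\rVert_{F}\leq \sqrt{n}\,\lVert A\rVert^{2D}$ . The dimension-dependent factor $\sqrt n$ plays the role of the loss that is independent of D. The function name `tt_star_bound` is hypothetical.

```python
import math

def matmul(A, B):
    """Plain matrix product for lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def frobenius(A):
    return math.sqrt(sum(x * x for row in A for x in row))

def tt_star_bound(A, D):
    """Upper bound for the operator norm of A: ||(A A^T)^D||_F ** (1 / (2 D))."""
    G = matmul(A, transpose(A))
    P = G
    for _ in range(D - 1):
        P = matmul(P, G)
    return frobenius(P) ** (1.0 / (2 * D))

# Toy example: the identity on R^16 has operator norm 1,
# while the bound equals 16 ** (1 / (4 D)) and tends to 1 as D grows.
n = 16
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
for D in (1, 2, 4, 8):
    print(D, tt_star_bound(identity, D))
```

For the identity the overestimate is exactly $n^{1/(4D)}$ , so it can be made as close to $1$ as desired by increasing D.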

Figure 4 Construction of the tree $\mathcal{T}\,^D$ by successive plantings of trees $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ onto the first two nodes of a ternary tree, starting with a root $\mathfrak r$ and stopping after $2D$ steps, leaving a leaf node $\mathfrak r'$ . In the figure, $D=2$ .

1.2.4 Sharpness of estimates

We remark that the estimates we prove for $\mathcal J_{\mathcal{T}\,}$ are sharp up to some finite power of L (independent of  $\mathcal{T}\,$ ). More precisely, from Proposition 2.5 we know that for any ternary tree $\mathcal{T}\,$ of scale n and possible pairing structure (see Definition 3.3), with overwhelming probability,

(1.8) $$ \begin{align}\sup_k\left\lVert(\mathcal J_{\mathcal{T}})_k\right\rVert_{h^{b}}\leq L^{0+}\rho^n, \end{align} $$

where $\rho $ is some quantity depending on $\alpha $ , L and T (see formula (2.24)), k is the spatial Fourier variable and $h^b$ is a time-Sobolev norm defined in equation (2.22); on the other hand, we will show that for some particular choice of trees $\mathcal{T}\,$ of scale n and some particular choice of pairings, with high probability,

(1.9) $$ \begin{align}\sup_k\left\lVert(\mathcal J_{\mathcal{T}})_k\right\rVert_{h^{b}}\geq L^{-d}\rho^n. \end{align} $$

The timescale T of Theorem 1.3 is the largest that makes $\rho \ll 1$ ; thus if one wants to go beyond T in cases other than Theorem 1.1, it would be necessary to address the divergence of formula (1.9) with $\rho \gg 1$ by exploiting the cancellation between different tree terms or different pairing choices (see Section 3.4).

1.2.5 Discussions

Shortly after the completion of this paper, work of Collot and Germain [Reference Collot and Germain12] was announced that studies the same problem, but only in the rational-torus setting. In the language of this paper, their result corresponds to the validity of equation (1.3) for $L\leq t\leq L^{2-\delta }$ , under the assumption $\alpha \leq L^{-1-\delta }$ . This is a special case of Theorem 1.3, essentially corresponding to the rectangle below the horizontal line $\log _LT=2$ and to the right of the vertical line $\log _L\left (\alpha ^{-1}\right )=1$ in Figure 1. We also mention later work by the same authors [Reference Collot and Germain13], where they consider a generic nonrectangular torus (as opposed to the rectangular tori here and in [Reference Collot and Germain12]) and prove the existence of solutions (but without justifying equation (1.3)) up to time $t\leq L^{-\delta }T_{\mathrm {kin}}$ for a wider range of power laws between $\alpha $ and L.

While the present paper was being peer-reviewed, we submitted new work to arXiv [Reference Deng and Hani16], in which we provide the first full derivation of (WKE) from (NLS). Those results reach the kinetic time scale $t=\tau \cdot T_{\mathrm {kin}}$ , where $\tau $ is independent of L (compared to Theorem 1.1 here, where $\tau \leq L^{-\varepsilon }$ ), for the scaling law $\alpha \sim L^{-1}$ on generic (irrational) rectangular tori and the scaling laws $\alpha \sim L^{-\gamma }$ (where $\gamma <1$ and is close to $1$ ) on arbitrary rectangular tori.

Shortly after completing [Reference Deng and Hani16], we received a preprint of a forthcoming deep work by Staffilani and Tran [Reference Staffilani and Tran47]. It concerns a high-dimensional (on $\mathbb {T}^d$ for $d\geq 14$ ) KdV equation under a time-dependent Stratonovich stochastic forcing, which effectively randomises the phases without injecting energy into the system. The authors derive the corresponding wave kinetic equation up to the kinetic time scale, for the scaling law $\alpha \sim L^{-0}$ (i.e., first taking $L\to \infty $ and then taking $\alpha \to 0$ ). They also prove a conditional result without such forcing, where the condition is verified for some particular initial densities converging to the equilibrium state (stationary solution to the wave kinetic equation) in the limit.

1.3 Organisation of the paper

In Section 2 we explain the diagrammatic expansion of the solution into Feynman trees, and state the a priori estimates on such trees and remainder terms, which yield the long-time existence of such expansions. Section 3 is devoted to the proof of those a priori estimates. In Section 4 we prove the main theorems already mentioned, and in Section 5 we prove the necessary number-theoretic results that allow us to replace the highly oscillating Riemann sums by integrals.

1.4 Notation

Most notation will be standard. Let $z^+=z$ and $z^-=\overline {z}$ . Define $\left \lvert k\right \rvert _{\beta }$ by $\left \lvert k\right \rvert _{\beta }^2=\beta _1k_1^2+\cdots +\beta _dk_d^2$ for $k=(k_1,\ldots ,k_d)$ . The spatial Fourier series of a function $u: {\mathbb T}_L^d \to \mathbb C$ is defined on $\mathbb Z^d_L:=L^{-1}\mathbb Z^{d}$ by

(1.10) $$ \begin{align} \widehat{u}_k=\int_{{\mathbb T}^d_L} u(x) e^{-2\pi i k\cdot x}\,\mathrm{d}x,\quad \text{so that}\quad u(x)=\frac{1}{L^d}\sum_{k \in \mathbb Z^d_L} \widehat{u}_k e^{2\pi i k\cdot x}. \end{align} $$
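The normalisation in (1.10) can be checked numerically in one dimension. The sketch below assumes $d=1$ , an integer period $L=4$ and a hypothetical finite set of coefficients indexed by m with $k=m/L$ ; it builds u from the series and recovers the coefficients by an equispaced Riemann sum, which is exact (up to rounding) for trigonometric polynomials once the number of nodes exceeds the spread of the modes.

```python
import cmath

L = 4    # side length of the 1D torus [0, L) (illustrative choice)
P = 16   # number of quadrature nodes; larger than the spread of the modes below

# Hypothetical coefficients indexed by m, where k = m / L lies in L^{-1} Z.
coeffs = {-2: 1 + 2j, -1: -0.5j, 0: 3.0, 1: 0.25 + 0j, 2: -1 + 1j}

def u(x):
    """u(x) = L^{-d} * sum_k u_hat_k * exp(2 pi i k x), with d = 1."""
    return sum(c * cmath.exp(2j * cmath.pi * (m / L) * x)
               for m, c in coeffs.items()) / L

def u_hat(m):
    """Riemann sum for the integral of u(x) exp(-2 pi i k x) over [0, L)."""
    h = L / P
    return h * sum(u(j * h) * cmath.exp(-2j * cmath.pi * (m / L) * (j * h))
                   for j in range(P))

max_error = max(abs(u_hat(m) - coeffs[m]) for m in coeffs)
print(max_error)
```

The factor $L^{-d}$ sits in the inversion formula rather than in the transform, which is why `u` divides by `L` while `u_hat` does not.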

The temporal Fourier transform is defined by

$$ \begin{align*} \widetilde{f}(\tau)=\int_{\mathbb{R}}e^{-2\pi it\tau}f(t)\mathrm{d}t. \end{align*} $$

Let $\delta>0$ be fixed throughout the paper. Let N, s and $b>\frac {1}{2}$ be fixed, such that N and s are large enough and $b-\frac {1}{2}$ is small enough, depending on d and $\delta $ . The quantity C will denote any large absolute constant, not dependent on $\big(N,s,b-\frac {1}{2}\big)$ , and $\theta $ will denote any small positive constant, which is dependent on $\big(N,s,b-\frac {1}{2}\big)$ ; these may change from line to line. The symbols $O(\cdot )$ , $\lesssim $ and so on will have their usual meanings, with implicit constants depending on $\theta $ . Let L be large enough depending on all these implicit constants. If some statement S involving $\omega $ is true with probability $\geq 1-Ke^{-L^{\theta }}$ for some constant K (depending on $\theta $ ), then we say this statement S is L-certain.

When a function depends on many variables, we may use notations like

$$ \begin{align*} f=f\left(x_i:i\in A,\,y_j:1\leq j\leq m\right) \end{align*} $$

to denote a function f of variables $(x_i:i\in A)$ and $y_1,\ldots ,y_m$ .

2 Tree expansions and long-time existence

2.1 First reductions

Let $\widehat {u}_k(t)$ be the Fourier coefficients of $u(t)$ , as in equation (1.10). Then with $c_k(t):= e^{2\pi i\left \lvert k\right \rvert _{\beta }^2t} \widehat u_k(t)=\left (\mathcal F_{{\mathbb T}^d_L} e^{-it\Delta _{\beta }} u\right )(k)$ , we arrive at the following equation for the Fourier modes:

(2.1) $$ \begin{align} \begin{cases} i \dot{c_k} = \left(\frac{\lambda}{L^{d}}\right)^{2} \sum\limits_{\substack{\left(k_1,k_2,k_{3}\right) \in \left(\mathbb{Z}^d_L\right)^3 \\ k - k_1 + k_2 -k_3 = 0}} c_{k_1}\overline{c_{k_2}} c_{k_3} e^{2\pi i \Omega\left(k_1,k_2,k_3,k\right)t} \\[1.5em] c_k(0) = (c_k)_{\mathrm{in}}=\widehat u_k(0), \end{cases} \end{align} $$

where $ \Omega (k_1,k_2,k_3,k) =\left \lvert k_1\right \rvert _{\beta }^2-\left \lvert k_2\right \rvert _{\beta }^2+\left \lvert k_3\right \rvert _{\beta }^2-\left \lvert k\right \rvert _{\beta }^2. $ Note that the sum can be written as

$$ \begin{align*}\sum\limits_{\substack{\left(k_1,k_2, k_{3}\right) \in \left(\mathbb{Z}^d_L\right)^3 \\ k - k_1 + k_2 -k_3 = 0}}=2\sum_{k_1=k}-\sum_{k_1=k_2=k_3} +\sum_{k_1, k_3\neq k}, \end{align*} $$

which, defining $M=\sum _{k_3} \left \lvert c_{k_3}\right \rvert ^2$ (which is conserved), allows us to write

$$ \begin{align*} i \dot{c_k} = \left(\frac{\lambda}{L^{d}}\right)^{2} \left(2M c_k -\left\lvert c_k\right\rvert^2c_k+ \sum\limits_{\left(k_1,k_2, k_{3}\right)}^{\times} c_{k_1}\overline{c_{k_2}} c_{k_3} e^{2\pi i \Omega\left(k_1,k_2,k_3,k\right)t}\right). \end{align*} $$

Here and later, $\sum ^{\times }$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$ , $k_1-k_2+k_3=k$ and $k\not \in \{k_1,k_3\}$ . Introducing $b_k(t)=c_k(t)e^{2i\left (L^{-d}\lambda \right )^{2}Mt}$ , which removes the term $2Mc_k$ because M is conserved and the cubic terms are invariant under a common phase rotation, we arrive at the following equation for $b_k(t)$ :

(2.2) $$ \begin{align} \begin{cases} i \dot{b_k} = \left(\frac{\lambda}{L^{d}}\right)^{2} \left( -\left\lvert b_k\right\rvert^2b_k+ \sum\limits_{\left(k_1,k_2, k_{3}\right)}^{\times} b_{k_1}\overline{b_{k_2}} b_{k_3} e^{2\pi i \Omega\left(k_1,k_2,k_3,k\right)t}\right)\\[1.5em] b_k(0) = (b_k)_{\mathrm{in}}=\widehat u_k(0). \end{cases} \end{align} $$
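The splitting of the cubic sum into the resonant contribution $2Mc_k-\left\lvert c_k\right\rvert^2c_k$ and the restricted sum $\sum^{\times}$ can be verified numerically on finitely supported data. The sketch below assumes $d=1$ , $\beta=1$ and integer modes, with hypothetical coefficients supported on $\{-3,\ldots,3\}$ ; it compares the full constrained sum with the split form at an arbitrary time.

```python
import cmath

def omega(k1, k2, k3, k):
    # One-dimensional Omega with beta = 1 (illustrative assumption).
    return k1 * k1 - k2 * k2 + k3 * k3 - k * k

def term(c, k1, k2, k3, k, t):
    phase = cmath.exp(2j * cmath.pi * omega(k1, k2, k3, k) * t)
    return c[k1] * c[k2].conjugate() * c[k3] * phase

def full_sum(c, k, t):
    """Sum over all (k1, k2, k3) in the support with k - k1 + k2 - k3 = 0."""
    return sum(term(c, k1, k2, k - k1 + k2, k, t)
               for k1 in c for k2 in c if (k - k1 + k2) in c)

def split_sum(c, k, t):
    """2 M c_k - |c_k|^2 c_k plus the sum restricted to k not in {k1, k3}."""
    mass = sum(abs(v) ** 2 for v in c.values())   # the conserved quantity M
    ck = c.get(k, 0j)
    nonresonant = sum(term(c, k1, k2, k - k1 + k2, k, t)
                      for k1 in c for k2 in c
                      if (k - k1 + k2) in c and k1 != k and (k - k1 + k2) != k)
    return 2 * mass * ck - abs(ck) ** 2 * ck + nonresonant

# Hypothetical finitely supported data on the integer modes -3..3.
c = {k: complex(1.0 + 0.1 * k, 0.2 * k - 0.3) for k in range(-3, 4)}
t = 0.37
worst = max(abs(full_sum(c, k, t) - split_sum(c, k, t)) for k in range(-5, 6))
print(worst)
```

The terms with $k_1=k$ or $k_3=k$ have $\Omega=0$ , so they carry no oscillating phase; this is what allows them to be collected into $2Mc_k-\left\lvert c_k\right\rvert^2c_k$ .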

In Theorem 1.3 we will be studying the solution $u(t)$ , or equivalently the sequence $(b_k(t))_{k \in \mathbb Z^d_L}$ , on a time interval $[0,T]$ . It will be convenient, to simplify some notation later, to work on the unit time interval $[0,1]$ . For this we introduce the final ansatz

$$ \begin{align*} a_k(t)=b_k(Tt), \end{align*} $$

which satisfies the equation

(2.3) $$ \begin{align} \begin{cases} i \dot{a_k} = \left(\frac{\alpha T}{L^{d}}\right) \left( -\left\lvert a_k\right\rvert^2a_k+ \sum\limits_{\left(k_1,k_2, k_{3}\right)}^{\times} a_{k_1}\overline{a_{k_2}} a_{k_3} e^{2\pi i T\Omega\left(k_1,k_2,k_3,k\right)t}\right)\\[1.5em] a_k(0) =(a_k)_{\mathrm{in}}=\widehat u_k(0). \end{cases} \end{align} $$

Here we have also used the relation $\alpha =\lambda ^2L^{-d}$ . Recall the well-prepared initial data (1.2), which transform into the initial data for $a_k$ :

(2.4) $$ \begin{align} (a_k)_{\mathrm{in}}=\sqrt{n_{\mathrm{in}}} \cdot\eta_{k}(\omega), \end{align} $$

where $\eta _{k}(\omega )$ are the same as in equation (1.2).

2.2 The tree expansion

Let $\boldsymbol a(t) =(a_k(t))_{k \in \mathbb Z^d_L}$ and $\boldsymbol {a}_{\mathrm {in}} =\boldsymbol a(0)$ . Let $J=[0,1]$ ; we will fix a smooth compactly supported cutoff function $\chi $ such that $\chi \equiv 1$ on J. Then by equation (2.3), we know that for $t\in J$ we have

(2.5) $$ \begin{align}\boldsymbol{a}(t)=\chi(t)\boldsymbol{a}_{\mathrm{in}}+\mathcal{IW}(\boldsymbol a, \boldsymbol a, \boldsymbol a)(t),\end{align} $$

where the Duhamel term is defined by

(2.6) $$ \begin{align} \mathcal{I}F(t)=\chi(t)\int_0^t\chi(t')F(t')\mathrm{d}t', \end{align} $$
(2.7) $$ \begin{align} \mathcal{W}(\boldsymbol b, \boldsymbol c , \boldsymbol d)_k(t):=-\frac{i\alpha T}{L^{d}}\left( -\left(b_{k}\overline{c_{k}}d_{k}\right)(t)+ \sum\limits_{\left(k_1,k_2, k_{3}\right)}^{\times} \left(b_{k_1}\overline{c_{k_2}} d_{k_3}\right)(t) e^{2\pi i T\Omega\left(k_1,k_2,k_3,k\right)t}\right). \end{align} $$

Since we will only be studying $\boldsymbol {a}$ for $t\in J$ , from now on we will replace $\boldsymbol {a}$ by the solution to equation (2.5) for $t\in \mathbb {R}$ (the existence and uniqueness of the latter will be clear from a proof to follow). We will be analysing the temporal Fourier transform of this (extended) $\boldsymbol {a}$ , so let us first record a formula for $\mathcal {I}$ on the Fourier side:

Lemma 2.1. Let $\mathcal {I}$ be defined as in equation (2.6), and recall that $\widetilde {G}$ means the temporal Fourier transform of G; then we have

(2.8) $$ \begin{align}\widetilde{\mathcal{I}F}(\tau)=\int_{\mathbb{R}}(I_0+I_1)(\tau,\sigma)\widetilde{F}(\sigma)\mathrm{d}\sigma,\qquad\left\lvert\partial_{\tau,\sigma}^aI_d(\tau,\sigma)\right\rvert\lesssim_{a,A}\frac{1}{\left\langle \tau-d\sigma\right\rangle^A}\frac{1}{\langle\sigma\rangle}.\end{align} $$

Now define $\mathcal J_n$ recursively by

(2.9) $$ \begin{align} \begin{array}{ll} \mathcal J_0(t)&=\chi(t)\cdot\boldsymbol{a}_{\mathrm{in}},\\ \mathcal J_n(t)&=\displaystyle\sum_{n_1+n_2+n_3=n-1}\mathcal{IW}\left(\mathcal J_{n_1},\mathcal J_{n_2},\mathcal J_{n_3}\right)(t), \end{array} \end{align} $$

and define

(2.10) $$ \begin{align}\mathcal J_{\leq N}=\sum_{n\leq N}\mathcal J_{n},\qquad \mathcal{R}_{N+1}=\boldsymbol{a}-\mathcal J_{\leq N}.\end{align} $$

By plugging in equation (2.5), we get that $\mathcal R_{N+1}$ satisfies the equation

(2.11) $$ \begin{align}\mathcal R_{N+1}=\mathcal J_{\sim N}+\mathcal L(\mathcal R_{N+1}) +\mathcal Q(\mathcal R_{N+1})+\mathcal C(\mathcal R_{N+1}), \end{align} $$

where the relevant terms are defined as

(2.12) $$ \begin{align} \mathcal J_{\sim N}&:=\sum_{\substack{n_1, n_2, n_3\leq N\\ n_1+n_2+n_3\geq N}} \mathcal{IW}\left(\mathcal J_{n_1},\mathcal J_{n_2},\mathcal J_{n_3}\right), \end{align} $$
(2.13) $$ \begin{align} \mathcal L(v)&:=\sum_{n_1,n_2\leq N}\left(2\mathcal{IW}\left(\mathcal J_{n_1}, \mathcal J_{n_2}, v\right)+\mathcal{IW}\left(\mathcal J_{n_1}, v, \mathcal J_{n_2}\right)\right), \end{align} $$
(2.14) $$ \begin{align} \mathcal Q(v)&:=\sum_{n_1\leq N}\left(2\mathcal{IW}\left(v,v, \mathcal J_{n_1}\right)+\mathcal{IW}\left(v, \mathcal J_{n_1}, v\right)\right), \end{align} $$
(2.15) $$ \begin{align} \mathcal C(v)&:=\mathcal{IW}(v, v, v). \end{align} $$

Next we will derive a formula for the time Fourier transform of $\mathcal J_n$ ; for this we need some preparation regarding multilinear forms associated with ternary trees.

Definition 2.2.

  1. Let $\mathcal{T}\,$ be a ternary tree. We use $\mathcal {L}$ to denote the set of leaves and l their number, $\mathcal {N}=\mathcal{T}\,\backslash \mathcal L$ the set of branching nodes and n their number, and $\mathfrak {r} \in \mathcal N$ the root node. The scale of a ternary tree $\mathcal{T}\,$ is defined as $\mathfrak s(\mathcal{T}\,\,)=n$ (the number of branching nodes). A tree of scale n has $l=2n+1$ leaves and a total of $3n+1$ vertices.

  2. (Signs on a tree) For each node $\mathfrak {n}\in \mathcal {N}$ , let its children from left to right be $\mathfrak {n}_1$ , $\mathfrak {n}_2$ , $\mathfrak {n}_3$ . We fix the sign $\iota _{\mathfrak {n}}\in \{\pm \}$ as follows: first $\iota _{\mathfrak {r}}=+$ , then for any node $\mathfrak {n}\in \mathcal {N}$ , define $\iota _{\mathfrak {n}_1}=\iota _{\mathfrak {n}_3}=\iota _{\mathfrak {n}}$ and $\iota _{\mathfrak {n}_2}=-\iota _{\mathfrak {n}}$ .

  3. (Admissible assignments) Suppose we assign to each $\mathfrak {n}\in \mathcal{T}\,$ an element $k_{\mathfrak {n}}\in \mathbb {Z}_L^d$ . We say such an assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is admissible if for any $\mathfrak {n}\in \mathcal {N}$ we have $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}-k_{\mathfrak {n}_2}+k_{\mathfrak {n}_3}$ and either $k_{\mathfrak {n}}\not \in \left \{k_{\mathfrak {n}_1},k_{\mathfrak {n}_3}\right \}$ or $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}=k_{\mathfrak {n}_2}=k_{\mathfrak {n}_3}$ . Clearly an admissible assignment is completely determined by the values of $k_{\mathfrak {l}}$ for $\mathfrak {l}\in \mathcal {L}$ . For any assignment, we denote $\Omega _{\mathfrak {n}}:=\Omega \left (k_{\mathfrak {n}_1},k_{\mathfrak {n}_2},k_{\mathfrak {n}_3},k_{\mathfrak {n}}\right )$ . Suppose we also fix $d_{\mathfrak {n}}\in \{0,1\}$ for each $\mathfrak {n}\in \mathcal {N}$ ; then we can define $q_{\mathfrak {n}}$ for each $\mathfrak {n}\in \mathcal{T}\,$ inductively by

    (2.16) $$ \begin{align} q_{\mathfrak{n}}=0\text{ if }\mathfrak{n}\in\mathcal L\quad\text{or}\quad q_{\mathfrak{n}}=d_{\mathfrak{n}_1}q_{\mathfrak{n}_1}-d_{\mathfrak{n}_2}q_{\mathfrak{n}_2}+d_{\mathfrak{n}_3}q_{\mathfrak{n}_3}+\Omega_{\mathfrak{n}}\text{ if }\mathfrak{n}\in\mathcal{N}.\end{align} $$
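The objects of Definition 2.2 translate directly into a small data structure. The sketch below is illustrative only: it assumes $d=1$ with integer modes, uses a hypothetical tree of scale $4$ (not necessarily the one drawn in Figure 2) and hypothetical leaf modes, assigns signs by the rule in part 2, and propagates the modes upward by $k_{\mathfrak n}=k_{\mathfrak n_1}-k_{\mathfrak n_2}+k_{\mathfrak n_3}$ while checking the admissibility condition of part 3.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    children: List["Node"] = field(default_factory=list)  # empty list = leaf
    sign: int = 1   # iota_n, stored as +1 or -1
    k: int = 0      # mode k_n (d = 1 and integer modes: illustrative assumption)

def leaf():
    return Node()

def branch(a, b, c):
    return Node(children=[a, b, c])

def assign_signs(node, sign=1):
    """iota_root = +, and the children of n receive (iota_n, -iota_n, iota_n)."""
    node.sign = sign
    if node.children:
        n1, n2, n3 = node.children
        assign_signs(n1, sign)
        assign_signs(n2, -sign)
        assign_signs(n3, sign)

def leaves(node):
    if not node.children:
        return [node]
    return [l for child in node.children for l in leaves(child)]

def branching_nodes(node):
    if not node.children:
        return []
    return [node] + [b for child in node.children for b in branching_nodes(child)]

def fill_modes(node):
    """Set k_n = k_{n1} - k_{n2} + k_{n3} bottom-up; return True if admissible."""
    if not node.children:
        return True
    ok = all([fill_modes(child) for child in node.children])
    k1, k2, k3 = (child.k for child in node.children)
    node.k = k1 - k2 + k3
    return ok and (node.k not in (k1, k3) or k1 == k2 == k3 == node.k)

# Hypothetical tree of scale 4 with 9 leaves.
inner = branch(leaf(), leaf(), leaf())
tree = branch(branch(leaf(), inner, leaf()), leaf(), branch(leaf(), leaf(), leaf()))
assign_signs(tree)
ls = leaves(tree)
for l, k in zip(ls, [1, 2, 3, 5, 8, 13, 21, 34, 55]):
    l.k = k
admissible = fill_modes(tree)
n = len(branching_nodes(tree))
print(n, len(ls), admissible, tree.k)  # 4 9 True 34
```

In this example the root mode equals the signed sum $\sum_{\mathfrak l\in\mathcal L}\iota_{\mathfrak l}k_{\mathfrak l}$ of the leaf modes, which is the sense in which an admissible assignment is determined by its values on the leaves.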

Proposition 2.3. For each ternary tree $\mathcal{T}\,$ , define $\mathcal J_{\mathcal{T}\,}$ inductively by

(2.17) $$ \begin{align} \mathcal J_{\bullet}(t)=\chi(t)\cdot\boldsymbol{a}_{\mathrm{in}},\qquad \mathcal J_{\mathcal{T}}(t)=\mathcal{IW}\left(\mathcal J_{\mathcal{T}_1},\mathcal J_{\mathcal{T}_2},\mathcal J_{\mathcal{T}_3}\right)(t), \end{align} $$

where $\bullet $ represents the tree with a single node and $\mathcal{T}\,_1$ , $\mathcal{T}\,_2$ , $\mathcal{T}\,_3$ are the subtrees rooted at the three children of the root node of $\mathcal{T}\,$ . Then we have

(2.18) $$ \begin{align}\mathcal J_n=\sum_{\mathfrak s(\mathcal{T}\,\,)=n}\mathcal J_{\mathcal{T}}. \end{align} $$

Moreover, for any $\mathcal{T}\,$ of scale $\mathfrak s(\mathcal{T}\,\,)=n$ we have the formula

(2.19) $$ \begin{align} \left(\widetilde{\mathcal J_{\mathcal{T}}}\right)_{k}(\tau)=\left(\frac{\alpha T}{L^{d}}\right)^n\sum_{\left(k_{\mathfrak{n}}:\mathfrak{n}\in\mathcal{T}\right)}\mathcal{K}_{\mathcal{T}}(\tau,k_{\mathfrak{n}}:\mathfrak{n}\in\mathcal{T}\,\,)\prod_{\mathfrak{l}\in\mathcal L}\sqrt{n_{\mathrm{in}}(k_{\mathfrak{l}})}\cdot\prod_{\mathfrak{l}\in\mathcal L}\left[\eta_{k_{\mathfrak{l}}}(\omega)\right]^{\iota_{\mathfrak{l}}}, \end{align} $$

where the sum is taken over all admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ such that $k_{\mathfrak {r}}=k$ , and the function $\mathcal {K}=\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ satisfies

(2.20) $$ \begin{align}\left\lvert\partial_{\tau}^a\mathcal{K}\right\rvert\lesssim_{a,A}\sum_{\left(d_{\mathfrak{n}}:\mathfrak{n}\in\mathcal{N}\right)}\left\langle \tau-Td_{\mathfrak{r}}q_{\mathfrak{r}}\right\rangle^{-A}\cdot\prod_{\mathfrak{n}\in\mathcal{N}}\left\langle Tq_{\mathfrak{n}}\right\rangle^{-1}, \end{align} $$

where $q_{\mathfrak {n}}$ is defined in equation (2.16).

Proof. First, equation (2.18) follows from the definitions in equations (2.9) and (2.17) and an easy induction. We now prove formulas (2.19) and (2.20) inductively, noting also that $(a_k)_{\mathrm {in}}=\sqrt {n_{\mathrm {in}}(k)}\cdot \eta _k(\omega )$ . For $\mathcal{T}\,=\bullet $ , equation (2.19) follows from equation (2.17) with $\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {r}})=\widetilde {\chi }(\tau )$ that satisfies formula (2.20). Now suppose formulas (2.19) and (2.20) are true for smaller trees; then by formulas (2.7) and (2.17) and Lemma 2.1, up to unimportant coefficients, we can write

$$ \begin{align*}\left(\widetilde{\mathcal J_{\mathcal{T}}}\right)_{k}(\tau)=\frac{i\alpha T}{L^{d}}\sum_{d\in\{0,1\}}\sum_{\left(k_1,k_2,k_3\right)}^*\int_{\mathbb{R}^3}I_d(\tau,\sigma)\prod_{j=1}^3\left[\left(\widetilde{\mathcal J_{\mathcal{T}_j}}\right)_{k_j}\left(\tau_j\right)\right]^{\iota_j}\mathrm{d}\tau_j,\end{align*} $$

where $\sum ^*$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$ , $k_1-k_2+k_3=k$ and either $k\not \in \{k_1,k_3\}$ or $k=k_1=k_2=k_3$ , the signs $(\iota _1,\iota _2,\iota _3)=(+,-,+)$ , and $\sigma =\tau _1-\tau _2+\tau _3+T\Omega (k_1,k_2,k_3,k)$ . Now applying the induction hypothesis, we can write $\left (\widetilde {\mathcal J_{\mathcal{T}\,}}\right )_{k}(\tau )$ in the form of equation (2.19) with the function

(2.21) $$ \begin{align}\mathcal{K}_{\mathcal{T}}(\tau,k_{\mathfrak{n}}:\mathfrak{n}\in\mathcal{T}\,\,)=\sum_{d\in\{0,1\}}\int_{\mathbb{R}^3}I_d(\tau,\tau_1-\tau_2+\tau_3+T\Omega_{\mathfrak{r}})\prod_{j=1}^3\left[\mathcal{K}_{\mathcal{T}_j}\left(\tau_j,k_{\mathfrak{n}}:\mathfrak{n}\in\mathcal{T}_j\right)\right]^{\iota_j}\mathrm{d}\tau_j, \end{align} $$

where $\mathfrak {r}$ is the root of $\mathcal{T}\,$ with children $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ and $\mathcal{T}\,_j$ is the subtree rooted at $\mathfrak {r}_j$ .

It then suffices to prove that $\mathcal {K}_{\mathcal{T}\,}$ defined by equation (2.21) satisfies formula (2.20). By the induction hypothesis, we may fix a choice $d_{\mathfrak {n}}$ for each nonleaf node $\mathfrak {n}$ of each $\mathcal{T}\,_j$ , and let $d_{\mathfrak {r}}=d$ . Then plugging formula (2.20) into equation (2.21), we get

$$ \begin{align*}\left\lvert\partial_{\tau}^a\mathcal{K}_{\mathcal{T}}\right\rvert \lesssim_{a,A}\prod_{\mathfrak{r}\neq\mathfrak{n}\in\mathcal{N}}\frac{1}{\langle T q_{\mathfrak{n}}\rangle}\int_{\mathbb{R}^3}\frac{1}{\left\langle\tau\!-\!d(\tau_1\!-\!\tau_2+\tau_3+T\Omega_{\mathfrak{r}})\right\rangle^A}\frac{1}{\langle \tau_1\!-\!\tau_2+\tau_3+T\Omega_{\mathfrak{r}}\rangle}\prod_{j=1}^3\frac{\mathrm{d}\tau_j}{\left\langle\tau_j\!-\!Td_{\mathfrak{r}_j}q_{\mathfrak{r}_j}\right\rangle^A},\end{align*} $$

which upon integration in $\tau _j$ gives equation (2.20). This completes the proof.
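As a sanity check on the identity (2.18), one can count the terms on each side. The recursion (2.9) produces one term for each ordered triple of subtrees, so the number of ternary trees of scale n satisfies the same convolution recursion. The sketch below computes this number and compares it with the closed form $\binom{3n}{n}/(2n+1)$ for ternary trees, a standard combinatorial fact not stated in this paper; the function name `num_trees` is hypothetical.

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_trees(n):
    """Number of ternary trees with n branching nodes (scale n)."""
    if n == 0:
        return 1  # the single-node tree
    # Mirror the recursion (2.9): scales n1 + n2 + n3 = n - 1 of the three subtrees.
    return sum(num_trees(n1) * num_trees(n2) * num_trees(n - 1 - n1 - n2)
               for n1 in range(n) for n2 in range(n - n1))

counts = [num_trees(n) for n in range(6)]
print(counts)  # [1, 1, 3, 12, 55, 273]
```

The count grows only exponentially in n (roughly like $(27/4)^n$ ), so the number of trees of scale n is bounded by $C^n$ .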

2.3 Statement of main estimates

Define the $h^b$ space by

(2.22) $$ \begin{align} \lVert a(t)\rVert_{h^{b}} = \left(\int_{{\mathbb R}} \left\langle \tau \right\rangle^{2b}\left\lvert\widetilde{a}(\tau)\right\rvert^2 d\tau\right)^{\frac12}, \end{align} $$

and similarly the $h^{s,b}$ space for $\boldsymbol a(t)=(a_k(t))_{k \in \mathbb Z^d_L}$ by

(2.23) $$ \begin{align} \lVert\boldsymbol a\rVert_{h^{s,b}} = \left(L^{-d}\sum_{k \in \mathbb Z^d_L}\int_{{\mathbb R}} \left\langle \tau \right\rangle^{2b}\left\langle k\right\rangle^{2s}\left\lvert\widetilde{a}_k(\tau)\right\rvert^2 d\tau\right)^{\frac12}. \end{align} $$

We shall estimate the solution u in an appropriately rescaled $X^{s, b}$ space, which is equivalent to estimating the sequence $\boldsymbol a(t)=\left (a_k(t)\right )_{k \in \mathbb Z^d_L}$ in the space $h^{s, b}$ . Define the quantity

(2.24) $$ \begin{align}\rho:= \begin{cases} \alpha T&\text{if }1\leq T\leq L,\\ \alpha L&\text{if }L\leq T\leq L^2,\\ \alpha TL^{-1}&\text{if } T\geq L^2\text{ and }\beta_i\text{ is generic}. \end{cases}\end{align} $$

By the definition of $\delta>0$ in formula (1.4), we can verify that $\alpha T^{1/2}\leq \rho \leq L^{-\delta }$ .
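The piecewise definition (2.24) and the lower bound $\alpha T^{1/2}\leq\rho$ can be checked with a few lines of code. The sketch below is a direct transcription of formula (2.24); the function name `rho`, the flag `generic_beta` and the sample values of $\alpha$ , L and T are hypothetical, and the upper bound $\rho\leq L^{-\delta}$ is not checked because it depends on the admissible range of T in formula (1.4).

```python
import math

def rho(alpha, L, T, generic_beta=True):
    """Transcription of formula (2.24)."""
    if T < 1:
        raise ValueError("formula (2.24) is stated for T >= 1")
    if T <= L:
        return alpha * T
    if T <= L ** 2:
        return alpha * L
    if not generic_beta:
        raise ValueError("the case T >= L^2 requires generic beta")
    return alpha * T / L

alpha, L = 1e-5, 100
for T in (50, 5000, 10 ** 6):
    print(T, rho(alpha, L, T))

# alpha * sqrt(T) <= rho in every regime (tolerance for floating-point rounding).
lower_bound_holds = all(
    alpha * math.sqrt(T) <= rho(alpha, L, T) * (1 + 1e-12)
    for T in (1, 10, 100, 1000, 10 ** 4, 10 ** 5, 10 ** 6)
)
print(lower_bound_holds)
```

The three branches agree at the transition points $T=L$ and $T=L^2$ , so $\rho$ is continuous in T.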

Proposition 2.4 Well-posedness bounds

Let $\rho $ be defined as in formula (2.24); then L-certainly, for all $1\leq n\leq 3N$ , we have

(2.25) $$ \begin{align}\sup_k\left\langle k\right\rangle^{2s}\left\lVert(\mathcal{J}_n)_k\right\rVert_{h^{b}}\leq L^{\theta+C\left(b-\frac{1}{2}\right)}\rho^{n-1}\left(\alpha T^{\frac{1}{2}}\right)\leq L^{\theta+C\left(b-\frac{1}{2}\right)}\rho^{n}, \end{align} $$
(2.26) $$ \begin{align}\left\lVert\mathcal{R}_{N+1}\right\rVert_{h^{s,b}}\leq \rho^{N}. \end{align} $$

Proposition 2.4 follows from the following two bounds, which will be proved in Section 3:

Proposition 2.5 Bounds of tree terms

We have, L-certainly, that

(2.27) $$ \begin{align}\sup_k\left\langle k\right\rangle^{2s}\left\lVert(\mathcal{J}_{\mathcal{T}})_k\right\rVert_{h^{b}}\leq L^{\theta+C\left(b-\frac{1}{2}\right)}\rho^{n-1}\left(\alpha T^{\frac{1}{2}}\right)\leq L^{\theta+C\left(b-\frac{1}{2}\right)}\rho^{n} \end{align} $$

for any ternary tree $\mathcal{T}\,$ of scale n, where $0\leq n\leq 3N$ .

Proposition 2.6 An operator norm bound

We have, L-certainly, that for any trees $\mathcal{T}\,_1,\mathcal{T}\,_2$ with $\left \lvert \mathcal{T}\,_j\right \rvert =3n_j+1$ and $0\leq n_1,n_2\leq N$ , the operators

(2.28) $$ \begin{align}\mathcal{P}_+:v\mapsto\mathcal{IW}\left(\mathcal J_{\mathcal{T}_1},\mathcal J_{\mathcal{T}_2},v\right)\quad\text{and}\quad \mathcal{P}_-:v\mapsto\mathcal{IW}\left(\mathcal J_{\mathcal{T}_1},v,\mathcal J_{\mathcal{T}_2}\right)\end{align} $$

satisfy the bounds

(2.29) $$ \begin{align}\left\lVert\mathcal{P}_{\pm}\right\rVert_{h^{s,b}\to h^{s,b}}\leq L^{\theta}\rho^{n_1+n_2+\frac{1}{2}}. \end{align} $$

Remark 2.7. The bound (2.29) is a result of the probabilistic subcriticality of the problem. Similar bounds are also used in recent work by the first author, Nahmod and Yue [Reference Deng, Nahmod and Yue19] to get sharp probabilistic local well-posedness of nonlinear Schrödinger equations. The proof in both cases relies on high-order $TT^*$ arguments, although in [Reference Deng, Nahmod and Yue19] one needs to use the more sophisticated tensor norms due to the different ansatz caused by the inhomogeneity of initial data.

Proof of Proposition 2.4 (assuming Propositions 2.5 and 2.6)

Assume we have already excluded an exceptional set of probability $\lesssim e^{-L^{\theta }}$ . The bound (2.25) follows directly from formulas (2.18) and (2.27); it remains to bound $\mathcal {R}_{N+1}$ . Recall that $\mathcal {R}_{N+1}$ satisfies equation (2.11), so it suffices to prove that the mapping

$$ \begin{align*}v\mapsto \mathcal J_{\sim N}+\mathcal L(v)+\mathcal Q(v)+\mathcal C(v)\end{align*} $$

is a contraction mapping from the set $\mathcal {Z}=\left \{v:\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^{N}\right \}$ to itself. We will prove only that it maps $\mathcal {Z}$ into $\mathcal {Z}$ , as the contraction part follows in a similar way. Now suppose $\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^N$ ; then by formulas (2.18) and (2.27), we have

$$ \begin{align*} \left\lVert\mathcal J_{\sim N}\right\rVert_{h^{s,b}}^2\sim L^{-d}\sum_{k\in\mathbb{Z}_L^d}\left\langle k\right\rangle^{2s}\left\lVert(\mathcal J_{\sim N})_k\right\rVert_{h^b}^2\lesssim \left(L^{\theta+C\left(b-\frac{1}{2}\right)}\rho^{N+1}\right)^2\cdot L^{-d}\sum_{k\in\mathbb{Z}_L^d}\left\langle k\right\rangle^{-2s}\ll\rho^{2N},\end{align*} $$

so $\left \lVert \mathcal J_{\sim N}\right \rVert _{h^{s,b}}\ll \rho ^N$ . Next we may use formula (2.29) to bound

$$ \begin{align*}\left\lVert\mathcal L(v)\right\rVert_{h^{s,b}}\leq L^{\theta}\rho^{\frac{1}{2}}\cdot\left\lVert v\right\rVert_{h^{s,b}}\leq L^{\theta}\rho^{\frac{1}{2}}\cdot \rho^N\ll\rho^N.\end{align*} $$

As for the terms $\mathcal Q(v)$ and $\mathcal C(v)$ , we apply the simple bound

(2.30) $$ \begin{align} &\left\lVert\mathcal{IW}(u,v,w)\right\rVert_{h^{s,b}}\lesssim\left\lVert\mathcal{IW} (u,v,w)\right\rVert_{h^{s,1}}\lesssim\left\lVert\mathcal{IW}(u,v,w)\right\rVert_{h^{s,0}}+ \left\lVert\partial_t\mathcal{IW}(u,v,w)\right\rVert_{h^{s,0}}\nonumber\\ &\quad \lesssim\frac{\alpha T}{L^{d}}\sum_{\mathrm{cyc}}\left\lVert u\right\rVert_{h^{s,0}}\left\lVert v_k(t)\right\rVert_{\ell_k^1L_t^{\infty}}\left\lVert w_k(t)\right\rVert_{\ell_k^1L_t^{\infty}}\lesssim \alpha TL^d\left\lVert u\right\rVert_{h^{s,b}}\left\lVert v\right\rVert_{h^{s,b}}\left\lVert w\right\rVert_{h^{s,b}} \end{align} $$

(which easily follows from formula (2.7)), where $\sum _{\mathrm {cyc}}$ means summing over permutations of $(u,v,w)$ . As $\alpha T\leq L^{d}$ , we conclude (also using Proposition 2.5) that

$$ \begin{align*}\left\lVert\mathcal Q(v)\right\rVert_{h^{s,b}}+\left\lVert\mathcal C(v)\right\rVert_{h^{s,b}}\lesssim \alpha TL^{\theta+d+C\left(b-\frac{1}{2}\right)}\rho^{2N}\ll\rho^N,\end{align*} $$

since $\rho \leq L^{-\delta }$ and $N\gg \delta ^{-1}$ . This completes the proof.

3 Proof of main estimates

In this section we prove Propositions 2.5 and 2.6.

3.1 Large deviation and basic counting estimates

We start by making some preparations, namely the large deviation and counting estimates that will be used repeatedly in the proof later.

Lemma 3.1. Let $\{\eta _k(\omega )\}$ be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle. Let $F=F(\omega )$ be defined by

(3.1) $$ \begin{align}F(\omega)=\sum_{k_1,\ldots,k_n}a_{k_1\cdots k_n}\prod_{j=1}^n\eta_{k_j}^{\iota_j}, \end{align} $$

where $a_{k_1\cdots k_n}$ are constants; then F can be divided into finitely many terms, and for each term there is a choice of $X=\left \{i_1,\ldots ,i_p\right \}$ and $Y=\left \{j_1,\ldots ,j_p\right \}$ , which are two disjoint subsets of $\{1,2,\ldots ,n\}$ , such that

(3.2) $$ \begin{align}\mathbb{P}\left(\lvert F(\omega)\rvert\geq A\cdot M^{\frac{1}{2}}\right)\leq Ce^{-cA^{\frac{2}{n}}} \end{align} $$

holds with

(3.3) $$ \begin{align}M=\sum_{\left(k_{\ell}\right):\ell\not\in X\cup Y}\left(\sum_{\text{pairing }\left(k_{i_s},k_{j_s}\right):1\leq s\leq p}\left\lvert a_{k_1\cdots k_n}\right\rvert\right)^2, \end{align} $$

where a pairing $\left (k_{i},k_{j}\right )$ means $\left (\iota _i+\iota _j,\iota _ik_i+\iota _jk_j\right )=0$ .

Proof. First assume $\eta _k$ is Gaussian. Then by the standard hypercontractivity estimate for an Ornstein–Uhlenbeck semigroup (see, e.g., [Reference Oh and Thomann40]), we know that formula (3.2) holds with M replaced by $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ . Now to estimate $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ , by dividing the sum (3.1) into finitely many terms and rearranging the subscripts, we may assume in a monomial of equation (3.1) that

(3.4) $$ \begin{align}k_{1}=\cdots=k_{j_1}, k_{j_1+1}=\cdots=k_{j_2},\cdots,k_{j_{r-1}+1}=\cdots=k_{j_r},\quad 1\leq j_1<\cdots <j_r=n, \end{align} $$

and the $k_{j_s}$ are different for $1\leq s\leq r$ . Such a monomial has the form

$$ \begin{align*}\prod_{s=1}^r\eta_{k_{j_s}}^{b_s}\left(\overline{\eta_{k_{j_s}}}\right)^{c_s},\qquad b_s+c_s=j_s-j_{s-1}\ (j_0=0),\end{align*} $$

where the factors for different s are independent. We may also assume $b_s=c_s$ for $1\leq s\leq q$ and $b_s\neq c_s$ for $q+1\leq s\leq r$ , and for $1\leq j\leq j_q$ we may assume $\iota _j$ has the same sign as $(-1)^j$ . Then we can further rewrite this monomial as a linear combination of

$$ \begin{align*}\prod_{s=1}^pb_s!\prod_{s=p+1}^q\left(\left\lvert\eta_{k_{j_s}}\right\rvert^{2b_s}-b_s!\right)\prod_{s=q+1}^r\eta_{k_{j_s}}^{b_s}\left(\overline{\eta_{k_{j_s}}}\right)^{c_s}\end{align*} $$

for $1\leq p\leq q$ . Therefore, $F(\omega )$ is a finite linear combination of expressions of the form

$$ \begin{align*}\sum_{k_{j_1},\ldots,k_{j_r}}a_{k_{j_1},\ldots,k_{j_1},\ldots,k_{j_r},\ldots k_{j_r}}\prod_{s=1}^pb_s!\prod_{s=p+1}^q\left(\left\lvert\eta_{k_{j_s}}\right\rvert^{2b_s}-b_s!\right)\prod_{s=q+1}^r\eta_{k_{j_s}}^{b_s}\left(\overline{\eta_{k_{j_s}}}\right)^{c_s}.\end{align*} $$

Due to independence and the fact that $\mathbb {E}\left (\left \lvert \eta \right \rvert ^{2b}-b!\right )=\mathbb {E}\left (\eta ^b\left (\overline {\eta }\right )^c\right )=0$ for a normalised Gaussian $\eta $ and $b\neq c$ , we conclude that

(3.5) $$ \begin{align}\mathbb{E}\left\lvert F(\omega)\right\rvert^2\lesssim\sum_{k_{j_{p+1}},\ldots, k_{j_r}}\left(\sum_{k_{j_1},\ldots,k_{j_p}}\left\lvert a_{k_{j_1},\ldots,k_{j_1},\ldots,k_{j_r},\ldots k_{j_r}}\right\rvert\right)^2, \end{align} $$

which is bounded by the right-hand side of equation (3.3), by choosing $X=\left \{1,3,\ldots ,j_p-1\right \}$ and $Y=\left \{2,4,\ldots ,j_p\right \}$ , as under our assumptions $(k_{2i-1},k_{2i})$ is a pairing for $2i\leq j_p$ .

Now assume $\eta _k$ is uniformly distributed on the unit circle. Let $\{g_k(\omega )\}$ be independent, identically distributed normalised Gaussians as in the first part, and consider the random variable

$$ \begin{align*}H(\omega)=\sum_{k_1,\ldots,k_n}\left\lvert a_{k_1\cdots k_n}\right\rvert\prod_{j=1}^ng_{k_j}^{\iota_j}.\end{align*} $$

We can calculate

(3.6) $$ \begin{align}\mathbb{E}\left(\left\lvert F(\omega)\right\rvert^{2q}\right)=\sum_{\left(k_j^i,\ell_j^i\right)}\prod_{i=1}^qa_{k_1^i\cdots k_n^i}\overline{a_{\ell_1^i\cdots\ell_n^i}}\mathbb{E}\left(\prod_{i=1}^q\prod_{j=1}^n\eta_{k_j^i}^{\iota_j}\overline{\eta_{\ell_j^i}^{\iota_j}}\right), \end{align} $$

where $1\leq i\leq q$ and $1\leq j\leq n$ , and similarly for H,

(3.7) $$ \begin{align}\mathbb{E}\left(\left\lvert H(\omega)\right\rvert^{2q}\right)=\sum_{\left(k_j^i,\ell_j^i\right)}\prod_{i=1}^q\left\lvert a_{k_1^i\cdots k_n^i}\right\rvert\left\lvert a_{\ell_1^i\cdots\ell_n^i}\right\rvert\mathbb{E}\left(\prod_{i=1}^q\prod_{j=1}^ng_{k_j^i}^{\iota_j}\overline{g_{\ell_j^i}^{\iota_j}}\right). \end{align} $$

The point is that we always have

$$ \begin{align*}\left\lvert\mathbb{E}\left(\prod_{i=1}^q\prod_{j=1}^n\eta_{k_j^i}^{\iota_j}\overline{\eta_{\ell_j^i}^{\iota_j}}\right)\right\rvert\leq \mathrm{Re}\,\mathbb{E}\left(\prod_{i=1}^q\prod_{j=1}^ng_{k_j^i}^{\iota_j}\overline{g_{\ell_j^i}^{\iota_j}}\right).\end{align*} $$

In fact, in order for either side to be nonzero, for any particular k we must have

$$ \begin{gather*}\#\left\{(i,j):k_j^i=k,\iota_j=+\right\}+\#\left\{(i,j):\ell_j^i=k,\iota_j=-\right\}\\=\#\left\{(i,j):k_j^i=k,\iota_j=-\right\}+\#\left\{(i,j):\ell_j^i=k,\iota_j=+\right\}. \end{gather*} $$

Let both be equal to m; then by independence, the factor that the $\eta _k^{\pm }$ s contribute to the expectation on the left-hand side will be $\mathbb {E}\left \lvert \eta _k\right \rvert ^{2m}=1$ , while for the right-hand side it will be $\mathbb {E}\left \lvert g_k\right \rvert ^{2m}=m!\geq 1$ .
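The two moment values used here can be checked numerically. The following is a minimal sketch, assuming the normalisation $\mathbb{E}\lvert g\rvert^2=1$ for the complex Gaussian (real and imaginary parts each of variance $1/2$); the function names, sample sizes and seeds are illustrative choices and not part of the argument.

```python
import math
import random


def gaussian_moment(m, samples=200_000, seed=0):
    """Monte Carlo estimate of E|g|^{2m} for a normalised complex Gaussian g.

    Assumption: g = (X + iY)/sqrt(2) with X, Y independent standard normals,
    so that E|g|^2 = 1 and |g|^2 is exponentially distributed with mean 1.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        x = rng.gauss(0.0, 1.0)
        y = rng.gauss(0.0, 1.0)
        total += ((x * x + y * y) / 2.0) ** m
    return total / samples


def circle_moment(m, samples=1_000, seed=0):
    """Estimate of E|eta|^{2m} for eta uniform on the unit circle.

    Every sample has modulus 1, so the average is 1 up to rounding.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        eta = complex(math.cos(angle), math.sin(angle))
        total += abs(eta) ** (2 * m)
    return total / samples


if __name__ == "__main__":
    for m in (1, 2, 3):
        print(m, circle_moment(m), gaussian_moment(m), math.factorial(m))
```

The Gaussian estimates should be close to $m!$ and the unit-circle values equal to $1$, which is the comparison $1\leq m!$ used to dominate each moment of $F$ by the corresponding moment of $H$.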

This implies that $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )\leq \mathbb {E}\left (\left \lvert H\right \rvert ^{2q}\right )$ for any positive integer q; since formula (3.2) holds for H, we have

$$ \begin{align*}\mathbb{E}\left(\left\lvert H\right\rvert^{2q}\right)\leq (Cq)^{nq}M^q\end{align*} $$

with an absolute constant C. This gives an upper bound for $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )$ , and by Chebyshev's inequality, we deduce formula (3.2) for F.
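The passage from the moment bound to the tail bound (3.2) amounts to optimising the exponent q. The sketch below illustrates this under the assumption $M=1$ (the general case follows by scaling): it applies Chebyshev's inequality to $\lvert F\rvert^{2q}$ and picks an integer q close to $A^{2/n}/(eC)$. The function names and the specific choice of q are ours; the proof only needs that some admissible q gives exponential decay in $A^{2/n}$.

```python
import math


def log_moment_tail_bound(A, n, C, q):
    """Logarithm of (Cq)^{nq} / A^{2q}.

    This is the bound on P(|F| >= A) obtained from E|F|^{2q} <= (Cq)^{nq}
    (with M = 1) and Chebyshev's inequality applied to |F|^{2q}.
    """
    return n * q * math.log(C * q) - 2 * q * math.log(A)


def best_integer_q(A, n, C):
    """Integer q near the real minimiser q* = A^{2/n} / (eC), at least 1."""
    q_star = A ** (2.0 / n) / (math.e * C)
    return max(1, math.floor(q_star))


def tail_bound(A, n, C):
    """Upper bound on P(|F| >= A) for the chosen integer q."""
    q = best_integer_q(A, n, C)
    return math.exp(log_moment_tail_bound(A, n, C, q))


if __name__ == "__main__":
    n, C = 3, 2.0
    for A in (50.0, 200.0, 1000.0):
        decay = math.exp(-(n / (math.e * C)) * A ** (2.0 / n))
        print(A, tail_bound(A, n, C), math.exp(n) * decay)
```

Whenever $q^*=A^{2/n}/(eC)\geq 1$, the chosen q satisfies $Cq\leq A^{2/n}/e$, so the bound is at most $e^{-nq}\leq e^{n}e^{-nq^*}$, which has the form $Ce^{-cA^{2/n}}$ of formula (3.2).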

Lemma 3.2. Let $\beta =(\beta _1,\ldots ,\beta _d)\in [1,2]^d$ and $0<T\leq L^d$ . Assume that $\beta $ is generic for $T\geq L^{2}$ . Then, uniformly in $(k,a,b,c)\in \left (\mathbb {Z}_L^d\right )^4$ and $m\in \mathbb {R}$ , the sets

(3.8) $$ \begin{align} S_3=&\bigg\{(x,y,z)\in \left(\mathbb{Z}_L^d\right)^3:x-y+z=k,\ \left\lvert\left\lvert x\right\rvert_{\beta}^2-\left\lvert y\right\rvert_{\beta}^2+\left\lvert z\right\rvert_{\beta}^2-\left\lvert k\right\rvert_{\beta}^2-m\right\rvert\leq T^{-1},\nonumber \\ &\quad\quad\quad\quad\quad \left\lvert x-a\right\rvert\leq L^{\theta},\ \left\lvert y-b\right\rvert\leq L^{\theta},\ \left\lvert z-c\right\rvert\leq L^{\theta} \text{ and }k\not\in\{x,z\}\bigg\}, \end{align} $$
(3.9) $$ \begin{align} S_2&=\bigg\{(x,y)\in\left(\mathbb{Z}_L^d\right)^2:x\pm y=k,\ \left\lvert\left\lvert x\right\rvert_{\beta}^2\pm\left\lvert y\right\rvert_{\beta}^2-\left\lvert k\right\rvert_{\beta}^2-m\right\rvert\leq T^{-1},\nonumber \\ &\quad\quad\quad\quad\quad\left\lvert x-a\right\rvert\leq L^{\theta},\ \left\lvert y-b\right\rvert\leq L^{\theta} \text{ and }x\neq y\text{ if the sign }\pm\text{ is }-\bigg\} \end{align} $$

satisfy the bounds

(3.10) $$ \begin{align}\#S_3\lesssim L^{2d+\theta}T^{-1},\qquad \#S_2\lesssim \begin{cases}L^{d+\theta}&\text{if }T\leq L,\\ L^{d+1+\theta}T^{-1}&\text{if }T\in\left[L,L^2\right],\\ L^{d-1+\theta}&\text{if }T\geq L^{2}\text{ and }\beta_i\text{ is generic}, \end{cases} \end{align} $$

where in the first inequality of formula (3.10) we also assume $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ .

Moreover, with $\rho $ defined as in formula (2.24), we have

(3.11) $$ \begin{align}\max\left((\#S_3)^{\frac{1}{2}},\#S_2\right)\leq L^{\theta} Q,\qquad Q:=\frac{L^d\rho}{\alpha T}, \end{align} $$

without any assumption on $(k,a,b)$ .

Proof. We first consider $S_3$ . Let $k-x=p$ and $k-z=q$ ; then we may write $p=\left (L^{-1}u_1,\ldots , L^{-1}u_d\right )$ and $q=\left (L^{-1}v_1,\ldots , L^{-1}v_d\right )$ , where each $u_i$ and $v_i$ is an integer belonging to a fixed interval of length $O\left (L^{1+\theta }\right )$ . Moreover, from $(x,y,z)\in S_3$ we deduce that

$$ \begin{align*}\left|\sum_{i=1}^d\beta_iu_iv_i+\frac{L^2m}{2}\right|\leq\frac{L^2T^{-1}}{2}.\end{align*} $$
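This inequality rests on the algebraic identity $\lvert x\rvert_{\beta}^2-\lvert y\rvert_{\beta}^2+\lvert z\rvert_{\beta}^2-\lvert k\rvert_{\beta}^2=-2L^{-2}\sum_{i}\beta_iu_iv_i$ when $x-y+z=k$. The following sketch verifies the identity in exact rational arithmetic on random inputs. It assumes $\lvert x\rvert_{\beta}^2=\sum_i\beta_ix_i^2$ (the definition from earlier in the paper, not restated in this section); the function names and the sampling ranges are hypothetical.

```python
from fractions import Fraction
import random


def beta_norm_sq(x, beta):
    """Weighted square norm sum_i beta_i * x_i^2 (assumed definition of |x|_beta^2)."""
    return sum(b * xi * xi for b, xi in zip(beta, x))


def check_resonance_identity(L, d, trials, seed=0):
    """Check |x|^2 - |y|^2 + |z|^2 - |k|^2 == -2 L^{-2} sum beta_i u_i v_i exactly.

    Here k - x = p = u/L, k - z = q = v/L and y = x + z - k, so x - y + z = k.
    """
    rng = random.Random(seed)
    # Rational stand-ins for beta in [1, 2]^d, so all arithmetic is exact.
    beta = [Fraction(rng.randint(100, 200), 100) for _ in range(d)]
    for _ in range(trials):
        k_int = [rng.randint(-20, 20) for _ in range(d)]
        u = [rng.randint(-20, 20) for _ in range(d)]
        v = [rng.randint(-20, 20) for _ in range(d)]
        k = [Fraction(ki, L) for ki in k_int]
        x = [Fraction(ki - ui, L) for ki, ui in zip(k_int, u)]
        z = [Fraction(ki - vi, L) for ki, vi in zip(k_int, v)]
        y = [xi + zi - ki for xi, zi, ki in zip(x, z, k)]
        omega = (beta_norm_sq(x, beta) - beta_norm_sq(y, beta)
                 + beta_norm_sq(z, beta) - beta_norm_sq(k, beta))
        expected = -2 * sum(b * ui * vi for b, ui, vi in zip(beta, u, v)) / (L * L)
        if omega != expected:
            return False
    return True


if __name__ == "__main__":
    print(check_resonance_identity(L=7, d=3, trials=200))
```

Multiplying the condition $\left\lvert -2L^{-2}\sum_i\beta_iu_iv_i-m\right\rvert\leq T^{-1}$ by $L^2/2$ gives the displayed inequality.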

We may assume $u_iv_i=0$ for $1\leq i\leq r$ , and $\sigma _i:=u_iv_i\neq 0$ for $r+1\leq i\leq d$ ; then the number of choices for $(u_i,v_i:1\leq i\leq r)$ is $O\left (L^{r+\theta }\right )$ . It is known (see [Reference Deng, Nahmod and Yue17, Reference Deng, Nahmod and Yue18]) that given $\sigma \neq 0$ , the number of integer pairs $(u,v)$ such that u and v each belongs to an interval of length $O\left (L^{1+\theta }\right )$ and $uv=\sigma $ is $O\left (L^{\theta }\right )$ . Therefore, if $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ , then $\#S_3$ is bounded by $O\left (L^{r+\theta }\right )$ times the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ that satisfy

(3.12) $$ \begin{align}\left\lvert\sigma_j\right\rvert\leq L^{2+\theta}\ (r+1\leq j\leq d),\qquad \sum_{j=r+1}^d\beta_j\sigma_j=-\frac{L^2m}{2}+O\left(L^2T^{-1}\right).\end{align} $$

Using the assumption $T\leq L^{d}$ , it suffices to show that the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ satisfying formula (3.12) is at most $O\left (1+L^{2(d-r)+\theta }T^{-1}\right )$ . This latter bound is trivial if $d-r=1$ or $L^2T^{-1}\geq 1$ , so we may assume $d-r\geq 2$ , $T\geq L^{2}$ and $\beta _i$ is generic. It is well known in Diophantine approximation (see, e.g., [Reference Cassels9]) that for generic $\beta _i$ we have

$$ \begin{align*}\left\lvert\sum_{i=r+1}^d\beta_i\eta_i\right\rvert\gtrsim\left(\max_{r+1\leq i\leq d}\left\langle\eta_i\right\rangle\right)^{-(d-r-1)-\theta}\quad\text{if }\eta_i\text{ are not all }0,\end{align*} $$

so the distance between any two points $(\sigma _i:r+1\leq i\leq d)$ and $(\sigma _i':r+1\leq i\leq d)$ satisfying formula (3.12) is at least $\left (L^2T^{-1}\right )^{-\frac {1}{d-r-1}-\theta }$ . Since all these points belong to a box which has size $O(1)$ in one direction and size $O\left (L^{2+\theta }\right )$ in other orthogonal directions, we deduce that the number of solutions to formula (3.12) is at most $1+L^{\theta } L^{2(d-r-1)}L^2T^{-1}$ , as desired.
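The divisor-type count quoted above (the number of integer pairs $(u,v)$ in given intervals with $uv=\sigma$) is easy to explore by brute force. The sketch below is only an illustration of what is being counted; the function name and the interval encoding as inclusive `(lo, hi)` pairs are our own conventions.

```python
def count_factor_pairs(sigma, u_range, v_range):
    """Count integer pairs (u, v) with u*v == sigma, u in u_range, v in v_range.

    Ranges are inclusive (lo, hi) pairs of integers; sigma must be nonzero,
    matching the case sigma_i = u_i * v_i != 0 in the proof.
    """
    if sigma == 0:
        raise ValueError("sigma must be nonzero")
    u_lo, u_hi = u_range
    v_lo, v_hi = v_range
    count = 0
    for u in range(u_lo, u_hi + 1):
        # u must be a nonzero divisor of sigma; v is then determined.
        if u == 0 or sigma % u != 0:
            continue
        v = sigma // u
        if v_lo <= v <= v_hi:
            count += 1
    return count


if __name__ == "__main__":
    window = (-1000, 1000)
    worst = max(count_factor_pairs(s, window, window) for s in range(1, 2001))
    print(worst)
```

The count never exceeds the number of (positive and negative) divisors of $\sigma$, which grows slower than any power of $\lvert\sigma\rvert$; this is the source of the $O\left(L^{\theta}\right)$ factor.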

Next, without any assumption on $(k,a,b)$ , we need to prove formula (3.11). By definition (2.24) we can check that $Q^2\geq L^{2d}\left (\min \left (T,L^2\right )\right )^{-1}$ , so it suffices to prove the first inequality of formula (3.10), assuming $T\leq L^2$ . But this again follows from formula (3.12), noting that now $\left \lvert \sigma _j\right \rvert \leq L^{2+\theta }$ is no longer true, but each $\sigma _j$ still has at most $L^{2+\theta }$ possible values.

Finally we consider $S_2$ , which is much easier. In fact, formula (3.11) follows from formula (3.10), so we only need to prove the latter. Now if $T\leq L$ , we trivially have $\#S_2\leq L^{d+\theta }$ , as y will be fixed once x is; if $T\geq L$ , then we may assume $x_d-y_d\neq 0$ if the sign $\pm $ is $-$ , and then fix the first coordinates $x_j (1\leq j\leq d-1)$ and hence $y_j (1\leq j\leq d-1)$ . Then we have that $x_d\pm y_d$ is fixed, and $x_d^2\pm y_d^2$ belongs to a fixed interval of length $O\left (T^{-1}\right )$ . Since $x_d,y_d\in L^{-1}\mathbb {Z}$ , we know that $x_d$ has at most $1+L^2T^{-1}$ choices, which implies what we want to prove.
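The final step of this argument can be tested in a toy setting. The sketch below assumes $d=1$, $\beta=1$ and the sign $-$, so that $x-y=k\neq 0$ and the condition reads $\left\lvert x^2-y^2-k^2-m\right\rvert\leq T^{-1}$ with $x\in L^{-1}\mathbb{Z}$. It counts solutions by brute force in exact arithmetic and compares with $1+L^2T^{-1}$. The function name and the `window` cut-off (a stand-in for the localisation $\lvert x-a\rvert\leq L^{\theta}$) are hypothetical.

```python
from fractions import Fraction


def count_s2_minus_1d(L, k_num, m, T, window):
    """Count x = j/L with |x| <= window such that y = x - k satisfies
    |x^2 - y^2 - k^2 - m| <= 1/T, where k = k_num / L is nonzero.

    Toy model of S_2 with d = 1, beta = 1 and the sign '-'.
    """
    if k_num == 0:
        raise ValueError("k must be nonzero (x != y for the sign '-')")
    k = Fraction(k_num, L)
    tolerance = Fraction(1) / T
    count = 0
    for j in range(-window * L, window * L + 1):
        x = Fraction(j, L)
        y = x - k
        if abs(x * x - y * y - k * k - m) <= tolerance:
            count += 1
    return count


if __name__ == "__main__":
    L, T = 8, 10
    found = count_s2_minus_1d(L, k_num=1, m=Fraction(0), T=T, window=3)
    print(found, 1 + Fraction(L * L, T))
```

In this model $x^2-y^2=k(2x-k)$, so x is confined to an interval of length $(T\lvert k\rvert)^{-1}\leq LT^{-1}$, which contains at most $1+L^2T^{-1}$ points of $L^{-1}\mathbb{Z}$.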

3.2 Bounds for $\mathcal {J}_n$

In this section we prove Proposition 2.5. We will need to extend the notion of ternary trees to paired, coloured ternary trees:

Definition 3.3 Tree pairings and colourings

Let $\mathcal{T}\,$ be a ternary tree as in Definition 2.2. We will pair some of the leaves of $\mathcal{T}\,$ such that each leaf belongs to at most one pair. The two leaves in a pair are called partners of each other, and the unpaired leaves are called single. We assume $\iota _{\mathfrak {l}}+\iota _{\mathfrak {l}'}=0$ for any pair $(\mathfrak {l},\mathfrak {l}')$ . The set of single leaves is denoted $\mathcal {S}$ . The number of pairs is denoted by p, so that $\lvert {\mathcal S}\rvert =l-2p$ . Moreover, we assume that some nodes in $\mathcal {S}\cup \{\mathfrak {r}\}$ are coloured red, and let $\mathcal R$ be the set of red nodes. We shall denote $r=\lvert \mathcal R\rvert $ .

We shall use red colouring to denote that the frequency assignments to the corresponding red vertex are fixed in the counting process. We also introduce the following definition:

Definition 3.4 Strong admissibility

Suppose we fix $n_{\mathfrak {m}}\in \mathbb {Z}_L^d$ for each $\mathfrak {m}\in \mathcal R$ . An assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is called strongly admissible with respect to the given pairing, colouring and $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ if it is admissible in the sense of Definition 2.2, and

(3.13) $$ \begin{align}k_{\mathfrak{m}}=n_{\mathfrak{m}}\ \forall \mathfrak{m}\in\mathcal R,\qquad \lvert k_{\mathfrak{l}}\rvert\leq L^{\theta}\ \forall \mathfrak{l}\in\mathcal L,\qquad k_{\mathfrak{l}}=k_{\mathfrak{l}'}\ \forall \text{ pairs of leaves } (\mathfrak{l},\mathfrak{l}'). \end{align} $$

The key to the proof of Proposition 2.5 is the following combinatorial counting bound:

Proposition 3.5. Let $\mathcal{T}\,$ be a paired and coloured ternary tree such that $\mathcal R\neq \varnothing $ , and let $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ be fixed. We also fix $\sigma _{\mathfrak {n}}\in \mathbb {R}$ for each $\mathfrak {n}\in \mathcal {N}$ . Let $l=\lvert \mathcal L\rvert $ be the total number of leaves, p be the number of pairs and $r=\lvert \mathcal R\rvert $ be the number of red nodes. Then the number of strongly admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ which also satisfy

(3.14) $$ \begin{align}\lvert\Omega_{\mathfrak{n}}-\sigma_{\mathfrak{n}}\rvert\leq T^{-1}\ \forall \mathfrak{n}\in\mathcal{N} \end{align} $$

is – recalling Q defined in formula (3.11) – bounded by

(3.15) $$ \begin{align}M\leq \begin{cases} L^{\theta} Q^{l-p-r}&\text{if }\mathcal R\neq\mathcal{S}\cup\{\mathfrak{r}\},\\ L^{\theta} Q^{l-p-r+1}&\text{if } \mathcal R=\mathcal{S}\cup\{\mathfrak{r}\}. \end{cases} \end{align} $$

Proof. We proceed by induction. The base cases directly follow from formula (3.11). Now suppose the desired bound holds for all smaller trees, and consider $\mathcal{T}\,$ . Let $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ be the children of the root node $\mathfrak {r}$ and $\mathcal{T}\,_j$ be the subtree rooted at $\mathfrak {r}_j$ . Let $l_j$ be the number of leaves in $\mathcal{T}\,_j$ , $p_j$ the number of pairs within $\mathcal{T}\,_j$ and $p_{ij}$ the number of pairings between $\mathcal{T}\,_i$ and $\mathcal{T}\,_j$ , and let $r_j=\left \lvert \mathcal {R}\cap \mathcal{T}\,_j\right \rvert $ ; then we have

$$ \begin{align*}l=l_1+l_2+l_3,\qquad p=p_1+p_2+p_3+p_{12}+p_{13}+p_{23},\qquad r=r_1+r_2+r_3+\mathbf{1}_{\mathfrak{r}\in\mathcal R}.\end{align*} $$

Also note that $\lvert k_{\mathfrak {n}}\rvert \lesssim L^{\theta }$ for all $\mathfrak {n}\in \mathcal{T}\,$ .

The proof will be completely algorithmic and involves many cases. The general strategy is to perform the following four operations, which we refer to as $\mathcal {O}_j (0\leq j\leq 3)$ , in a suitable order. Here in operation $\mathcal {O}_0$ we apply formula (3.11) to count the number of choices for the values among $\left \{k_{\mathfrak {r}},k_{\mathfrak {r}_1},k_{\mathfrak {r}_2},k_{\mathfrak {r}_3}\right \}$ that are not already fixed (this step may be trivial if three of these four vectors are already fixed, i.e., coloured, or if one of them is already fixed and $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ ). In operations $\mathcal {O}_j (1\leq j\leq 3)$ , we apply the induction hypothesis to one of the subtrees $\mathcal{T}\,_j$ and count the number of choices for $\left (k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,_j\right )$ . Let the number of choices associated with $\mathcal {O}_j (0\leq j\leq 3)$ be $M_j$ , with superscripts indicating different cases. In the whole process we may colour new nodes $\mathfrak {n}$ red if $k_{\mathfrak {n}}$ is already fixed during the previous operations, namely when $\mathfrak {n}=\mathfrak {r}$ and we have performed $\mathcal {O}_0$ before, when $\mathfrak {n}=\mathfrak {r}_j$ and we have performed $\mathcal {O}_0$ or $\mathcal {O}_j$ before, or when $\mathfrak {n}$ is a leaf that has a partner in $\mathcal{T}\,_j$ and we have performed $\mathcal {O}_j$ before.

(1) Suppose $\mathfrak {r}\not \in \mathcal R$ ; then we may assume that there is a red leaf from $\mathcal{T}\,_1$ (see Footnote 11). We first perform $\mathcal {O}_1$ and get a factor

$$ \begin{align*}M_1^{(1)}:= L^{\theta} Q^{l_1-p_1-r_1}.\end{align*} $$

Now $\mathfrak {r}_1$ is coloured red, as is any leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_1$ . There are then two cases.

(1.1) Suppose now there is a leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ , that is red. Then we perform $\mathcal {O}_2$ and get a factor

$$ \begin{align*}M_2^{(1.1)}:=L^{\theta} Q^{l_2-p_2-r_2-p_{12}}.\end{align*} $$

Now $\mathfrak {r}_2$ is coloured red, as is any leaf of $\mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are again two cases.

(1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor

$$ \begin{align*}M_3^{(1.1.1)}:=L^{\theta} Q^{l_3-p_3-r_3-p_{13}-p_{23}},\end{align*} $$

then colour $\mathfrak {r}_3$ red and apply $\mathcal {O}_0$ to get a factor $M_0^{(1.1.1)}:=1$ . Thus

$$ \begin{align*}M\leq M_1^{(1)}M_2^{(1.1)}M_3^{(1.1.1)}M_0^{(1.1.1)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(1.1.2) Suppose after step (1.1) there is no red leaf in $\mathcal{T}\,_3$ ; then $r_3=p_{13}=p_{23}=0$ . We perform $\mathcal {O}_0$ and get a factor $M_0^{(1.1.2)}:=L^{\theta } Q^{1}$ (perhaps with slightly enlarged $\theta $ ; the same applies later). Now we may colour $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor

$$ \begin{align*}M_3^{(1.1.2)}:=L^{\theta} Q^{l_3-p_3-1}.\end{align*} $$

Thus

$$ \begin{align*}M\leq M_1^{(1)}M_2^{(1.1)}M_0^{(1.1.2)}M_3^{(1.1.2)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(1.2) Now suppose that after step (1) there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=p_{12}=p_{13}=0$ . There are two cases.

(1.2.1) Suppose there is a single leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ . Then we will perform $\mathcal {O}_0$ and get a factor $M_0^{(1.2.1)}:=L^{\theta } Q^{2}$ . Now we may colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor

$$ \begin{align*}M_3^{(1.2.1)}:=L^{\theta} Q^{l_3-p_3-1}.\end{align*} $$

Now any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ is coloured red, so we may perform $\mathcal {O}_2$ and get a factor

$$ \begin{align*}M_2^{(1.2.1)}:=L^{\theta} Q^{l_2-p_2-p_{23}-1}.\end{align*} $$

Thus

$$ \begin{align*}M\leq M_1^{(1)}M_0^{(1.2.1)}M_3^{(1.2.1)}M_2^{(1.2.1)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(1.2.2) Suppose there is no single leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then all leaves in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ are paired to one another, which implies that $k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ and that $\mathfrak {r}_2$ and $\mathfrak {r}_3$ have opposite signs, and hence by the admissibility condition we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . This allows us to perform $\mathcal {O}_0$ and colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red with $M_0^{(1.2.2)}:=1$ , then perform $\mathcal {O}_3$ and colour red any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , then perform $\mathcal {O}_2$ (for which we use the second bound in formula (3.15)). This leads to the factors

$$ \begin{align*}M_3^{(1.2.2)}:=L^{\theta} Q^{l_3-p_3-1},\qquad M_2^{(1.2.2)}\leq L^{\theta} Q^{l_2-p_2-p_{23}-1+1},\end{align*} $$

and thus

$$ \begin{align*}M\leq M_1^{(1)}M_0^{(1.2.2)}M_3^{(1.2.2)}M_2^{(1.2.2)}=L^{\theta} Q^{l-p-r-1},\end{align*} $$

which is better than what we need.
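The exponent bookkeeping in the four sub-cases of case (1) can be checked mechanically. The sketch below adds up the powers of Q contributed by the operations in each sub-case and compares the total with $l-p-r$. It only tracks exponents, ignores the factors $L^{\theta}$, and does not check that the chosen numbers come from an actual tree; the function names and the dictionary encoding of $p_{12},p_{13},p_{23}$ are hypothetical.

```python
def target_exponent(l, p, cross, r, root_red):
    """Exponent l - p - r of Q in the first line of (3.15).

    l, p, r are triples (per subtree); cross holds p_12, p_13, p_23.
    """
    total_pairs = sum(p) + sum(cross.values())
    total_red = sum(r) + (1 if root_red else 0)
    return sum(l) - total_pairs - total_red


def case_1_exponent(subcase, l, p, cross, r):
    """Total power of Q collected in sub-cases (1.1.1) to (1.2.2)."""
    l1, l2, l3 = l
    p1, p2, p3 = p
    r1, r2, r3 = r
    p12, p13, p23 = cross["12"], cross["13"], cross["23"]
    m1 = l1 - p1 - r1  # operation O_1, performed first in every sub-case
    if subcase == "1.1.1":
        # O_2, then O_3, then a trivial O_0
        return m1 + (l2 - p2 - r2 - p12) + (l3 - p3 - r3 - p13 - p23) + 0
    if subcase == "1.1.2":
        # requires r3 = p13 = p23 = 0; O_2, then O_0 (one power), then O_3
        return m1 + (l2 - p2 - r2 - p12) + 1 + (l3 - p3 - 1)
    if subcase == "1.2.1":
        # requires r2 = r3 = p12 = p13 = 0; O_0 (two powers), then O_3, then O_2
        return m1 + 2 + (l3 - p3 - 1) + (l2 - p2 - p23 - 1)
    if subcase == "1.2.2":
        # same constraints; trivial O_0, O_3, then O_2 via the second bound in (3.15)
        return m1 + 0 + (l3 - p3 - 1) + (l2 - p2 - p23 - 1 + 1)
    raise ValueError("unknown sub-case")


if __name__ == "__main__":
    l, p = (5, 3, 3), (1, 0, 1)
    cross, r = {"12": 0, "13": 0, "23": 1}, (2, 0, 0)
    goal = target_exponent(l, p, cross, r, root_red=False)
    print(goal, case_1_exponent("1.2.1", l, p, cross, r),
          case_1_exponent("1.2.2", l, p, cross, r))
```

Sub-cases (1.1.1), (1.1.2) and (1.2.1) reproduce the exponent $l-p-r$ exactly, while (1.2.2) gives $l-p-r-1$, in agreement with the displayed products.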

(2) Now suppose $\mathfrak {r}\in \mathcal R$ ; then $r=r_1+r_2+r_3+1$ . There are two cases.

(2.1) Suppose there is one single leaf that is not red, say from $\mathcal{T}\,_1$ . There are again two cases.

(2.1.1) Suppose there is a red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say $\mathcal{T}\,_2$ . Then we perform $\mathcal {O}_2$ and get a factor

$$ \begin{align*}M_2^{(2.1.1)}:=L^{\theta} Q^{l_2-p_2-r_2}.\end{align*} $$

We now colour red $\mathfrak {r}_2$ and any leaf in $\mathcal{T}\,_1\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are a further two cases.

(2.1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor

$$ \begin{align*}M_3^{(2.1.1.1)}:= L^{\theta} Q^{l_3-p_3-r_3-p_{23}}.\end{align*} $$

Now we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.1)}:=1$ , then colour red $\mathfrak {r}_1$ as well as any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor

$$ \begin{align*}M_1^{(2.1.1.1)}:=L^{\theta} Q^{l_1-p_1-r_1-p_{12}-p_{13}-1}.\end{align*} $$

Thus

$$ \begin{align*}M\leq M_2^{(2.1.1)}M_3^{(2.1.1.1)}M_0^{(2.1.1.1)}M_{1}^{(2.1.1.1)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(2.1.1.2) Suppose after step (2.1.1) there is no red leaf in $\mathcal{T}\,_3$ ; then $r_3=p_{23}=0$ . We perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.2)}:=L^{\theta } Q^{1}$ . Then we colour $\mathfrak {r}_1$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor

$$ \begin{align*}M_3^{(2.1.1.2)}:=L^{\theta} Q^{l_3-p_3-1}.\end{align*} $$

Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor

$$ \begin{align*}M_1^{(2.1.1.2)}:=L^{\theta} Q^{l_1-p_1-r_1-p_{12}-p_{13}-1}.\end{align*} $$

Thus

$$ \begin{align*}M\leq M_2^{(2.1.1)}M_0^{(2.1.1.2)}M_3^{(2.1.1.2)}M_{1}^{(2.1.1.2)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(2.1.2) Suppose in the beginning there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=0$ . There are again two cases.

(2.1.2.1) Suppose there is a leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ , that is either single or paired with a leaf in $\mathcal{T}\,_1$ . Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.1)}:=L^{\theta } Q^{2}$ . After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor

$$ \begin{align*}M_3^{(2.1.2.1)}:= L^{\theta} Q^{l_3-p_3-1}.\end{align*} $$

We then colour red any leaf of $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_2$ to get a factor

$$ \begin{align*}M_2^{(2.1.2.1)}:= L^{\theta} Q^{l_2-p_2-p_{23}-1}.\end{align*} $$

Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_2$ , and perform $\mathcal {O}_1$ to get a factor

$$ \begin{align*}M_{1}^{(2.1.2.1)}:= L^{\theta} Q^{l_1-p_1-r_1-p_{12}-p_{13}-1}.\end{align*} $$

Thus

$$ \begin{align*}M\leq M_0^{(2.1.2.1)}M_3^{(2.1.2.1)}M_2^{(2.1.2.1)}M_{1}^{(2.1.2.1)}=L^{\theta} Q^{l-p-r},\end{align*} $$

which is what we need.

(2.1.2.2) Suppose there is no leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ that is either single or paired with a leaf in $\mathcal{T}\,_1$ ; then in the same way as in case (1.2.2), we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . Moreover, we have $p_{12}=p_{13}=0$ . Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.2)}:=1$ . After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor

$$ \begin{align*}M_3^{(2.1.2.2)}:= L^{\theta} Q^{l_3-p_3-1}.\end{align*} $$

We then colour red any leaf of