1 Introduction
The kinetic framework is a general paradigm that aims to extend Boltzmann’s kinetic theory for dilute gases to other types of microscopic interacting systems. This approach has been highly informative, and became a cornerstone of the theory of nonequilibrium statistical mechanics for a large body of systems [Reference Spohn43, Reference Spohn44]. In the context of nonlinear dispersive waves, this framework was initiated in the first half of the past century [Reference Peierls41] and developed into what is now called wave turbulence theory [Reference Zakharov, L’vov and Falkovich51, Reference Nazarenko39]. There, waves of different frequencies interact nonlinearly at the microscopic level, and the goal is to extract an effective macroscopic picture of how the energy densities of the system evolve.
The description of such an effective evolution comes via the wave kinetic equation (WKE), which is the analogue of Boltzmann’s equation for nonlinear wave systems [Reference Spohn46]. Such kinetic equations have been derived at a formal level for many systems of physical interest (nonlinear Schrödinger (NLS) and nonlinear wave (NLW) equations, water waves, plasma models, lattice crystal dynamics, etc.; compare [Reference Nazarenko39] for a textbook treatment) and are used extensively in applications (thermal conductivity in crystals [Reference Spohn45], ocean forecasting [Reference Janssen31, Reference Burns49], and more). This kinetic description is conjectured to appear in the limit where the number of (locally interacting) waves goes to infinity and an appropriate measure of the interaction strength goes to zero (weak nonlinearity). In such kinetic limits, the total energy of the whole system often diverges.
The fundamental mathematical question here, which also has direct consequences for the physical theory, is to provide a rigorous justification of such wave kinetic equations starting from the microscopic dynamics given by the nonlinear dispersive model at hand. The importance of such an endeavour stems from the fact that it allows an understanding of the exact regimes and the limitations of the kinetic theory, which has long been a matter of scientific interest (see [Reference Denissenko, Lukaschuk and Nazarenko20, Reference Aubourg, Campagne, Peureux, Ardhuin, Sommeria, Viboud and Mordant1]). A few mathematical investigations have recently been devoted to studying problems in this spirit [Reference Faou23, Reference Buckmaster, Germain, Hani and Shatah7, Reference Lukkarinen and Spohn35], yielding some partial results and useful insights.
This manuscript continues the investigation initiated in [Reference Buckmaster, Germain, Hani and Shatah7], aimed at providing a rigorous justification of the wave kinetic equation corresponding to the nonlinear Schrödinger equation,
As we shall explain later, the sign of the nonlinearity has no effect on the kinetic description, so we choose the defocussing sign for concreteness. The natural setup for the problem is to start with a spatial domain given by a torus ${\mathbb T}^d_L$ of size L, which approaches infinity in the thermodynamic limit we seek. This torus can be rational or irrational, which amounts to rescaling the Laplacian into
and taking the spatial domain to be the standard torus of size L, namely ${\mathbb T}^d_L=[0,L]^d$ with periodic boundary conditions. With this normalisation, an irrational torus would correspond to taking the $\beta _j$ to be rationally independent. Our results cover both cases, and in part of them $\beta $ is assumed to be generic – that is, avoiding a set of Lebesgue measure $0$ .
The strength of the nonlinearity is related to the characteristic size $\lambda $ of the initial data (say in the conserved $L^2$ space). Adopting the ansatz $v=\lambda u$ , we arrive at the following equation:
The kinetic description of the long-time behaviour is akin to a law of large numbers, and therefore one has to start with a random distribution of the initial data. Heuristically, a randomly distributed, $L^{2}$-normalised field would (with high probability) have a roughly uniform spatial distribution, and consequently an $L_x^{\infty }$ norm $\sim L^{-d/2}$. This makes the strength of the nonlinearity in (NLS) comparable to $\lambda ^2 L^{-d}$ (at least initially), which motivates us to introduce the quantity
and phrase the results in terms of $\alpha $ instead of $\lambda $. The kinetic conjecture states that at sufficiently long time scales, the effective dynamics of the Fourier-space mass density $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2 \left (k \in \mathbb Z^d_L=L^{-1}\mathbb Z^d\right )$ is well approximated – in the limit of large L and vanishing $\alpha $ – by an appropriately scaled solution $n(t, \xi )$ of the following WKE:
where we used the shorthand notations $\phi _j:=\phi \left (\xi _j\right )$ and $\left \lvert \xi \right \rvert ^2_{\beta }=\sum _{j=1}^d \beta _j \left (\xi ^{\left (j\right )}\right )^2$ for $\xi =\left (\xi ^{(1)},\cdots ,\xi ^{(d)}\right )$. More precisely, one expects this approximation to hold at the kinetic timescale $T_{\mathrm {kin}}\sim \alpha ^{-2}=\frac {L^{2d}}{\lambda ^4}$, in the sense that
Of course, for such an approximation to hold at time $t=0$, one has to start with a well-prepared initial distribution for $\widehat u_{\text {in}}(k)$ as follows: denoting by $n_{\text {in}}$ the initial data for (WKE), we assume
where $\eta _{k}(\omega )$ are mean $0$ complex random variables satisfying $\mathbb E \left \lvert \eta _k\right \rvert ^2=1$ . In what follows, $\eta _k(\omega )$ will be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either the normalised complex Gaussian or the uniform distribution on the unit circle $\lvert z\rvert =1$ .
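Both admissible laws for $\eta_k$ can be checked against these two moment conditions numerically; the following is a small illustration (the function names and sample sizes are ours, not from the paper):

```python
import cmath
import math
import random

def eta_uniform_circle(rng):
    # Uniform law on the unit circle |z| = 1, so |eta|^2 = 1 identically.
    return cmath.exp(2j * math.pi * rng.random())

def eta_complex_gaussian(rng):
    # Normalised complex Gaussian: independent real and imaginary parts,
    # scaled so that E eta = 0 and E |eta|^2 = 1.
    return complex(rng.gauss(0, 1), rng.gauss(0, 1)) / math.sqrt(2)

def moments(sampler, n=200_000, seed=0):
    # Empirical mean and second absolute moment of n i.i.d. samples.
    rng = random.Random(seed)
    samples = [sampler(rng) for _ in range(n)]
    mean = sum(samples) / n
    second = sum(abs(z) ** 2 for z in samples) / n
    return mean, second

for sampler in (eta_uniform_circle, eta_complex_gaussian):
    mean, second = moments(sampler)
    assert abs(mean) < 0.05 and abs(second - 1) < 0.05
```

In both cases the empirical mean is close to $0$ and the empirical second moment close to $1$, consistent with $\mathbb E\,\eta_k=0$ and $\mathbb E\lvert\eta_k\rvert^2=1$.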
Before stating our results, it is worth remarking on the regime of data and solutions covered by this kinetic picture in comparison to previously studied and well-understood regimes in the nonlinear dispersive literature. For this, let us look back at the (pre-ansatz) NLS solution v, whose conserved energy is given by
We are dealing with solutions having an $L^{\infty }$ norm of $O\left (\sqrt \alpha \right )$ (with high probability) and whose total mass is $O\left (\alpha L^d\right )$, in a regime where $\alpha $ is vanishingly small and L is asymptotically large. These bounds on the solutions hold initially, as we have already explained, and will be propagated in our proof. In particular, the mass and energy are very large and will diverge in this kinetic limit, as is common in taking thermodynamic limits [Reference Ruelle42, Reference Minlos37]. Moreover, the potential part of the energy is dominated by the kinetic part – the former of size $O\left (\alpha ^2 L^d\right )$ and the latter of size $O\left (\alpha L^d\right )$ – which explains why there is no distinction between the defocussing and focussing nonlinearities in the kinetic limit. It would be interesting to see how the kinetic framework can be extended to regimes of solutions which are sensitive to the sign of the nonlinearity; this has been investigated in the physics literature (e.g., [Reference Dyachenko, Newell, Pushkarev and Zakharov22, Reference Fitzmaurice, Gurarie, McCaughan and Woyczynski25, Reference Zakharov, Korotkevich, Pushkarev and Resio50]).
1.1 Statement of the results
It is not a priori clear how the limits $L\to \infty $ and $\alpha \to 0$ need to be taken for formula (1.1) to hold, or whether there is an additional scaling law (between $\alpha $ and L) that needs to be satisfied in the limit. In comparison, such scaling laws are imposed in the rigorous derivation of Boltzmann’s equation [Reference Lanford34, Reference Cercignani, Illner and Pulvirenti10, Reference Gallagher, Saint-Raymond and Texier26], which is derived in the so-called Boltzmann–Grad limit [Reference Grad27]: namely, the number N of particles goes to $\infty $ while their radius r goes to $0$ in such a way that $Nr^{d-1}\sim O(1)$. To the best of our knowledge, this central point has not been adequately addressed in the wave-turbulence literature.
Our results seem to suggest some key differences depending on the chosen scaling law. Roughly speaking, we identify two special scaling laws for which we are able to justify the approximation (1.1) up to time scales $L^{-\varepsilon } T_{\text {kin}}$ for any arbitrarily small $\varepsilon>0$. For other scaling laws, we identify significant absolute divergences in the power-series expansion for $\mathbb E \left \lvert \widehat u(t, k)\right \rvert ^2$ at much earlier times. We can therefore only justify this approximation at such shorter times (which are still better than those in [Reference Buckmaster, Germain, Hani and Shatah7]). In these cases, whether or not formula (1.1) holds up to time scales $L^{-\varepsilon } T_{\text {kin}}$ depends on whether such series converge conditionally instead of absolutely, and thus would require new methods and ideas, as we explain later.
We start by identifying the two favourable scaling laws. We use the notation $\sigma +$ for any numerical constant $\sigma $ (e.g., $\sigma =-\varepsilon $ or $\sigma =-1-\frac {\varepsilon }{2}$, where $\varepsilon $ is as in Theorem 1.1) to denote a constant that is strictly larger than and sufficiently close to $\sigma $.
Theorem 1.1. Set $d\geq 2$ and let $\beta \in [1,2]^d$ be arbitrary. Suppose that $n_{\mathrm {in}} \in {\mathcal S}\left ({\mathbb R}^d \to [0, \infty )\right )$ is Schwartz and $\eta _{k}(\omega )$ are independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either complex Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle $\lvert z\rvert =1$. Assume well-prepared initial data $u_{\mathrm {in}}$ for (NLS) as in equation (1.2).
Fix $0<\varepsilon <1$ (in most interesting cases $\varepsilon $ will be small); recall that $\lambda $ and L are the parameters in (NLS) and let $\alpha =\lambda ^2L^{-d}$ be the characteristic strength of the nonlinearity. If $\alpha $ satisfies the scaling law $\alpha \sim L^{(-\varepsilon )+}$ or $\alpha \sim L^{\left (-1-\frac {\varepsilon }{2}\right )+}$, then we have
for all $L^{0+} \leq t \leq L^{-\varepsilon } T_{\mathrm {kin}}$, where $T_{\mathrm {kin}}=\alpha ^{-2}/2$, ${\mathcal K}$ is defined in (WKE) and $o_{\ell ^{\infty }_k}\left (\frac {t}{T_{\mathrm {kin}}}\right )_{L \to \infty }$ is a quantity that is bounded in $\ell ^{\infty }_k$ by $L^{-\theta } \frac {t}{T_{\mathrm {kin}}}$ for some $\theta>0$.
We remark that in the time interval of the approximation we have been discussing, the right-hand sides of formulas (1.1) and (1.3) are equivalent. Also note that any scaling law of the form $\alpha \sim L^{-s}$ gives an upper bound of $t\leq L^{-\varepsilon }T_{\mathrm {kin}}\sim L^{2s-\varepsilon }$ for the times considered. Consequently, for the two scaling laws in Theorem 1.1, the time t always satisfies $t\ll L^{2}$, and it is for this reason that the rationality type of the torus is not relevant. As will be clear later, no similar results can hold for $t\gg L^2$ in the case of a rational torus, as this would require rational quadratic forms to be equidistributed on scales $\ll 1$, which is impossible. However, if the aspect ratios $\beta $ are assumed to be generically irrational, then one can access equidistribution scales that are as small as $L^{-d+1}$ for the resulting irrational quadratic forms [Reference Bourgain4, Reference Buckmaster, Germain, Hani and Shatah7]. This allows us to consider scaling laws for which $T_{\mathrm {kin}}$ can be as big as $L^{d}$ on generically irrational tori.
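The exponent bookkeeping behind this remark is short; writing a general scaling law schematically as $\alpha \sim L^{-s}$ (our notation), one has

```latex
T_{\mathrm{kin}} = \alpha^{-2} \sim L^{2s},
\qquad
t \;\leq\; L^{-\varepsilon} T_{\mathrm{kin}} \;\sim\; L^{2s-\varepsilon}.
```

For the two scaling laws of Theorem 1.1 the exponent $s$ is close to $\varepsilon$ or to $1+\frac{\varepsilon}{2}$, so $2s-\varepsilon\leq 2$ in both cases, with the strict bound $t\ll L^2$ coming from the $\sigma+$ slack in the exponents.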
Remark 1.2. Strictly speaking, in evaluating equation (1.3) one has to first ensure the existence of the solution u. This is guaranteed if $d\in \{2,3,4\}$ (when (NLS) is $H^1$ critical or subcritical). When $d\geq 5$ we shall interpret equation (1.3) such that the expectation is taken only when the long-time smooth solution u exists. Moreover, from our proof it follows that the probability that this existence fails is at most $O\left (e^{-L^{\theta }}\right )$, which quickly becomes negligible when $L\to \infty $.
The following theorem covers general scaling laws, including the ones that can only be accessed for the generically irrational torus. By a simple calculation of exponents, we can see that it implies Theorem 1.1.
Theorem 1.3. With the same assumptions as in Theorem 1.1, we impose the following conditions on $(\alpha , L, T)$ for some $\delta>0$ :
Then formula (1.3) holds for all $L^{\delta } \leq t \leq T$ .
It is best to read this theorem in terms of the $\left (\log _L \left (\alpha ^{-1}\right ),\log _L T\right )$ plot in Figure 1. The kinetic conjecture corresponds to justifying the approximation in formula (1.1) up to time scales $T\lesssim T_{\mathrm {kin}}=\alpha ^{-2}$. As we shall explain later, the time scale $T\sim T_{\mathrm {kin}}$ represents a critical scale for the problem from a probabilistic point of view. This is depicted by the red line in the figure, and the region below this line corresponds to a probabilistically subcritical regime (see Section 1.2.1). The shaded blue region corresponds to the $(\alpha , T)$ region in Theorem 1.3, neglecting $\delta $ losses. This region touches the line $T=\alpha ^{-2}$ at the two points corresponding to $\left (\alpha ^{-1}, T\right )=(1, 1)$ and $\left (L, L^2\right )$, whereas the two scaling laws of Theorem 1.1, where $\left (\alpha ^{-1},T\right )\sim (L^{\varepsilon },L^{\varepsilon })$ and $\left (\alpha ^{-1},T\right )\sim \left (L^{1+\frac {\varepsilon }{2}},L^{2}\right )$, approach these two points when $\varepsilon $ is small.
These results rely on a diagrammatic expansion of the NLS solution in Feynman diagrams, akin to a Taylor expansion. The shaded blue region depicting the result of Theorem 1.3 corresponds to the cases when such a diagrammatic expansion is absolutely convergent for very large L. In the complementary region between the blue region and the line $T=T_{\text {kin}}$, we show that some (arbitrarily high-degree) terms of this expansion do not converge to $0$ as their degree goes to $\infty $, which means that the diagrammatic expansion cannot converge absolutely in this region. Therefore, the only way for the kinetic conjecture to be true in the scaling regimes not included in Theorem 1.1 is for those terms to exhibit a highly nontrivial cancellation, which would make the series converge conditionally but not absolutely.
Finally, we remark on the restrictions in formula (1.4). The upper bounds on T on the left are necessary from number-theoretic considerations: indeed, if $T\gg L^2$ for a rational torus, or if $T\gg L^d$ for an irrational one, the exact resonances of the NLS equation dominate the quasi-resonant interactions that lead to the kinetic wave equation. One should therefore not expect the kinetic description to hold in those ranges of T (see Lemma 3.2 and Section 4). The second set of restrictions in formula (1.4) corresponds exactly to the requirement that the size of the Feynman diagrams of degree n can be bounded by $\rho ^n$ with some $\rho \ll 1$. In fact, if one aims only at proving existence with high probability (not caring about the asymptotics of $\mathbb {E}\left \lvert \widehat {u}(t,k)\right \rvert ^2$), then the restrictions on the left of formula (1.4) will not be necessary, and one obtains control for longer times. See also the following remark:
Remark 1.4 (Admissible scaling laws).
The foregoing restrictions on T delimit the admissible scaling laws, in which $\alpha \to 0$ and $L \to \infty $, for which the kinetic description of the long-time dynamics can appear. Indeed, since $T_{\mathrm {kin}}=\alpha ^{-2}$, the necessary (up to $L^{\delta }$ factors) restrictions $T\ll L^{2-\delta }$ (resp., $T\ll L^{d-\delta }$) on the rational (resp., irrational) torus already mentioned imply that one should only expect the previous kinetic description in the regime where $\alpha \gtrsim L^{-1}$ (resp., $\gtrsim L^{-d/2}$). In other words, the kinetic description requires the nonlinearity to be weak, but not too weak! In the complementary regime of very weak nonlinearity, the exact resonances of the equation dominate the quasi-resonances – a regime referred to as discrete wave turbulence (see [Reference L’vov and Nazarenko36, Reference Kartashova32, Reference Nazarenko39]), in which different effective equations, like the (CR) equation in [Reference Faou, Germain and Hani24, Reference Buckmaster, Germain, Hani and Shatah6], can arise.
1.2 Ideas of the proof
As Theorem 1.1 is a consequence of Theorem 1.3, we will focus on Theorem 1.3. Its proof contains three components: ($1$) a long-time well-posedness result, where we expand the solution to (NLS) into Feynman diagrams for sufficiently long time, up to a well-controlled error term; ($2$) the computation of $\mathbb E\left \lvert \widehat u_k(t)\right \rvert ^2 \left (k \in \mathbb Z^d_L\right )$ using this expansion, where we identify the leading terms and control the remainders; and ($3$) a number-theoretic result that justifies the large-box approximation, where we pass from the sums appearing in the expansion in the previous component to the integral appearing on the right-hand side of (WKE).
The main novelty of this work is in the first component, which is the hardest. The second component follows similar lines to those in [Reference Buckmaster, Germain, Hani and Shatah7]. Regarding the third component, the main new contribution is to complement the number-theoretic results in [Reference Buckmaster, Germain, Hani and Shatah7] (which dealt only with the generically irrational torus) by the cases of general tori (in the admissible range of time $T\ll L^2$). This provides an essentially full (up to $L^{\varepsilon }$ losses) understanding of the number-theoretic issues arising in wave-turbulence derivations for (NLS). Therefore, we will limit this introductory discussion to the first component.
1.2.1 The scheme and probabilistic criticality
Though technically involved, the basic idea of the long-time well-posedness argument is in fact quite simple. Starting from (NLS) with initial data of equation (1.2), we write the solution as
where $u^{(0)}=e^{it\Delta _{\beta }}u_{\mathrm {in}}$ is the linear evolution, $u^{(n)}$ are iterated self-interactions of the linear solution $u^{(0)}$ that appear in a formal expansion of u, and $\mathcal R_{N+1}$ is a sufficiently regular remainder term.
Since $u^{(0)}$ is a linear combination of independent random variables, and each $u^{(n)}$ is a multilinear combination, each of them will behave strictly better (both linearly and nonlinearly) than its deterministic analogue (i.e., with all $\eta _k=1$). This is due to the well-known large deviation estimates, which yield a ‘square root’ gain coming from randomness, akin to the central limit theorem (for instance, $\left \lVert u_{\mathrm {in}}\right \rVert _{L^{\infty }}$ is bounded by $L^{-d/2}\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ in the probabilistic setting, as opposed to $1\cdot \left \lVert u_{\mathrm {in}}\right \rVert _{L^2}$ deterministically by Sobolev embedding, assuming compact Fourier support). This gain leads to a new notion of criticality for the problem, which can be defined as the edge of the regime of $(\alpha , T)$ for which the iterate $u^{(1)}$ is better bounded than the iterate $u^{(0)}$. It is not hard to see that $u^{(1)}$ can have size up to $O(\alpha \sqrt {T})$ (in appropriate norms), compared to the $O(1)$ size of $u^{(0)}$ (see, e.g., formula (2.25) for $n=1$). This justifies the notion that $T\sim T_{\mathrm {kin}}=\alpha ^{-2}$ corresponds to probabilistically critical scaling, whereas the time scales $T\ll T_{\mathrm {kin}}$ are subcritical.
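Schematically (in whatever norm makes the iterates comparable), this criticality heuristic reads

```latex
\bigl\lVert u^{(1)} \bigr\rVert \sim \alpha\sqrt{T}\,\bigl\lVert u^{(0)} \bigr\rVert,
\qquad\text{so}\qquad
\alpha\sqrt{T}\ll 1
\;\Longleftrightarrow\;
T \ll \alpha^{-2} = T_{\mathrm{kin}},
```

with the borderline case $\alpha\sqrt{T}\sim 1$ recovering the critical time scale $T\sim T_{\mathrm{kin}}$.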
As it happens, a certain notion of criticality might not capture all the subtleties of the problem. As we shall see, some higher-order iterates $u^{(n)}$ will not be better bounded than $u^{(n-1)}$ in the full subcritical range $T\ll \alpha ^{-2}$ we have postulated, but instead only in a subregion thereof. This is what defines our admissible blue region in Figure 1.
We should mention that the idea of using the gain from randomness goes back to Bourgain [Reference Bourgain3] (in the random-data setting) and to Da Prato and Debussche [Reference Da Prato and Debussche14] (later, in the stochastic PDE setting). They first noticed that the ansatz $u=u^{(0)}+\mathcal R$ allows one to put the remainder $\mathcal R$ in a higher regularity space than the linear term $u^{(0)}$. This idea has since been applied to many different situations (see, e.g., [Reference Bourgain and Bulut5, Reference Burq and Tzvetkov8, Reference Colliander and Oh11, Reference Deng15, Reference Dodson, Lührmann and Mendelson21, Reference Kenig and Mendelson33, Reference Nahmod and Staffilani38]), though most of these works either involve only the first-order expansion (i.e., $N=0$) or involve higher-order expansions with only suboptimal bounds (e.g., [Reference Bényi, Oh and Pocovnicu2]). To the best of our knowledge, the present paper is the first work where the sharp bounds for these $u^{(j)}$ terms are obtained to arbitrarily high order (at least in the dispersive setting).
Remark 1.5. There are two main reasons why the high-order expansion (1.5) gives the sharp time of control, in contrast to previous works. The first is that we are able to obtain sharp estimates for the terms $u^{(j)}$ of arbitrarily high order, which were not known previously due to the combinatorial complexity associated with trees (see Section 1.2.2).
The second reason is more intrinsic. Higher-order versions of the original Bourgain–Da Prato–Debussche approach usually stop improving in regularity beyond a certain point, due to the presence of high-low interactions (heuristically, a gain of powers of the low frequency does not translate into a gain in regularity). This is a major difficulty in random-data theory, and in recent years a few methods have been developed to address it, including regularity structures [Reference Hairer29], paracontrolled calculus [Reference Gubinelli, Imkeller and Perkowski28] and random averaging operators [Reference Deng, Nahmod and Yue18]. Fortunately, in the current problem this issue is absent, since the well-prepared initial data (1.2) bound the high-frequency components (where $\lvert k\rvert \sim 1$) and low-frequency components (where $\left \lvert k\right \rvert \sim L^{-1}$) uniformly, so the high-low interaction is simply controlled in the same way as the high-high interaction, allowing one to gain regularity indefinitely as the order increases.
1.2.2 Sharp estimates of Feynman trees
We start with the estimate for $u^{(n)}$ . As is standard with the cubic nonlinear Schrödinger equation, we first perform the Wick ordering by defining
Note that $M_0$ is essentially the mass which is conserved. Now w satisfies the renormalised equation
and $\left \lvert \widehat {w_k}(t)\right \rvert ^2=\left \lvert \widehat {u_k}(t)\right \rvert ^2$ . This gets rid of the worst resonant term, which would otherwise lead to a suboptimal time scale.
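Although the precise form of $M_0$ and of equation (1.6) is fixed in Section 2, the mechanism is the standard gauge transformation for cubic NLS; schematically (the sign conventions and constants below are only indicative),

```latex
w(t) := e^{2i\lambda^2 M_0 t}\, u(t)
\quad\Longrightarrow\quad
\lambda^2 \lvert u\rvert^2 u \;\rightsquigarrow\; \lambda^2 \left( \lvert u\rvert^2 - 2M_0 \right) u,
```

a phase rotation that manifestly preserves $\lvert\widehat w_k(t)\rvert = \lvert\widehat u_k(t)\rvert$ while subtracting the resonant contribution proportional to the conserved mass.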
Let $w^{(n)}$ be the $n$th-order iteration of the nonlinearity in equation (1.6), corresponding to the $u^{(n)}$ in equation (1.5). Since this nonlinearity is cubic, by induction it is easy to see that $w^{(n)}$ can be written (say, in Fourier space) as a linear combination of terms $\mathcal J_{\mathcal T}$, where $\mathcal T$ runs over all ternary trees with exactly n branching nodes (we will say it has scale $\mathfrak s(\mathcal T)=n$). After some further reductions, the estimate for $\mathcal J_{\mathcal T}$ can be reduced to the estimate for terms of the form
where $\eta _k(\omega )$ is as in equation (1.2), $(k_1,\ldots ,k_{2n+1})\in \left (\mathbb {Z}_L^d\right )^{2n+1}$, S is a suitable finite subset of $\left (\mathbb {Z}_L^d\right )^{2n+1}$ and the $(2n+1)$ subscripts correspond to the $(2n+1)$ leaves of $\mathcal T$ (see Definition 2.2 and Figure 2).
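To get a feel for the combinatorial complexity involved, one can count the rooted ternary trees with $n$ branching nodes (this toy count ignores the $\pm$ decorations and pairings of the actual expansion, so it only indicates the exponential growth in $n$):

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def ternary_trees(n):
    # Number of rooted ternary trees with exactly n branching nodes:
    # a tree is either a single leaf (n = 0), or a root whose three
    # subtrees carry n - 1 branching nodes in total.
    if n == 0:
        return 1
    return sum(
        ternary_trees(a) * ternary_trees(b) * ternary_trees(n - 1 - a - b)
        for a in range(n)
        for b in range(n - a)
    )

# The recursion matches the Fuss-Catalan closed form C(3n, n) / (2n + 1);
# each such tree has 2n + 1 leaves, matching the (2n + 1) subscripts above.
for n in range(8):
    assert ternary_trees(n) == comb(3 * n, n) // (2 * n + 1)
```

The count grows like $(27/4)^n$ up to polynomial factors, which is one reason uniform-in-$n$ bounds of the form $\rho^n$ with $\rho\ll1$ are needed to sum the expansion.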
To estimate $\Sigma _k$ defined in formula (1.7) we invoke the standard large deviation estimate (see Lemma 3.1), which essentially asserts that $\left \lvert \Sigma _k\right \rvert \lesssim (\#S)^{1/2}$ with overwhelming probability, provided that there is no pairing in $(k_1,\ldots ,k_{2n+1})$, where a pairing $\left (k_i,k_j\right )$ means $k_i=k_j$ and the signs of $\eta _{k_i}$ and $\eta _{k_j}$ in formula (1.7) are opposite. Moreover, in the case of a pairing $\left (k_i,k_j\right )$ we can essentially replace $\eta _{k_i}^{\pm } \eta _{k_j}^{\mp }=\left \lvert \eta _{k_i}\right \rvert ^2\approx 1$, so in general we can bound, with overwhelming probability,
It thus suffices to bound the number of choices for $(k_1,\ldots ,k_{2n+1})$ given the pairings, as well as the number of choices for the paired $k_j$'s given the unpaired $k_j$'s.
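The ‘square root’ gain underlying these bounds is already visible in the simplest toy model: a sum of $N$ independent unit phases has typical size $\sqrt N$, not $N$ (this is an illustration of the heuristic only, not of the actual sums $\Sigma_k$):

```python
import cmath
import math
import random

def random_phase_sum(n, rng):
    # Sum of n independent uniform unit phases e^{2 pi i theta}.
    return sum(cmath.exp(2j * math.pi * rng.random()) for _ in range(n))

rng = random.Random(1)
n, trials = 4096, 200
# E |sum|^2 = n exactly for i.i.d. mean-zero unit-variance terms, so
# |sum| concentrates around sqrt(n), far below the worst case |sum| = n.
avg_sq = sum(abs(random_phase_sum(n, rng)) ** 2 for _ in range(trials)) / trials
assert 0.7 < avg_sq / n < 1.3
```

Pairings $k_i = k_j$ with opposite signs break this cancellation, since $\eta_{k_i}^{\pm}\eta_{k_j}^{\mp} = \lvert\eta_{k_i}\rvert^2$ has a nonzero mean; this is why they must be counted separately.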
In the no-pairing case, such counting bounds are easy to prove, since the set S is well adapted to the tree structure of $\mathcal T$; what makes the counting nontrivial is the pairings, especially those between leaves that are far away or from different levels (see Figure 3, where a pairing is depicted by an extra link between the two leaves). Nevertheless, we have developed a counting algorithm that specifically deals with the given pairing structure of $\mathcal T$ and ultimately leads to sharp counting bounds and consequently sharp bounds for $\Sigma _k$ (see Proposition 3.5).
1.2.3 An $\ell ^2$ operator norm bound
In contrast to the tree terms $\mathcal J_{\mathcal{T}\,}$ , the remainder term $\mathcal R_{N+1}$ has no explicit random structure. Indeed, the only way it feels the ‘chaos’ of the initial data is through the equation it satisfies, which in integral form and spatial Fourier variables looks like
where $\mathcal J_{\sim N}$ is a sum of Feynman trees $\mathcal J_{\mathcal T}$ (already described) of scale $\mathfrak s (\mathcal T)\sim N$, and $\mathcal L$, $\mathcal Q$ and $\mathcal C$ are, respectively, linear, bilinear and trilinear operators in $\mathcal R_{N+1}$. The main point here is that one would like to propagate the estimates on $\mathcal J_{\sim N}$ to $\mathcal R_{N+1}$ itself; this is how we make rigorous the so-called ‘propagation of chaos or quasi-Gaussianity’ claims that are often adopted in formal derivations. In a related direction, qualitative results on propagation of quasi-Gaussianity, in the form of absolute continuity of measures, have been obtained in some cases (with different settings) by exploiting almost-conservation laws (e.g., [Reference Tzvetkov48]).
Since we are bootstrapping a smallness estimate on $\mathcal R_{N+1}$ , any quadratic and cubic form of $\mathcal R_{N+1}$ will be easily bounded. It therefore suffices to propagate the bound for the term $\mathcal L(\mathcal R_{N+1})$ , which reduces to bounding the $\ell ^2\to \ell ^2$ operator norm for the linear operator $\mathcal L$ . By definition, the operator $\mathcal L$ will have the form $v\mapsto \mathcal {IW}\left (\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}, v\right )$ , where $\mathcal {I}$ is the Duhamel operator, $\mathcal {W}$ is the trilinear form coming from the cubic nonlinearity and $\mathcal J_{\mathcal{T}\,_1}, \mathcal J_{\mathcal{T}\,_2}$ are trees of scale $\leq N$ ; thus in Fourier space it can be viewed as a matrix with random coefficients. The key to obtaining the sharp estimate for $\mathcal L$ is then to exploit the cancellation coming from this randomness, and the most efficient way to do this is via the $TT^*$ method.
In fact, the idea of applying the $TT^*$ method to random matrices has already been used by Bourgain [Reference Bourgain3]. In that paper one is still far above (probabilistic) criticality, so applying the $TT^*$ method once already gives adequate control. In the present case, however, we are aiming at obtaining sharp estimates, so applying $TT^*$ once will not be sufficient.
The solution is thus to apply $TT^*$ sufficiently many times (say, $D\gg 1$ ), which leads to the analysis of the kernel of the operator $(\mathcal L\mathcal L^*)^D$ . At first sight this kernel seems to be a complicated multilinear expression which is difficult to handle; nevertheless, we make one key observation, namely that this kernel can essentially be recast in the form of formula (1.7) for some large auxiliary tree $\mathcal{T}\,=\mathcal{T}\;^D$ , which is obtained from a single root node by attaching copies of the trees $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ successively a total of $2D$ times (see Figure 4). With this observation, the arguments in the previous section then lead to sharp bounds of the kernel of $(\mathcal L\mathcal L^*)^D$ , up to some loss that is a power of L independent of D; taking the $1/(2D)$ power and choosing D sufficiently large makes this power negligible and implies the sharp bound for the operator norm of $\mathcal L$ (see Section 3.3).
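The gain from iterating $TT^*$ can be summarised as follows: if the kernel estimates give $\left\lVert(\mathcal L\mathcal L^*)^D\right\rVert_{\ell^2\to\ell^2}\leq L^{C}\rho^{2D}$ with $C$ independent of $D$ (here $\rho$ schematically denotes the sharp single-operator bound; the notation is ours), then self-adjointness of $\mathcal L\mathcal L^*$ yields

```latex
\lVert \mathcal L \rVert_{\ell^2\to\ell^2}
= \left\lVert (\mathcal L\mathcal L^*)^D \right\rVert_{\ell^2\to\ell^2}^{1/(2D)}
\leq L^{C/(2D)}\, \rho,
```

so choosing $D$ large makes the loss $L^{C/(2D)}$ an arbitrarily small power of $L$.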
1.2.4 Sharpness of estimates
We remark that the estimates we prove for $\mathcal J_{\mathcal{T}\,}$ are sharp up to some finite power of L (independent of $\mathcal{T}\,$ ). More precisely, from Proposition 2.5 we know that for any ternary tree $\mathcal{T}\,$ of scale n and possible pairing structure (see Definition 3.3), with overwhelming probability,
where $\rho $ is some quantity depending on $\alpha $, L and T (see formula (2.24)), k is the spatial Fourier variable and $h^b$ is a time-Sobolev norm defined in equation (2.22); on the other hand, we will show that for some particular choice of trees $\mathcal T$ of scale n and some particular choice of pairings, with high probability,
The timescale T of Theorem 1.3 is the largest that makes $\rho \ll 1$ ; thus if one wants to go beyond T in cases other than Theorem 1.1, it would be necessary to address the divergence of formula (1.9) with $\rho \gg 1$ by exploiting the cancellation between different tree terms or different pairing choices (see Section 3.4).
1.2.5 Discussions
Shortly after the completion of this paper, work of Collot and Germain [Reference Collot and Germain12] was announced that studies the same problem, but only in the rational-torus setting. In the language of this paper, their result corresponds to the validity of equation (1.3) for $L\leq t\leq L^{2-\delta }$, under the assumption $\alpha \leq L^{-1-\delta }$. This is a special case of Theorem 1.3, essentially corresponding to the rectangle below the horizontal line $\log _LT=2$ and to the right of the vertical line $\log _L\left (\alpha ^{-1}\right )=1$ in Figure 1. We also mention later work by the same authors [Reference Collot and Germain13], where they consider a generic non-rectangular torus (as opposed to the rectangular tori here and in [Reference Collot and Germain12]) and prove the existence of solutions (but without justifying equation (1.3)) up to time $t\leq L^{-\delta }T_{\mathrm {kin}}$ for a wider range of power laws between $\alpha $ and L.
While the present paper was being peer-reviewed, we submitted new work to arXiv [Reference Deng and Hani16], in which we provide the first full derivation of (WKE) from (NLS). Those results reach the kinetic time scale $t=\tau \cdot T_{\mathrm {kin}}$, where $\tau $ is independent of L (compared to Theorem 1.1 here, where $\tau \leq L^{-\varepsilon }$), for the scaling law $\alpha \sim L^{-1}$ on generic (irrational) rectangular tori and the scaling laws $\alpha \sim L^{-\gamma }$ (where $\gamma <1$ and is close to $1$) on arbitrary rectangular tori.
Shortly after completing [Reference Deng and Hani16], we received a preprint of a deep forthcoming work by Staffilani and Tran [Reference Staffilani and Tran47]. It concerns a high-dimensional (on $\mathbb {T}^d$ for $d\geq 14$) KdV equation under a time-dependent Stratonovich stochastic forcing, which effectively randomises the phases without injecting energy into the system. The authors derive the corresponding wave kinetic equation up to the kinetic time scale, for the scaling law $\alpha \sim L^{0}$ (i.e., first taking $L\to \infty $ and then taking $\alpha \to 0$). They also prove a conditional result without such forcing, where the condition is verified for some particular initial densities converging to the equilibrium state (stationary solution to the wave kinetic equation) in the limit.
1.3 Organisation of the paper
In Section 2 we explain the diagrammatic expansion of the solution into Feynman trees and state the a priori estimates on such trees and remainder terms, which yield the long-time existence of such expansions. Section 3 is devoted to the proof of those a priori estimates. In Section 4 we prove the main theorems stated above, and in Section 5 we prove the necessary number-theoretic results that allow us to replace the highly oscillatory Riemann sums by integrals.
1.4 Notation
Most notation will be standard. Let $z^+=z$ and $z^-=\overline {z}$. Define $\left \lvert k\right \rvert _{\beta }$ by $\left \lvert k\right \rvert _{\beta }^2=\beta _1k_1^2+\cdots +\beta _dk_d^2$ for $k=(k_1,\ldots ,k_d)$. The spatial Fourier series of a function $u: {\mathbb T}_L^d \to \mathbb C$ is defined on $\mathbb Z^d_L:=L^{-1}\mathbb Z^{d}$ by
The temporal Fourier transform is defined by
Let $\delta>0$ be fixed throughout the paper. Let N, s and $b>\frac {1}{2}$ be fixed, such that N and s are large enough and $b-\frac {1}{2}$ is small enough, depending on d and $\delta $. The quantity C will denote any large absolute constant, not depending on $\big(N,s,b-\frac {1}{2}\big)$, and $\theta $ will denote any small positive constant, depending on $\big(N,s,b-\frac {1}{2}\big)$; these may change from line to line. The symbols $O(\cdot )$, $\lesssim $ and so on will have their usual meanings, with implicit constants depending on $\theta $. Let L be large enough depending on all these implicit constants. If some statement S involving $\omega $ is true with probability $\geq 1-Ke^{-L^{\theta }}$ for some constant K (depending on $\theta $), then we say the statement S is L-certain.
When a function depends on many variables, we may use notations like
to denote a function f of variables $(x_i:i\in A)$ and $y_1,\ldots ,y_m$ .
2 Tree expansions and longtime existence
2.1 First reductions
Let $\widehat {u}_k(t)$ be the Fourier coefficients of $u(t)$, as in equation (1.10). Then with $c_k(t):= e^{2\pi i\left \lvert k\right \rvert _{\beta }^2t} \widehat u_k(t)=\left (\mathcal F_{{\mathbb T}^d_L} e^{-it\Delta _{\beta }} u\right )(k)$, we arrive at the following equation for the Fourier modes:
where $ \Omega (k_1,k_2,k_3,k) =\left \lvert k_1\right \rvert _{\beta }^2-\left \lvert k_2\right \rvert _{\beta }^2+\left \lvert k_3\right \rvert _{\beta }^2-\left \lvert k\right \rvert _{\beta }^2. $ Note that the sum can be written as
which, defining $M=\sum _{k_3} \left \lvert c_{k_3}\right \rvert ^2$ (which is conserved), allows us to write
Here and later, $\sum ^{\times }$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$, $k_1-k_2+k_3=k$ and $k\not \in \{k_1,k_3\}$. Introducing $b_k(t)=c_k(t)e^{2i\left (L^{-d/2}\lambda \right )^{2}Mt}$, we arrive at the following equation for $b_k(t)$:
In Theorem 1.3 we will be studying the solution $u(t)$ , or equivalently the sequence $(b_k(t))_{k \in \mathbb Z^d_L}$ , on a time interval $[0,T]$ . It will be convenient, to simplify some notation later, to work on the unit time interval $[0,1]$ . For this we introduce the final ansatz
which satisfies the equation
Here we have also used the relation $\alpha =\lambda ^2L^{-d}$. Recall the well-prepared initial data (1.2), which transform into the initial data for $a_k$:
where $\eta _{k}(\omega )$ are the same as in equation (1.2).
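Under the convolution constraint $k=k_1-k_2+k_3$, the resonance factor factorises as $\Omega (k_1,k_2,k_3,k)=2\langle k_1-k,\,k-k_3\rangle _{\beta }$, where $\langle x,y\rangle _{\beta }:=\sum _i\beta _ix_iy_i$; this is the standard identity underlying the resonance-counting estimates of Section 3. A minimal numerical sanity check of this identity (our own illustration; the helper names are not from the paper):

```python
import random

def ip_beta(x, y, beta):
    # beta-weighted inner product <x, y>_beta = sum_i beta_i * x_i * y_i
    return sum(b * xi * yi for b, xi, yi in zip(beta, x, y))

def omega(k1, k2, k3, k, beta):
    # resonance factor |k1|_beta^2 - |k2|_beta^2 + |k3|_beta^2 - |k|_beta^2
    sq = lambda x: ip_beta(x, x, beta)
    return sq(k1) - sq(k2) + sq(k3) - sq(k)

random.seed(0)
for _ in range(1000):
    beta = [random.uniform(1, 2) for _ in range(3)]
    k1, k2, k3 = ([random.uniform(-5, 5) for _ in range(3)] for _ in range(3))
    k = [a - b + c for a, b, c in zip(k1, k2, k3)]   # convolution constraint
    lhs = omega(k1, k2, k3, k, beta)
    rhs = 2 * ip_beta([a - b for a, b in zip(k1, k)],
                      [a - b for a, b in zip(k, k3)], beta)
    assert abs(lhs - rhs) < 1e-9
```

In particular, $\Omega =0$ exactly when $k_1-k$ and $k-k_3$ are orthogonal in the $\beta $-metric, i.e., when $(k_1,k_2,k_3,k)$ form a (degenerate) rectangle.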
2.2 The tree expansion
Let $\boldsymbol a(t) =(a_k(t))_{k \in \mathbb Z^d_L}$ and $\boldsymbol {a}_{\mathrm {in}} =\boldsymbol a(0)$ . Let $J=[0,1]$ ; we will fix a smooth compactly supported cutoff function $\chi $ such that $\chi \equiv 1$ on J. Then by equation (2.3), we know that for $t\in J$ we have
where the Duhamel term is defined by
Since we will only be studying $\boldsymbol {a}$ for $t\in J$ , from now on we will replace $\boldsymbol {a}$ by the solution to equation (2.5) for $t\in \mathbb {R}$ (the existence and uniqueness of the latter will be clear from a proof to follow). We will be analysing the temporal Fourier transform of this (extended) $\boldsymbol {a}$ , so let us first record a formula for $\mathcal {I}$ on the Fourier side:
Lemma 2.1. Let $\mathcal {I}$ be defined as in equation (2.6), and recall that $\widetilde {G}$ means the temporal Fourier transform of G; then we have
Proof. See [Reference Deng, Nahmod and Yue17].
Now define $\mathcal J_n$ recursively by
and define
By plugging in equation (2.5), we get that $\mathcal R_{N+1}$ satisfies the equation
where the relevant terms are defined as
Next we will derive a formula for the time Fourier transform of $\mathcal J_n$ ; for this we need some preparation regarding multilinear forms associated with ternary trees.
Definition 2.2.

1. Let $\mathcal{T}$ be a ternary tree. We use $\mathcal {L}$ to denote the set of leaves and l their number, $\mathcal {N}=\mathcal{T}\backslash \mathcal L$ the set of branching nodes and n their number, and $\mathfrak {r} \in \mathcal N$ the root node. The scale of a ternary tree $\mathcal{T}$ is defined as $\mathfrak s(\mathcal{T})=n$ (the number of branching nodes). A tree of scale n has $l=2n+1$ leaves and a total of $3n+1$ vertices.

2. (Signs on a tree) For each node $\mathfrak {n}\in \mathcal {N}$, let its children from left to right be $\mathfrak {n}_1$, $\mathfrak {n}_2$, $\mathfrak {n}_3$. We fix the sign $\iota _{\mathfrak {n}}\in \{\pm \}$ as follows: first $\iota _{\mathfrak {r}}=+$, then for any node $\mathfrak {n}\in \mathcal {N}$, define $\iota _{\mathfrak {n}_1}=\iota _{\mathfrak {n}_3}=\iota _{\mathfrak {n}}$ and $\iota _{\mathfrak {n}_2}=-\iota _{\mathfrak {n}}$.

3. (Admissible assignments) Suppose we assign to each $\mathfrak {n}\in \mathcal{T}$ an element $k_{\mathfrak {n}}\in \mathbb {Z}_L^d$. We say such an assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T})$ is admissible if for any $\mathfrak {n}\in \mathcal {N}$ we have $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}-k_{\mathfrak {n}_2}+k_{\mathfrak {n}_3}$ and either $k_{\mathfrak {n}}\not \in \left \{k_{\mathfrak {n}_1},k_{\mathfrak {n}_3}\right \}$ or $k_{\mathfrak {n}}=k_{\mathfrak {n}_1}=k_{\mathfrak {n}_2}=k_{\mathfrak {n}_3}$. Clearly an admissible assignment is completely determined by the values of $k_{\mathfrak {l}}$ for $\mathfrak {l}\in \mathcal {L}$. For any assignment, we denote $\Omega _{\mathfrak {n}}:=\Omega \left (k_{\mathfrak {n}_1},k_{\mathfrak {n}_2},k_{\mathfrak {n}_3},k_{\mathfrak {n}}\right )$. Suppose we also fix $d_{\mathfrak {n}}\in \{0,1\}$ for each $\mathfrak {n}\in \mathcal {N}$; then we can define $q_{\mathfrak {n}}$ for each $\mathfrak {n}\in \mathcal{T}$ inductively by
(2.16) $$ \begin{align} q_{\mathfrak{n}}=0\text{ if }\mathfrak{n}\in\mathcal L\quad\text{or}\quad q_{\mathfrak{n}}=d_{\mathfrak{n}_1}q_{\mathfrak{n}_1}-d_{\mathfrak{n}_2}q_{\mathfrak{n}_2}+d_{\mathfrak{n}_3}q_{\mathfrak{n}_3}+\Omega_{\mathfrak{n}}\text{ if }\mathfrak{n}\in\mathcal{N}.\end{align} $$
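Definition 2.2 records several structural facts: a tree of scale n has $l=2n+1$ leaves and $3n+1$ vertices, the sign rule forces $n+1$ plus signs and n minus signs among the leaves, and an admissible assignment is determined by its leaf values, with $k_{\mathfrak {r}}$ equal to the signed sum of the leaf frequencies. A small mechanical check of these facts (our own sketch; the class and helper names are hypothetical, and frequencies are taken one-dimensional for simplicity):

```python
import random

class Node:
    def __init__(self):
        self.children = []            # empty for a leaf, three nodes otherwise

def collect(v, pred):
    # all nodes in the subtree of v satisfying pred
    out = [v] if pred(v) else []
    for c in v.children:
        out += collect(c, pred)
    return out

def random_tree(n):
    # a random ternary tree of scale n (n branching nodes)
    root = Node()
    for _ in range(n):
        leaf = random.choice(collect(root, lambda v: not v.children))
        leaf.children = [Node(), Node(), Node()]
    return root

def assign_signs(v, sign=1):
    # iota_r = +, and the children get signs (+, -, +) relative to the parent
    v.sign = sign
    if v.children:
        for c, s in zip(v.children, (sign, -sign, sign)):
            assign_signs(c, s)

def propagate(v):
    # admissible assignment: k_n = k_{n1} - k_{n2} + k_{n3}
    if not v.children:
        return v.k
    k1, k2, k3 = (propagate(c) for c in v.children)
    v.k = k1 - k2 + k3
    return v.k

random.seed(1)
for n in range(8):
    T = random_tree(n)
    leaves = collect(T, lambda v: not v.children)
    assert len(leaves) == 2 * n + 1                    # l = 2n + 1
    assert len(collect(T, lambda v: True)) == 3 * n + 1
    assign_signs(T)
    assert sum(lf.sign for lf in leaves) == 1          # n + 1 pluses, n minuses
    for lf in leaves:
        lf.k = random.randint(-10, 10)                 # random leaf frequencies
    assert propagate(T) == sum(lf.sign * lf.k for lf in leaves)
```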
Proposition 2.3. For each ternary tree $\mathcal{T}\,$ , define $\mathcal J_{\mathcal{T}\,}$ inductively by
where $\bullet $ represents the tree with a single node and $\mathcal{T}\,_1$ , $\mathcal{T}\,_2$ , $\mathcal{T}\,_3$ are the subtrees rooted at the three children of the root node of $\mathcal{T}\,$ . Then we have
Moreover, for any $\mathcal{T}\,$ of scale $\mathfrak s(\mathcal{T}\,\,)=n$ we have the formula
where the sum is taken over all admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ such that $k_{\mathfrak {r}}=k$ , and the function $\mathcal {K}=\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ satisfies
where $q_{\mathfrak {n}}$ is defined in equation (2.16).
Proof. First, equation (2.18) follows from the definitions in equations (2.9) and (2.17) and an easy induction. We now prove formulas (2.19) and (2.20) inductively, noting also that $(a_k)_{\mathrm {in}}=\sqrt {n_{\mathrm {in}}(k)}\cdot \eta _k(\omega )$ . For $\mathcal{T}\,=\bullet $ , equation (2.19) follows from equation (2.17) with $\mathcal {K}_{\mathcal{T}\,}(\tau ,k_{\mathfrak {r}})=\widetilde {\chi }(\tau )$ that satisfies formula (2.20). Now suppose formulas (2.19) and (2.20) are true for smaller trees; then by formulas (2.7) and (2.17) and Lemma 2.1, up to unimportant coefficients, we can write
where $\sum ^*$ represents summation under the conditions $k_j\in \mathbb {Z}_L^d$, $k_1-k_2+k_3=k$ and either $k\not \in \{k_1,k_3\}$ or $k=k_1=k_2=k_3$, the signs $(\iota _1,\iota _2,\iota _3)=(+,-,+)$, and $\sigma =\tau _1-\tau _2+\tau _3+T\Omega (k_1,k_2,k_3,k)$. Now applying the induction hypothesis, we can write $\left (\widetilde {\mathcal J_{\mathcal{T}}}\right )_{k}(\tau )$ in the form of equation (2.19) with the function
where $\mathfrak {r}$ is the root of $\mathcal{T}\,$ with children $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ and $\mathcal{T}\,_j$ is the subtree rooted at $\mathfrak {r}_j$ .
It then suffices to prove that $\mathcal {K}_{\mathcal{T}\,}$ defined by equation (2.21) satisfies formula (2.20). By the induction hypothesis, we may fix a choice $d_{\mathfrak {n}}$ for each nonleaf node $\mathfrak {n}$ of each $\mathcal{T}\,_j$ , and let $d_{\mathfrak {r}}=d$ . Then plugging formula (2.20) into equation (2.21), we get
which upon integration in $\tau _j$ gives equation (2.20). This completes the proof.
2.3 Statement of main estimates
Define the $h^b$ space by
and similarly the $h^{s,b}$ space for $\boldsymbol a(t)=(a_k(t))_{k \in \mathbb Z^d_L}$ by
We shall estimate the solution u in an appropriately rescaled $X^{s, b}$ space, which is equivalent to estimating the sequence $\boldsymbol a(t)=\left (a_k(t)\right )_{k \in \mathbb Z^d_L}$ in the space $h^{s, b}$ . Define the quantity
By the definition of $\delta>0$ in formula (1.4), we can verify that $\alpha T^{1/2}\leq \rho \leq L^{-\delta }$.
Proposition 2.4 (Well-posedness bounds).
Let $\rho $ be defined as in formula (2.24); then L-certainly, for all $1\leq n\leq 3N$, we have
Proposition 2.4 follows from the following two bounds, which will be proved in Section 3:
Proposition 2.5 (Bounds of tree terms).
We have, L-certainly, that
for any ternary tree of scale n, where $0\leq n\leq 3N$.
Proposition 2.6 (An operator norm bound).
We have, L-certainly, that for any trees $\mathcal{T}_1,\mathcal{T}_2$ with $\left \lvert \mathcal{T}_j\right \rvert =3n_j+1$ and $0\leq n_1,n_2\leq N$, the operators
satisfy the bounds
Remark 2.7. The bound (2.29) is a result of the probabilistic subcriticality of the problem. Similar bounds are also used in recent work by the first author, Nahmod and Yue [Reference Deng, Nahmod and Yue19] to get sharp probabilistic local well-posedness of nonlinear Schrödinger equations. The proof in both cases relies on high-order $TT^*$ arguments, although in [Reference Deng, Nahmod and Yue19] one needs to use the more sophisticated tensor norms due to the different ansatz caused by the inhomogeneity of the initial data.
Proof of Proposition 2.4 (assuming Propositions 2.5 and 2.6)
Assume we have already excluded an exceptional set of probability $\lesssim e^{-L^{\theta }}$. The bound (2.25) follows directly from formulas (2.18) and (2.27); it remains to bound $\mathcal {R}_{N+1}$. Recall that $\mathcal {R}_{N+1}$ satisfies equation (2.11), so it suffices to prove that the mapping
is a contraction mapping from the set $\mathcal {Z}=\left \{v:\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^{N}\right \}$ to itself. We will prove only that it maps $\mathcal {Z}$ into $\mathcal {Z}$ , as the contraction part follows in a similar way. Now suppose $\left \lVert v\right \rVert _{h^{s,b}}\leq \rho ^N$ ; then by formulas (2.18) and (2.27), we have
so $\left \lVert \mathcal J_{\sim N}\right \rVert _{h^{s,b}}\ll \rho ^N$ . Next we may use formula (2.29) to bound
As for the terms $\mathcal Q(v)$ and $\mathcal C(v)$ , we apply the simple bound
(which easily follows from formula (2.7)), where $\sum _{\mathrm {cyc}}$ means summing over permutations of $(u,v,w)$. As $\alpha T\leq L^{d}$, we conclude (also using Proposition 2.5) that
since $\rho \leq L^{-\delta }$ and $N\gg \delta ^{-1}$. This completes the proof.
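The scheme of this proof, a constant term plus a linear term of small norm plus quadratic and cubic corrections, mapping a small ball into itself, can be illustrated by a toy scalar analogue (entirely our own sketch, not the paper's argument; the radius r plays the role of $\rho ^N$):

```python
# toy scalar analogue of the fixed-point argument: Phi(x) = a + b*x + c*x^2 + d*x^3.
# If |a| <= r/2 and |b| + |c|*r + |d|*r^2 <= 1/4, then Phi maps [-r, r] into itself
# and is a contraction there, so the iteration converges to the unique fixed point.
def solve_fixed_point(a, b, c, d, r, iters=100):
    x = 0.0
    for _ in range(iters):
        x = a + b * x + c * x * x + d * x ** 3
        assert abs(x) <= r               # iterates never leave the ball
    return x

r = 1e-3                                  # small radius, analogous to rho**N
x = solve_fixed_point(a=4e-4, b=0.1, c=1.0, d=1.0, r=r)
# x solves the fixed-point equation to machine precision
assert abs(x - (4e-4 + 0.1 * x + x * x + x ** 3)) < 1e-15
```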
3 Proof of main estimates
In this section we prove Propositions 2.5 and 2.6.
3.1 Large deviation and basic counting estimates
We start by making some preparations, namely the large deviation and counting estimates that will be used repeatedly in the proof later.
Lemma 3.1. Let $\{\eta _k(\omega )\}$ be independent, identically distributed complex random variables, such that the law of each $\eta _k$ is either Gaussian with mean $0$ and variance $1$ or the uniform distribution on the unit circle. Let $F=F(\omega )$ be defined by
where $a_{k_1\cdots k_n}$ are constants; then F can be divided into finitely many terms, and for each term there is a choice of $X=\left \{i_1,\ldots ,i_p\right \}$ and $Y=\left \{j_1,\ldots ,j_p\right \}$ , which are two disjoint subsets of $\{1,2,\ldots ,n\}$ , such that
holds with
where a pairing $\left (k_{i},k_{j}\right )$ means $\left (\iota _i+\iota _j,\iota _ik_i+\iota _jk_j\right )=0$ .
Proof. First assume $\eta _k$ is Gaussian. Then by the standard hypercontractivity estimate for an Ornstein–Uhlenbeck semigroup (see, e.g., [Reference Oh and Thomann40]), we know that formula (3.2) holds with M replaced by $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ . Now to estimate $\mathbb {E}\left \lvert F(\omega )\right \rvert ^2$ , by dividing the sum (3.1) into finitely many terms and rearranging the subscripts, we may assume in a monomial of equation (3.1) that
and the $k_{j_s}$ are different for $1\leq s\leq r$ . Such a monomial has the form
where the factors for different s are independent. We may also assume $b_s=c_s$ for $1\leq s\leq q$ and $b_s\neq c_s$ for $q+1\leq s\leq r$, and for $1\leq j\leq j_q$ we may assume $\iota _j$ has the same sign as $(-1)^j$. Then we can further rewrite this monomial as a linear combination of
for $1\leq p\leq q$ . Therefore, $F(\omega )$ is a finite linear combination of expressions of the form
Due to independence and the fact that $\mathbb {E}\left (\left \lvert \eta \right \rvert ^{2b}-b!\right )=\mathbb {E}\left (\eta ^b\left (\overline {\eta }\right )^c\right )=0$ for a normalised Gaussian $\eta $ and $b\neq c$, we conclude that
which is bounded by the right-hand side of equation (3.3), by choosing $X=\left \{1,3,\ldots ,j_p-1\right \}$ and $Y=\left \{2,4,\ldots ,j_p\right \}$, as under our assumptions $(k_{2i-1},k_{2i})$ is a pairing for $2i\leq j_p$.
Now assume $\eta _k$ is uniformly distributed on the unit circle. Let $\{g_k(\omega )\}$ be independent, identically distributed normalised Gaussians as in the first part, and consider the random variable
We can calculate
where $1\leq i\leq q$ and $1\leq j\leq n$ , and similarly for H,
The point is that we always have
In fact, in order for either side to be nonzero, for any particular k we must have
Let both be equal to m; then by independence, the factor that the $\eta _k^{\pm }$'s contribute to the expectation on the left-hand side will be $\mathbb {E}\left \lvert \eta _k\right \rvert ^{2m}=1$, while for the right-hand side it will be $\mathbb {E}\left \lvert g_k\right \rvert ^{2m}=m!\geq 1$.
This implies that $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )\leq \mathbb {E}\left (\left \lvert H\right \rvert ^{2q}\right )$ for any positive integer q; since formula (3.2) holds for H, we have
with an absolute constant C. This gives an upper bound for $\mathbb {E}\left (\left \lvert F\right \rvert ^{2q}\right )$, and by Chebyshev's inequality, we deduce formula (3.2) for F.
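The moment identities used in this proof, $\mathbb {E}\,\eta ^b\overline {\eta }^c=0$ for $b\neq c$ and $\mathbb {E}\lvert \eta \rvert ^{2m}=1$ when $\eta $ is uniform on the unit circle, versus $\mathbb {E}\lvert g\rvert ^{2m}=m!$ for a normalised complex Gaussian g, can be verified by direct quadrature. A quick check (our own illustration, using only the standard library; for the Gaussian we use that $\lvert g\rvert ^2$ is exponentially distributed):

```python
import cmath
import math

def circle_moment(b, c, n=4096):
    # E[eta^b * conj(eta)^c] for eta uniform on the unit circle, as an
    # (exact for |b - c| < n) Riemann sum of exp(i*(b - c)*theta) on [0, 2*pi)
    return sum(cmath.exp(1j * (b - c) * 2 * math.pi * j / n) for j in range(n)) / n

def gaussian_abs_moment(m, cut=60.0, steps=50000):
    # E|g|^{2m} for a normalised complex Gaussian g: since |g|^2 ~ Exp(1),
    # E|g|^{2m} = int_0^infty t^m e^{-t} dt = m!  (midpoint rule on [0, cut])
    h = cut / steps
    return sum(((j + 0.5) * h) ** m * math.exp(-(j + 0.5) * h)
               for j in range(steps)) * h

for b in range(4):
    for c in range(4):
        mom = circle_moment(b, c)
        if b == c:
            assert abs(mom - 1) < 1e-9       # E|eta|^{2b} = 1
        else:
            assert abs(mom) < 1e-9           # E eta^b conj(eta)^c = 0
for m in range(5):
    assert abs(gaussian_abs_moment(m) - math.factorial(m)) < 1e-3   # = m!
```

This makes concrete the comparison driving the second half of the proof: the circle moments are dominated term by term by the Gaussian ones, since $1\leq m!$.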
Lemma 3.2. Let $\beta =(\beta _1,\ldots ,\beta _d)\in [1,2]^d$ and $0<T\leq L^d$ . Assume that $\beta $ is generic for $T\geq L^{2}$ . Then, uniformly in $(k,a,b,c)\in \left (\mathbb {Z}_L^d\right )^4$ and $m\in \mathbb {R}$ , the sets
satisfy the bounds
where in the first inequality of formula (3.10) we also assume $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ .
Moreover, with $\rho $ defined as in formula (2.24), we have
without any assumption on $(k,a,b)$ .
Proof. We first consider $S_3$ . Let $kx=p$ and $kz=q$ ; then we may write $p=\left (L^{1}u_1,\ldots , L^{1}u_d\right )$ and similarly for q, where each $u_i$ and $v_i$ is an integer and belongs to a fixed interval of length $O\left (L^{1+\theta }\right )$ . Moreover, from $(x,y,z)\in S_3$ we deduce that
We may assume $u_iv_i=0$ for $1\leq i\leq r$ , and $\sigma _i:=u_iv_i\neq 0$ for $r+1\leq i\leq d$ ; then the number of choices for $(u_i,v_i:1\leq i\leq r)$ is $O\left (L^{r+\theta }\right )$ . It is known (see [Reference Deng, Nahmod and Yue17, Reference Deng, Nahmod and Yue18]) that given $\sigma \neq 0$ , the number of integer pairs $(u,v)$ such that u and v each belongs to an interval of length $O\left (L^{1+\theta }\right )$ and $uv=\sigma $ is $O\left (L^{\theta }\right )$ . Therefore, if $\left \lvert k\right \rvert ,\left \lvert a\right \rvert ,\left \lvert b\right \rvert \leq L^{\theta }$ , then $\#S_3$ is bounded by $O\left (L^{r+\theta }\right )$ times the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ that satisfy
Using the assumption $T\leq L^{d}$, it suffices to show that the number of choices for $(\sigma _{r+1},\ldots ,\sigma _d)$ satisfying formula (3.12) is at most $O\left (1+L^{2(d-r)+\theta }T^{-1}\right )$. This latter bound is trivial if $d-r=1$ or $L^2T^{-1}\geq 1$, so we may assume $d-r\geq 2$, $T\geq L^{2}$ and that the $\beta _i$ are generic. It is well known in Diophantine approximation (see, e.g., [Reference Cassels9]) that for generic $\beta _i$ we have
so the distance between any two points $(\sigma _i:r+1\leq i\leq d)$ and $(\sigma _i':r+1\leq i\leq d)$ satisfying formula (3.12) is at least $\left (L^2T^{-1}\right )^{-\frac {1}{d-r-1}-\theta }$. Since all these points belong to a box which has size $O(1)$ in one direction and size $O\left (L^{2+\theta }\right )$ in the other orthogonal directions, we deduce that the number of solutions to formula (3.12) is at most $1+L^{\theta } L^{2(d-r-1)}L^2T^{-1}$, as desired.
Next, without any assumption on $(k,a,b)$, we need to prove formula (3.11). By definition (2.24) we can check that $Q^2\geq L^{2d}\left (\min \left (T,L^2\right )\right )^{-1}$, so it suffices to prove the first inequality of formula (3.10), assuming $T\leq L^2$. But this again follows from formula (3.12), noting that now $\left \lvert \sigma _j\right \rvert \leq L^{2+\theta }$ is no longer true, but each $\sigma _j$ still has at most $L^{2+\theta }$ possible values.
Finally we consider $S_2$, which is much easier. In fact, formula (3.11) follows from formula (3.10), so we only need to prove the latter. Now if $T\leq L$, we trivially have $\#S_2\leq L^{d+\theta }$, as y is fixed once x is; if $T\geq L$, then we may assume $x_d-y_d\neq 0$ if the sign $\pm $ is $-$, and then fix the first coordinates $x_j\ (1\leq j\leq d-1)$ and hence $y_j\ (1\leq j\leq d-1)$. Then $x_d\pm y_d$ is fixed, and $x_d^2\pm y_d^2$ belongs to a fixed interval of length $O\left (T^{-1}\right )$. Since $x_d,y_d\in L^{-1}\mathbb {Z}$, we know that $x_d$ has at most $1+L^2T^{-1}$ choices, which implies what we want to prove.
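The divisor-bound input quoted in the proof (for $\sigma \neq 0$, the number of integer pairs $(u,v)$ in a window with $uv=\sigma $ is $O\left (L^{\theta }\right )$) reduces, for positive windows, to the classical fact that the divisor function grows more slowly than any positive power. A small self-contained illustration (our own sketch; the helper names are hypothetical):

```python
def ndiv(sigma):
    # number of positive divisors of sigma
    s, i = 0, 1
    while i * i <= sigma:
        if sigma % i == 0:
            s += 1 if i * i == sigma else 2
        i += 1
    return s

def pairs_on_hyperbola(sigma, lo, hi):
    # number of integer pairs (u, v) with lo <= u, v <= hi and u * v = sigma
    return sum(1 for u in range(max(lo, 1), hi + 1)
               if sigma % u == 0 and lo <= sigma // u <= hi)

# over the window [1, N] the worst-case count is the maximum of the divisor
# function, which is tiny compared with the window size N
N = 2000
worst = max(ndiv(s) for s in range(1, N + 1))
assert worst == ndiv(1680) == 40     # highly composite 1680 = 2^4 * 3 * 5 * 7
assert pairs_on_hyperbola(1680, 1, N) == 40
```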
3.2 Bounds for $\mathcal {J}_n$
In this section we prove Proposition 2.5. We will need to extend the notion of ternary trees to paired, coloured ternary trees:
Definition 3.3 (Tree pairings and colourings).
Let $\mathcal{T}$ be a ternary tree as in Definition 2.2. We will pair some of the leaves of $\mathcal{T}$ such that each leaf belongs to at most one pair. The two leaves in a pair are called partners of each other, and the unpaired leaves are called single. We assume $\iota _{\mathfrak {l}}+\iota _{\mathfrak {l}'}=0$ for any pair $(\mathfrak {l},\mathfrak {l}')$. The set of single leaves is denoted $\mathcal {S}$. The number of pairs is denoted by p, so that $\lvert {\mathcal S}\rvert =l-2p$. Moreover, we assume that some nodes in $\mathcal {S}\cup \{\mathfrak {r}\}$ are coloured red, and let $\mathcal R$ be the set of red nodes. We shall denote $r=\lvert \mathcal R\rvert $.
We shall use red colouring to denote that the frequency assignments to the corresponding red vertex are fixed in the counting process. We also introduce the following definition:
Definition 3.4 (Strong admissibility).
Suppose we fix $n_{\mathfrak {m}}\in \mathbb {Z}_L^d$ for each $\mathfrak {m}\in \mathcal R$ . An assignment $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ is called strongly admissible with respect to the given pairing, colouring and $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ if it is admissible in the sense of Definition 2.2, and
The key to the proof of Proposition 2.5 is the following combinatorial counting bound:
Proposition 3.5. Let $\mathcal{T}\,$ be a paired and coloured ternary tree such that $\mathcal R\neq \varnothing $ , and let $(n_{\mathfrak {m}}:\mathfrak {m}\in \mathcal R)$ be fixed. We also fix $\sigma _{\mathfrak {n}}\in \mathbb {R}$ for each $\mathfrak {n}\in \mathcal {N}$ . Let $l=\lvert \mathcal L\rvert $ be the total number of leaves, p be the number of pairs and $r=\lvert \mathcal R\rvert $ be the number of red nodes. Then the number of strongly admissible assignments $(k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}\,\,)$ which also satisfy
is, recalling Q defined in formula (3.11), bounded by
Proof. We proceed by induction. The base cases directly follow from formula (3.11). Now suppose the desired bound holds for all smaller trees, and consider $\mathcal{T}\,$ . Let $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ be the children of the root node $\mathfrak {r}$ and $\mathcal{T}\,_j$ be the subtree rooted at $\mathfrak {r}_j$ . Let $l_j$ be the number of leaves in $\mathcal{T}\,_j$ , $p_j$ the number of pairs within $\mathcal{T}\,_j$ and $p_{ij}$ the number of pairings between $\mathcal{T}\,_i$ and $\mathcal{T}\,_j$ , and let $r_j=\left \lvert \mathcal {R}\cap \mathcal{T}\,_j\right \rvert $ ; then we have
Also note that $\lvert k_{\mathfrak {n}}\rvert \lesssim L^{\theta }$ for all $\mathfrak {n}\in \mathcal{T}\,$ .
The proof is completely algorithmic and involves a case-by-case discussion. The general strategy is to perform the following four operations, which we refer to as $\mathcal {O}_j\ (0\leq j\leq 3)$, in a suitable order. In operation $\mathcal {O}_0$ we apply formula (3.11) to count the number of choices for the values among $\left \{k_{\mathfrak {r}},k_{\mathfrak {r}_1},k_{\mathfrak {r}_2},k_{\mathfrak {r}_3}\right \}$ that are not already fixed (this step may be trivial if three of these four vectors are already fixed, i.e., coloured, or if one of them is already fixed and $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$). In operations $\mathcal {O}_j\ (1\leq j\leq 3)$, we apply the induction hypothesis to one of the subtrees $\mathcal{T}_j$ and count the number of choices for $\left (k_{\mathfrak {n}}:\mathfrak {n}\in \mathcal{T}_j\right )$. Let the number of choices associated with $\mathcal {O}_j\ (0\leq j\leq 3)$ be $M_j$, with superscripts indicating different cases. In the whole process we may colour a node $\mathfrak {n}$ red if $k_{\mathfrak {n}}$ has already been fixed during the previous operations, namely when $\mathfrak {n}=\mathfrak {r}$ and we have performed $\mathcal {O}_0$ before, when $\mathfrak {n}=\mathfrak {r}_j$ and we have performed $\mathcal {O}_0$ or $\mathcal {O}_j$ before, or when $\mathfrak {n}$ is a leaf that has a partner in $\mathcal{T}_j$ and we have performed $\mathcal {O}_j$ before.
(1) Suppose $\mathfrak {r}\not \in \mathcal R$; then we may assume that there is a red leaf in $\mathcal{T}_1$. We first perform $\mathcal {O}_1$ and get a factor
Now $\mathfrak {r}_1$ is coloured red, as is any leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_1$ . There are then two cases.
(1.1) Suppose now there is a leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say from $\mathcal{T}\,_2$ , that is red. Then we perform $\mathcal {O}_2$ and get a factor
Now $\mathfrak {r}_2$ is coloured red, as is any leaf of $\mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are again two cases.
(1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor
then colour $\mathfrak {r}_3$ red and apply $\mathcal {O}_0$ to get a factor $M_0^{(1.1.1)}:=1$ . Thus
which is what we need.
(1.1.2) Suppose after step (1.1) there is no red leaf in $\mathcal{T}_3$; then $r_3=p_{13}=p_{23}=0$. We perform $\mathcal {O}_0$ and get a factor $M_0^{(1.1.2)}:=L^{\theta } Q^{-1}$ (perhaps with slightly enlarged $\theta $; the same applies later). Now we may colour $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Thus
which is what we need.
(1.2) Now suppose that after step (1) there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=p_{12}=p_{13}=0$ . There are two cases.
(1.2.1) Suppose there is a single leaf in $\mathcal{T}_2\cup \mathcal{T}_3$, say from $\mathcal{T}_2$. Then we will perform $\mathcal {O}_0$ and get a factor $M_0^{(1.2.1)}:=L^{\theta } Q^{-2}$. Now we may colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Now any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ is coloured red, so we may perform $\mathcal {O}_2$ and get a factor
Thus
which is what we need.
(1.2.2) Suppose there is no single leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then all leaves in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ are paired to one another, which implies that $k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ and that $\mathfrak {r}_2$ and $\mathfrak {r}_3$ have opposite signs, and hence by the admissibility condition we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . This allows us to perform $\mathcal {O}_0$ and colour $\mathfrak {r}_2$ and $\mathfrak {r}_3$ red with $M_0^{(1.2.2)}:=1$ , then perform $\mathcal {O}_3$ and colour red any leaf of $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , then perform $\mathcal {O}_2$ (for which we use the second bound in formula (3.15)). This leads to the factors
and thus
which is better than what we need.
(2) Now suppose $\mathfrak {r}\in \mathcal R$ ; then $r=r_1+r_2+r_3+1$ . There are two cases.
(2.1) Suppose there is one single leaf that is not red, say from $\mathcal{T}\,_1$ . There are again two cases.
(2.1.1) Suppose there is a red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ , say $\mathcal{T}\,_2$ . Then we perform $\mathcal {O}_2$ and get a factor
We now colour red $\mathfrak {r}_2$ and any leaf in $\mathcal{T}\,_1\cup \mathcal{T}\,_3$ which has a partner in $\mathcal{T}\,_2$ . There are a further two cases.
(2.1.1.1) Suppose now there is a red leaf in $\mathcal{T}\,_3$ ; then we perform $\mathcal {O}_3$ and get a factor
Now we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.1)}:=1$ , then colour red $\mathfrak {r}_1$ as well as any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.1.2) Suppose after step (2.1.1) there is no red leaf in $\mathcal{T}_3$; then $r_3=p_{23}=0$. We perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.1.2)}:=L^{\theta } Q^{-1}$. Then we colour $\mathfrak {r}_1$ and $\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.2) Suppose in the beginning there is no red leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ ; then $r_2=r_3=0$ . There are again two cases.
(2.1.2.1) Suppose there is a leaf in $\mathcal{T}_2\cup \mathcal{T}_3$, say from $\mathcal{T}_2$, that is either single or paired with a leaf in $\mathcal{T}_1$. Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.1)}:=L^{\theta } Q^{-2}$. After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
We then colour red any leaf of $\mathcal{T}\,_1$ and $\mathcal{T}\,_2$ which has a partner in $\mathcal{T}\,_3$ , and perform $\mathcal {O}_2$ to get a factor
Finally we colour red any leaf of $\mathcal{T}\,_1$ which has a partner in $\mathcal{T}\,_2$ , and perform $\mathcal {O}_1$ to get a factor
Thus
which is what we need.
(2.1.2.2) Suppose there is no leaf in $\mathcal{T}\,_2\cup \mathcal{T}\,_3$ that is either single or paired with a leaf in $\mathcal{T}\,_1$ ; then in the same way as in case (1.2.2), we must have $k_{\mathfrak {r}}=k_{\mathfrak {r}_1}=k_{\mathfrak {r}_2}=k_{\mathfrak {r}_3}$ . Moreover, we have $p_{12}=p_{13}=0$ . Then we perform $\mathcal {O}_0$ and get a factor $M_0^{(2.1.2.2)}:=1$ . After this we colour $\mathfrak {r}_1,\mathfrak {r}_2,\mathfrak {r}_3$ red and perform $\mathcal {O}_3$ to get a factor
We then colour red any leaf of