1 Introduction
Wigner random matrices are $N\times N$ random Hermitian matrices $W=W^*$ with centred, independent, identically distributed (i.i.d.) entries up to the symmetry constraint $w_{ab} = \overline {w_{ba}}$ . Originally introduced by E. Wigner [Reference Wigner53] to study spectral gaps of large atomic nuclei, Wigner matrices have become the most studied random matrix ensemble since they represent the simplest example of a fully chaotic quantum Hamiltonian beyond the explicitly computable Gaussian case.
A key conceptual feature of Wigner matrices, as well as a fundamental technical tool to study them, is the fact that their resolvent $G(z):= (W-z)^{-1}$, with a spectral parameter z away from the real axis, becomes asymptotically deterministic in the large N limit. The limit is the scalar matrix $m(z)\cdot I$, where $m(z) = \frac {1}{2}(-z+\sqrt {z^2-4})$ is the Stieltjes transform of the Wigner semicircular density, $\rho _{\mathrm {sc}}(x) = \frac {1}{2\pi }\sqrt {4-x^2}$, which is the $N\to \infty $ limit of the empirical density of the eigenvalues of W under the standard normalisation $\operatorname {\mathbf {E}} |w_{ab}|^2= 1/N$. The local law on optimal scale asserts that this limit holds even when z is very close to the real axis, as long as $|\Im z|\gg 1/N$. Noticing that the imaginary part of the Stieltjes transform resolves the spectral measure on a scale comparable with $|\Im z|$, this condition is necessary for a deterministic limit to hold since on scales of order $1/N$, comparable with the typical eigenvalue spacing, the resolvent is genuinely fluctuating.
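The convergence of the resolvent to $m(z)\cdot I$ can be observed numerically. The following is a minimal sketch, assuming a real symmetric Gaussian (GOE-type) Wigner matrix and NumPy; the helper names `wigner_goe`, `m_sc` and `resolvent`, the matrix size and the spectral parameter are illustrative choices and not part of the paper.

```python
import numpy as np

def wigner_goe(n, rng):
    """Real symmetric Gaussian Wigner matrix with off-diagonal variance 1/n."""
    x = rng.standard_normal((n, n))
    return (x + x.T) / np.sqrt(2 * n)

def m_sc(z):
    """Stieltjes transform of the semicircle law: the root of m^2 + z m + 1 = 0
    with Im m * Im z > 0 (z is assumed to lie off the real axis)."""
    z = complex(z)
    root = np.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag * z.imag > 0 else (-z - root) / 2

def resolvent(w, z):
    """G(z) = (W - z)^{-1}."""
    return np.linalg.inv(w - z * np.eye(w.shape[0]))

rng = np.random.default_rng(0)
n = 500
z = 0.3 + 0.2j                      # N * |Im z| = 100, well above the 1/N scale
g = resolvent(wigner_goe(n, rng), z)

err_avg = abs(np.trace(g) / n - m_sc(z))   # averaged quantity, test matrix A = I
err_entry = abs(g[0, 0] - m_sc(z))         # isotropic quantity, x = y = e_1
print(f"averaged error {err_avg:.4f}, entrywise error {err_entry:.4f}")
```

In such experiments the averaged error is typically noticeably smaller than the entrywise one, in line with the different sizes of the averaged and isotropic error terms discussed in this introduction.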
The limit $G(z)\to m(z)\cdot I$ holds in a natural topology, namely when tested against deterministic $N\times N$ matrices A: that is, in the form $\langle (G(z)-m(z))A\rangle \to 0$, where $\langle \cdot \rangle := \frac {1}{N}\operatorname {Tr}(\cdot )$ denotes the normalised trace. It is essential that the test matrix A is deterministic; no analogous limit can hold if A is random and strongly correlated with W: for example, if A is a spectral projection of W.
The first optimal local law for Wigner matrices was proven for $A=I$ in [Reference Erdős, Schlein and Yau27]; see also [Reference Cacciapuoti, Maltsev and Schlein13, Reference Götze, Naumov and Tikhomirov32, Reference Tao and Vu50, Reference Tao and Vu51]. It was later extended to more general matrices A in the form that$^{1}$
$$\big |\langle (G(z)-m(z))A\rangle \big | \le \frac {N^{\xi }\|A\|}{N\eta }, \qquad \eta :=|\Im z|, \tag{1.1}$$
holds with a very high probability for any fixed $\xi>0$ if N is sufficiently large. By optimality in this paper, we always mean up to a tolerance factor $N^{\xi }$. This is a natural byproduct of our method yielding very high probability estimates under the customary moment condition; see equation (2.2) later.$^{2}$ The estimate given by equation (1.1) is called the average local law, and it controls the error in terms of the standard Euclidean matrix norm $\|A\|$ of A. It holds for arbitrary deterministic matrices A, and it is also optimal in this generality with respect to the dependence on A: for example, for $A=I$, the trace
is approximately complex Gaussian with standard deviation [Reference He and Knowles33]
but equation (1.1) is far from being optimal when applied to matrices with small rank. Rank-one matrices, $A= \boldsymbol {y} \boldsymbol {x}^*$, are especially important since they give the asymptotic behaviour of resolvent matrix elements $G_{\boldsymbol {x} \boldsymbol {y}}:= \langle \boldsymbol {x}, G \boldsymbol {y}\rangle $. For such special test matrices, a separate isotropic local law of the optimal form
has been proven; see [Reference Erdős, Yau and Yin28] for special coordinate vectors and later [Reference Knowles and Yin38] for general vectors $\boldsymbol {x}, \boldsymbol {y}$, as well as [Reference Erdős, Krüger and Schröder26, Reference He, Knowles and Rosenthal34, Reference Knowles and Yin36, Reference Lee and Schnelli40] for more general ensembles. Note that a direct application of equation (1.1) to $A= \boldsymbol {y} \boldsymbol {x}^*$ would give a bound of order $1/\eta $ instead of the optimal $1/\sqrt {N\eta }$ in equation (1.2), which is an unacceptable overestimate in the most interesting small $\eta $ regime. More generally, the average local law given by equation (1.1) performs badly when A has effectively small rank: that is, if only a few eigenvalues of A are comparable with the norm $\|A\|$ and most other eigenvalues are much smaller or even zero.
Quite recently, we found that the average local law given by equation (1.1) is also suboptimal for another class of test matrices A, namely traceless matrices. In [Reference Cipolloni, Erdős and Schröder15], we proved that
for any deterministic matrix A with $\langle A\rangle =0$: that is, traceless observables yield an additional $\sqrt {\eta }$ improvement in the error. The optimality of this bound for general traceless A was demonstrated by identifying the nontrivial Gaussian fluctuation of $\langle GA\rangle $ in [Reference Cipolloni, Erdős and Schröder16].
While the mechanism behind the suboptimality of equation (1.1) for small rank and traceless A is very different, their common core is that estimating the size of A simply by the Euclidean norm is too crude for several important classes of A. In this paper, we present a local law that unifies all three local laws in equations (1.1), (1.2) and (1.3) by identifying the appropriate way to measure the size of A. Our main result (Theorem 2.2, $k=1$ case) shows that
holds with very high probability, where $\mathring {A}:= A-\langle A\rangle $ is the traceless part of A. It is straightforward to check that equation (1.4) implies equations (1.1), (1.2) and (1.3); moreover, it optimally interpolates between full-rank and rank-one matrices A; hence we call equation (1.4) the rank-uniform local law for Wigner matrices. Note that an optimal local law for matrices of intermediate rank was previously unknown; indeed, the local laws given by equations (1.1) and (1.2) are optimal only for essentially full-rank and essentially finite-rank observables, respectively. The proof of the optimality of equation (1.4) follows from identifying the scale of the Gaussian fluctuation of its left-hand side. Its standard deviation for traceless A is
this relation was established for matrices with bounded norm $\|A\|\lesssim 1$ in [Reference Cipolloni, Erdős and Schröder16, Reference Lytova42].
The key observation that traceless A substantially improves the error term in equation (1.3) compared with equation (1.1) was the conceptually new input behind our recent proof of the Eigenstate Thermalisation Hypothesis in [Reference Cipolloni, Erdős and Schröder15] followed by the proof of the normal fluctuation in the quantum unique ergodicity for Wigner matrices in [Reference Cipolloni, Erdős and Schröder17]. Both results concern the behaviour of the eigenvector overlaps: that is, quantities of the form $\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j\rangle $, where $\{{\boldsymbol {u}}_i\}_{i=1}^N$ are the normalised eigenvectors of W. The former result stated that
holds with very high probability for any $i,j$ and for any fixed $\xi>0$. The latter result established the optimality of equation (1.6) for $i=j$ by showing that $\sqrt {N} \langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ is asymptotically Gaussian when the corresponding eigenvalue lies in the bulk of the spectrum. The variance of $\sqrt {N} \langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ was also identified in [Reference Cipolloni, Erdős and Schröder17], but we needed to assume that $\langle \mathring {A}^2\rangle \ge c\|\mathring {A}\|^2$ with some fixed positive constant c: that is, that the rank of $\mathring {A}$ was essentially macroscopic.
As the second main result of the current paper, we now remove this unnatural condition and show the standard Gaussianity of the normalised overlaps for bulk indices under the optimal and natural condition that $\langle \mathring {A}^2\rangle \gg N^{-1}\|\mathring {A}\|^2$, which essentially ensures that $\mathring {A}$ is not of finite rank. This improvement is possible thanks to improving the dependence of the error terms in the local laws from $\|\mathring {A}\|$ to $\langle |\mathring {A}|^2\rangle ^{1/2}$, similarly to the improvement in equation (1.4) over equation (1.3). We will also need a multi-resolvent version of this improvement since off-diagonal overlaps $\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j \rangle $ are not accessible via single-resolvent local laws; in fact, $|\langle {\boldsymbol {u}}_i, A {\boldsymbol {u}}_j \rangle |^2$ is intimately related to $\langle \Im G(z) A\, \Im G(z') A^*\rangle $ with two different spectral parameters $z, z'$, analysed in Theorem 2.2. As a corollary, we will show the following improvement of equation (1.6) (see Theorem 2.6)
for the bulk indices. The analysis at the edge is deferred to later work.
Gaussian fluctuation of diagonal overlaps with a special low-rank observable has been proven earlier. Right after [Reference Cipolloni, Erdős and Schröder17] was posted on the arXiv, Benigni and Lopatto in an independent work [Reference Benigni and Lopatto7] proved the standard Gaussian fluctuation of $[N/|S|]^{1/2}\big [\sum _{a\in S} |u_i(a)|^2 - |S|/N\big ]$ whenever $1\ll |S|\ll N$: that is, they considered $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ for the special case when the matrix A is the projection on coordinates from the set S. Their result also holds at the edge. The condition $|S|\ll N$ requires A to have small rank; hence it is complementary to our old condition from [Reference Cipolloni, Erdős and Schröder17] for projection operators. The natural condition $|S|\gg 1$ is the special case of our new improved condition $\langle \mathring {A}^2\rangle \gg N^{-1}\|\mathring {A}\|^2$. In particular, our new result covers [Reference Benigni and Lopatto7] as a special case in the bulk, and it gives a uniform treatment of all observables in full generality.
The methods of [Reference Benigni and Lopatto7] and [Reference Cipolloni, Erdős and Schröder17] are very different, although they both rely on Dyson Brownian motion (DBM), complemented by fairly standard Green function comparison (GFT) techniques. Benigni and Lopatto focused on the joint Gaussianity of the individual eigenvector entries $u_i(a)$ (or, more generally, linear functionals $\langle q_{\alpha }, {\boldsymbol {u}}_i\rangle $ with deterministic unit vectors $q_{\alpha }$) in the spirit of the previous quantum ergodicity results by Bourgade and Yau [Reference Bourgade and Yau10] operating with the so-called eigenvector moment flow from [Reference Bourgade and Yau10] complemented by its ‘fermionic’ version by Benigni [Reference Benigni9]. This approach becomes less effective when more entries need to be controlled simultaneously, and it seems to have a natural limitation at $|S|\ll N$.
Our method viewed the eigenvector overlap $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_i \rangle $ and its off-diagonal version $\langle {\boldsymbol {u}}_i, \mathring {A} {\boldsymbol {u}}_j \rangle $ as one unit without translating it into a sum of rank-one projections $\langle {\boldsymbol {u}}_i, q_{\alpha }\rangle \langle q_{\alpha }, {\boldsymbol {u}}_j\rangle $ via the spectral decomposition of $\mathring {A}$. The corresponding flow for overlaps with arbitrary A, called the stochastic eigenstate equation, was introduced by Bourgade, Yau and Yin in [Reference Bourgade, Yau and Yin12] (even though they applied it to the special case when A is a projection, their formalism is general). The analysis of this new flow is more involved than the eigenvector moment flow since it operates on a geometrically more complicated higher-dimensional space. However, the substantial part of this analysis has been done by Marcinek and Yau [Reference Marcinek43], and we heavily relied on their work in our proof [Reference Cipolloni, Erdős and Schröder17].
We close this introduction by commenting on our methods. The main novelty of the current paper is the proof of the rank-uniform local laws involving the Hilbert-Schmidt norm $\langle |\mathring {A}|^2\rangle ^{1/2}$ instead of the Euclidean matrix norm $\|\mathring {A}\|$. This is done in Section 3, and it will directly imply the improved overlap estimate in equation (1.7). Once this estimate is available, both the DBM and the GFT parts of the proof in the current paper are essentially the same as in [Reference Cipolloni, Erdős and Schröder17]; hence we will not give all details but only point out the differences. While this can be done very concisely for the GFT in Appendix B, for the DBM part, we need to recall a large part of the necessary setup in Section 4 for the convenience of the reader.
As to our main result, the general scheme to prove single-resolvent local laws has been well established, and traditionally it consisted of two parts: (i) the derivation of an approximate self-consistent equation that $G-m$ satisfies and (ii) estimating the key fluctuation term in this equation. The proofs of the multi-resolvent local laws follow the same scheme, but the self-consistent equation is considerably more complicated, and its stability is more delicate; see, for example, [Reference Cipolloni, Erdős and Schröder15, Reference Cipolloni, Erdős and Schröder19], where general multi-resolvent local laws were proven. The main complication lies in part (ii), where a high moment estimate is needed for the fluctuation term. The corresponding cumulant expansion results in many terms that have typically been organised and estimated by a graphical Feynman diagrammatic scheme. A reasonably manageable power counting handles all diagrams for the purpose of proving equations (1.1) and (1.2). However, in the multi-resolvent setup, or if we aim at some improvement, the diagrammatic approach becomes very involved since the right number of additional improvement factors needs to be gained from every single graph. This was the case many times before: (i) when a small factor (so-called ‘sigma-cell’) was extracted at the cusp [Reference Erdős, Krüger and Schröder25], (ii) when we proved that the correlation between the resolvents of the Hermitization of an i.i.d. random matrix shifted by two different spectral parameters $z_1, z_2$ decays in $1/|z_1-z_2|$ [Reference Cipolloni, Erdős and Schröder14] and (iii) more recently when the gain of order $\sqrt {\eta }$ due to the traceless A in equation (1.3) was obtained in [Reference Cipolloni, Erdős and Schröder15].
Extracting $\langle |A|^2\rangle ^{1/2}$ instead of $\|A\|$, especially in the multi-resolvent case, seems even more involved in this way since estimating A simply by its norm appears everywhere in any diagrammatic expansion. However, very recently in [Reference Cipolloni, Erdős and Schröder18] we introduced a new method of a system of master inequalities that circumvents the full diagrammatic expansion. The power of this method was demonstrated by fully extracting the maximal $\sqrt {\eta }$ gain from traceless A even in the multi-resolvent setup; the same result seemed out of reach with the diagrammatic method used for the single-resolvent setup in [Reference Cipolloni, Erdős and Schröder15]. In the current paper, we extend this technique to obtain the optimal control in terms of $\langle |\mathring {A}|^2\rangle ^{1/2}$ instead of $\|\mathring {A}\|$ for single-resolvent local laws. However, the master inequalities in this paper are different from the ones in [Reference Cipolloni, Erdős and Schröder18]; in fact, they are much tighter since the effect we extract now is much more delicate. We also obtain a similar optimal control for the multi-resolvent local laws needed to prove the Gaussianity of the bulk eigenvector overlaps under the optimal condition on A.
Notations and conventions
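We denote vectors by boldfaced lowercase Roman letters ${\boldsymbol x}, {\boldsymbol y}\in \mathbf {C} ^N$, for some $N\in \mathbf {N}$. Vector and matrix norms, $\|{\boldsymbol x}\|$ and $\|A\|$, indicate the usual Euclidean norm and the corresponding induced matrix norm. For any $N\times N$ matrix A, we use the notation $\langle A\rangle := N^{-1}\operatorname {Tr} A$ to denote the normalised trace of A. Moreover, for vectors ${\boldsymbol x}, {\boldsymbol y}\in \mathbf {C}^N$ and matrices $A\in \mathbf {C}^{N\times N}$, we define
$$\langle {\boldsymbol x}, {\boldsymbol y}\rangle := \sum _{i=1}^N \overline {x}_i y_i, \qquad A_{{\boldsymbol x}{\boldsymbol y}} := \langle {\boldsymbol x}, A{\boldsymbol y}\rangle .$$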
We will use the concept of ‘with very high probability’, meaning that for any fixed $D>0$, the probability of an N-dependent event is bigger than $1-N^{-D}$ if $N\ge N_0(D)$. We introduce the notion of stochastic domination (see, for example, [Reference Erdős, Knowles, Yau and Yin24]): given two families of nonnegative random variables
$$X=\big (X^{(N)}(u) : N\in \mathbf {N}, u\in U^{(N)}\big ) \quad \text {and}\quad Y=\big (Y^{(N)}(u) : N\in \mathbf {N}, u\in U^{(N)}\big )$$
indexed by N (and possibly some parameter u in some parameter space $U^{(N)}$), we say that X is stochastically dominated by Y, if for all $\xi , D>0$, we have
$$\sup _{u\in U^{(N)}} \operatorname {\mathbf {P}}\big [X^{(N)}(u)>N^{\xi } Y^{(N)}(u)\big ]\le N^{-D}$$
for large enough $N\geq N_0(\xi ,D)$. In this case, we use the notation $X\prec Y$ or $X=\mathcal {O}_{\prec }(Y)$. We also use the convention that $\xi>0$ denotes an arbitrary small constant that is independent of N.
Finally, for positive quantities $f,g$ we write $f\lesssim g$ and $f\sim g$ if $f \le C g$ or $c g\le f\le Cg$ , respectively, for some constants $c,C>0$ that depend only on the constants appearing in the moment condition; see equation (2.2) later.
2 Main results
Assumption 1. We say that $W=W^{\ast }\in \mathbf {C}^{N\times N}$ is a real symmetric/complex Hermitian Wigner matrix if the entries $(w_{ab})_{a\le b}$ in the upper triangular part are independent and satisfy
$$w_{ab}\stackrel {\mathrm {d}}{=} N^{-1/2}\times \begin{cases} \chi _{\mathrm {d}}, & a=b, \\ \chi _{\mathrm {od}}, & a\ne b, \end{cases} \tag{2.1}$$
for some real random variable $\chi _{\mathrm {d}}$ and some real/complex random variable $\chi _{\mathrm {od}}$ of mean $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}=\operatorname {\mathbf {E}} \chi _{\mathrm {od}}=0$ and variances $\operatorname {\mathbf {E}}|\chi _{\mathrm {od}}|^2=1$, $\operatorname {\mathbf {E}}\chi _{\mathrm {od}}^2=0$, $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}^2=1$ in the complex, and $\operatorname {\mathbf {E}}|\chi _{\mathrm {od}}|^2=\operatorname {\mathbf {E}}\chi _{\mathrm {od}}^2=1$, $\operatorname {\mathbf {E}} \chi _{\mathrm {d}}^2=2$ in the real case.$^{3}$ We furthermore assume that for every $n\ge 3$,
$$\operatorname {\mathbf {E}}|\chi _{\mathrm {d}}|^n+\operatorname {\mathbf {E}}|\chi _{\mathrm {od}}|^n\le C_n \tag{2.2}$$
for some constant $C_n$; in particular, all higher-order cumulants $\kappa _n^{\mathrm {d}},\kappa _n^{\mathrm {od}}$ of $\chi _{\mathrm {d}}, \chi _{\mathrm {od}} $ are finite for any n.
Our results hold for both symmetry classes, but for definiteness, we prove the main results in the real case, the changes for the complex case being minimal.
For a spectral parameter $z\in \mathbf {C}$ with $\eta :=|\Im z|\gg N^{-1}$, the resolvent $G=G(z)=(W-z)^{-1}$ of an $N\times N$ Wigner matrix W is well approximated by a constant multiple $m\cdot I$ of the identity matrix, where $m=m(z)$ is the Stieltjes transform of the semicircular distribution $\sqrt {4-x^2}/(2\pi )$ and satisfies the equation
$$-\frac {1}{m(z)}=m(z)+z, \qquad \Im m(z)\, \Im z>0. \tag{2.3}$$
We set $\rho (z): = |\Im m(z)|$, which approximates the density of eigenvalues near $\Re z$ in a window of size $\eta $.
We first recall the classical local law for Wigner matrices in both its tracial and isotropic forms [Reference Erdős, Schlein and Yau27, Reference Erdős, Yau and Yin29, Reference He, Knowles and Rosenthal34, Reference Knowles and Yin38]:
Theorem 2.1. Fix any $\epsilon>0$ ; then it holds that
uniformly in any deterministic vectors $\boldsymbol {x}, \boldsymbol {y}$ and spectral parameter z with and $\Re z\in \mathbf {R}$ , where .
Our main result is the following optimal multi-resolvent local law with Hilbert-Schmidt norm error terms. Compared to Theorem 2.1, we formulate the bound only in an averaged sense since, due to the Hilbert-Schmidt norm in the error term, the isotropic bound is a special case with one of the traceless matrices being a centred rank-one matrix; see Corollary 2.4.
Theorem 2.2 (Averaged multi-resolvent local law).
Fix $\epsilon>0$, let $k\ge 1$, and consider $z_1,\ldots ,z_{k}\in \mathbf {C}$ with $N\eta \rho \ge N^{\epsilon }$, for $\eta :=\min _j |\Im z_j|$, $\rho :=\max _j \rho (z_j)$ and $d:=\min _j \operatorname {dist}(z_j,[-2,2])$, and let $A_1,\ldots ,A_k$ be deterministic traceless matrices, $\langle A_i\rangle =0$. Set $G_i:= G(z_i)$ and $m_i:= m(z_i)$ for all $i\le k$. Then we have the local law on optimal scale$^{4}$
Remark 2.3. We also obtain generalisations of Theorem 2.2, where each G may be replaced by a product of Gs and $|G|$s; see Lemma 3.1 later.
Due to the Hilbert-Schmidt sense of the error term, we obtain an isotropic variant of Theorem 2.2 as an immediate corollary by choosing one of the traceless matrices in equation (2.5) to be a centred rank-one matrix.
Corollary 2.4 (Isotropic local law).
Under the setup and conditions of Theorem 2.2, for any vectors $\boldsymbol {x},\boldsymbol {y}$ , it holds that
We now compare Theorem 2.2 to the previous result [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5], where an error term of order $N^{-1}\eta ^{-k/2}\prod _i\|A_i\|$ was proven for equation (2.5). For clarity, we focus on the most interesting $d<10$ regime.
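Remark 2.5. For $k=1$, our new estimate for traceless A,
$$|\langle GA\rangle |\prec \frac {\sqrt {\rho }}{N\sqrt {\eta }}\langle |A|^2\rangle ^{1/2}, \tag{2.7}$$
is strictly better than the one in [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5], since $\langle |A|^2\rangle ^{1/2}\le \|A\|$ always holds, but $\langle |A|^2\rangle ^{1/2}$ can be much smaller than $\|A\|$ for small rank A. In addition, equation (2.7) features an additional factor $\sqrt {\rho }\lesssim 1$ that is considerably smaller than 1 near the spectral edges.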
For larger $k\ge 2$, the relationship depends on the relative size of the Hilbert-Schmidt and operator norm of the $A_i$s as well as on the size of $\eta $. We recall [Reference Rudelson and Vershynin46] that the numerical rank of A is defined as $r(A):= \|A\|_{\mathrm {HS}}^2/\|A\|^2 = N\langle |A|^2\rangle /\|A\|^2$ and say that A is $\alpha $-mesoscopic for some $\alpha \in [0,1]$ if $r(A)=N^{\alpha }$. If for some $k\ge 2$ all $A_i$ are $\alpha $-mesoscopic, then Theorem 2.2 improves upon [Reference Cipolloni, Erdős and Schröder18, Theorem 2.5] whenever $\eta \ll N^{(1-\alpha k)/(k-1)}$.
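To make the notion of numerical rank concrete, here is a small sketch assuming the usual definition as the squared ratio of the Hilbert-Schmidt (Frobenius) norm to the operator norm. The function name `numerical_rank` and the three test matrices are illustrative choices only.

```python
import numpy as np

def numerical_rank(a):
    """Squared Hilbert-Schmidt (Frobenius) norm divided by squared operator norm."""
    hs_squared = np.linalg.norm(a, "fro") ** 2
    op_norm = np.linalg.norm(a, 2)          # largest singular value
    return hs_squared / op_norm ** 2

n = 64
identity = np.eye(n)                                  # full rank
x = np.ones(n) / np.sqrt(n)
rank_one = np.outer(x, x)                             # rank-one projection
projection = np.diag([1.0] * 8 + [0.0] * (n - 8))     # coordinate projection, |S| = 8

for name, mat in [("identity", identity), ("rank one", rank_one), ("projection", projection)]:
    r = numerical_rank(mat)
    print(f"{name}: r(A) = {r:.1f}, alpha = log r / log N = {np.log(r) / np.log(n):.2f}")
```

For orthogonal projections the numerical rank coincides with the usual rank, so the exponent $\alpha$ simply records the rank on a logarithmic scale in N.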
Local laws on optimal scales can give certain information on eigenvectors as well. Let $\lambda _1\le \lambda _2 \le \ldots \le \lambda _N$ denote the eigenvalues and $\{ {\boldsymbol u}_i\}_{i=1}^N$ the corresponding orthonormal eigenvectors of W. Already the single-resolvent isotropic local law given by equation (2.4) implies the eigenvector delocalisation: that is, that $\| {\boldsymbol u_i} \|_{\infty } \prec N^{-1/2}$. More generally,$^{5}$ $|\langle {\boldsymbol x}, {\boldsymbol u}_i\rangle |\prec N^{-1/2}\|{\boldsymbol x}\|$: that is, eigenvectors behave as completely random unit vectors in the sense of considering their rank-$1$ projections onto any deterministic vector ${\boldsymbol x}$. This concept can be greatly extended to arbitrary deterministic observable matrix A, leading to the following results motivated both by thermalisation ideas from physics [Reference D’Alessio, Kafri, Polkovnikov and Rigol21, Reference Deutsch22, Reference Eckhardt, Fishman, Keating, Agam, Main and Müller23, Reference Feingold and Peres30] and by quantum (unique) ergodicity (QUE) in mathematics [Reference Anantharaman and Le Masson2, Reference Anantharaman and Sabri3, Reference Bauerschmidt, Huang and Yau4, Reference Bauerschmidt, Knowles and Yau5, Reference Colin de Verdière20, Reference Luo and Sarnak41, Reference Marklof and Rudnick44, Reference Rudnick and Sarnak47, Reference Snirelman48, Reference Soundararajan49, Reference Zelditch54, Reference Zelditch55].
Theorem 2.6 (Eigenstate thermalisation hypothesis).
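Let W be a Wigner matrix satisfying Assumption 1, and let $\delta>0$. Then for any deterministic matrix A and any bulk indices $i,j\in [\delta N,(1-\delta )N]$, it holds that
$$\big |\langle {\boldsymbol u}_i, A {\boldsymbol u}_j\rangle -\delta _{ij}\langle A\rangle \big |\prec \frac {\langle |\mathring {A}|^2\rangle ^{1/2}}{\sqrt {N}}, \tag{2.8}$$
where $\mathring {A}:= A-\langle A\rangle $ is the traceless part of A.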
Remark 2.7.

1. The result given by equation (2.8) was established in [Reference Cipolloni, Erdős and Schröder15] with $\langle |\mathring {A}|^2\rangle ^{1/2}$ replaced by $\|\mathring {A}\|$ uniformly in the spectrum (i.e., also for edge indices).

2. For rank-$1$ matrices $A=\boldsymbol {x}\boldsymbol {x}^{\ast }$, the bound given by equation (2.8) immediately implies the complete delocalisation of eigenvectors in the form $|\langle \boldsymbol {x}, {\boldsymbol u}_i\rangle |\prec N^{-1/2}\|\boldsymbol {x}\|$.
Theorem 2.6 directly follows from the bound
that is obtained by the spectral decomposition of both resolvents and the well-known eigenvalue rigidity, with some explicit $\delta $-dependent constants $C_{\delta }$ and $\epsilon =\epsilon (\delta )>0$ (see [Reference Cipolloni, Erdős and Schröder15, Lemma 1] for more details). The right-hand side can be directly estimated using equation (2.5); and finally, choosing $\eta = N^{-1+\xi }$ for any small $\xi>0$ gives equation (2.8) and thus proves Theorem 2.6.
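The size of the overlaps in Theorem 2.6 can be explored numerically. The sketch below assumes a Gaussian real symmetric Wigner matrix and takes A to be the projection onto the first $|S|$ coordinates, so that $\langle \mathring {A}^2\rangle = (|S|/N)(1-|S|/N)$; the sizes, the seed and the bulk cut-off are illustrative choices. It compares the largest bulk overlap with $\langle |\mathring {A}|^2\rangle ^{1/2}/\sqrt {N}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, s = 400, 40

# Gaussian real symmetric Wigner matrix with off-diagonal variance 1/n.
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)
_, u = np.linalg.eigh(w)                 # columns are orthonormal eigenvectors

# A is the coordinate projection onto the first s indices (stored as its diagonal).
a_diag = np.zeros(n)
a_diag[:s] = 1.0
a_ring = a_diag - a_diag.mean()          # diagonal of the traceless part
hs = np.sqrt(np.mean(a_ring ** 2))       # normalised Hilbert-Schmidt norm
op = np.max(np.abs(a_ring))              # operator norm of the traceless part

# Overlaps <u_i, (A - <A>) u_j> for indices away from the spectral edges.
bulk = slice(n // 10, n - n // 10)
ub = u[:, bulk]
overlaps = ub.T @ (a_ring[:, None] * ub)

ratio = np.max(np.abs(overlaps)) * np.sqrt(n) / hs
print(f"max overlap / (HS norm / sqrt(N)) = {ratio:.2f}")
print(f"HS norm {hs:.2f} versus operator norm {op:.2f}")
```

The ratio is expected to stay of moderate size as N grows, reflecting the $N^{\xi }$ tolerance hidden in $\prec $, whereas normalising by the operator norm instead would make the bound less sharp for small $|S|$.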
The next question is to establish a central limit theorem for the diagonal overlap in equation (2.8).
Theorem 2.8 (Central limit theorem in the QUE).
Let W be a real symmetric ($\beta =1$) or complex Hermitian ($\beta =2$) Wigner matrix satisfying Assumption 1. Fix small $\delta ,\delta '>0$, and let $A=A^*$ be a deterministic $N\times N$ matrix with $\langle \mathring {A}^2\rangle \ge N^{-1+\delta '}\|\mathring {A}\|^2$. In the real symmetric case, we also assume that $A\in \mathbf {R}^{N\times N}$ is real. Then for any bulk index $i\in [\delta N, (1-\delta ) N]$, we have a central limit theorem
with $\mathcal {N}$ being a standard real Gaussian random variable. Moreover, for any moment, the speed of convergence is explicit (see equation (B.5)).
We require that $\langle \mathring {A}^2\rangle \gg N^{-1}\|\mathring {A}\|^2$ in order to ensure that the spectral distribution of $\mathring {A}$ is not concentrated on a finite number of eigenvalues: that is, that $\mathring {A}$ has effective rank $\gg 1$. Indeed, the statement in equation (2.9) does not hold for finite-rank As: for example, if $A=\mathring {A}=|\mathbf {e}_x\rangle \langle \mathbf {e}_x|-|\mathbf {e}_y\rangle \langle \mathbf {e}_y|$, for some $x\ne y\in [N]$, then $\langle {\boldsymbol u}_i, \mathring {A} {\boldsymbol u}_i\rangle =|u_i(x)|^2-|u_i(y)|^2$, which is the difference of two asymptotically independent $\chi ^2$-distributed random variables (e.g., see [Reference Bourgade and Yau10, Theorem 1.2]). More generally, the joint distribution of finitely many eigenvector overlaps has been identified in [Reference Aggarwal, Lopatto and Marcinek1, Reference Bourgade and Yau10, Reference Bourgade, Huang and Yau11, Reference Marcinek43] for various related ensembles.
3 Proof of Theorem 2.2
In this section, we prove Theorem 2.2 in the critical $d<10$ regime. The $d\ge 10$ regime is handled similarly, but the estimates are much simpler; the necessary modifications are outlined in Appendix A.
In the subsequent proof, we will often assume that a priori bounds, with some control parameters $\psi _K^{\mathrm {av}/\mathrm {iso}}\ge 1$ , of the form
for certain indices $K\ge 0$ have been established uniformly$^{6}$ in deterministic traceless matrices $\boldsymbol A=(A_1,\ldots ,A_K)$, deterministic vectors $\boldsymbol {x}, \boldsymbol {y}$ and spectral parameters $\boldsymbol z=(z_1,\ldots ,z_K)$ with $N\eta \rho \ge N^{\epsilon }$. We stress that we do not assume the estimates to be uniform in K. Note that $\psi _0^{\mathrm {av}}$ is defined somewhat differently from $\psi _K^{\mathrm {av}}$, $K\ge 1$, but the definition of $\psi ^{\mathrm {iso}}_K$ is the same for all $K\ge 0$. For intuition, the reader should think of the control parameters as essentially order-one quantities; in fact, our main goal will be to prove this fact. Note that by Theorem 2.1, we may set $\psi _0^{\mathrm {av}/\mathrm {iso}}=1$.
As a first step, we observe that equations (3.1), (3.2) and (3.3) immediately imply estimates on more general averaged resolvent chains and isotropic variants.
Lemma 3.1. (i) Assuming equations (3.1) and (3.3) for $K=0$ holds uniformly in $z_1$ , then for any $z_1,\ldots ,z_l$ with $N\eta \rho \ge N^{\epsilon }$ , it holds that
where $m[z_1,\ldots ,z_l]$ stands for the lth divided difference of the function $m(z)$ from equation (2.3), explicitly
(ii) Assuming for some $k\ge 1$ the estimates given by equations (3.2) and (3.3) for $K=k$ have been established uniformly, then for $\mathcal {G}_j:=G_{j,1}\cdots G_{j,l_j}$ with , traceless matrices $A_i$ and , $\rho := \max _{j,i} \rho (z_{j,i})$ , it holds that
where
and $g_{j,i}(x)=(x-z_{j,i})^{-1}$ or $g_{j,i}(x)=|x-z_{j,i}|^{-1}$, depending on whether $G_{j,i}=G(z_{j,i})$ or $G_{j,i}=|G(z_{j,i})|$.
Proof. Analogous to [Reference Cipolloni, Erdős and Schröder18, Lemma 3.2].
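For two spectral parameters, the appearance of the divided difference in part (i) can be understood from the resolvent identity $G(z_1)G(z_2) = (G(z_1)-G(z_2))/(z_1-z_2)$ combined with the single-resolvent local law. The following sketch checks this numerically; it assumes a Gaussian real symmetric Wigner matrix and distinct spectral parameters, and the helper names `m_sc` and `divided_difference` as well as all numerical values are illustrative.

```python
import numpy as np

def m_sc(z):
    """Stieltjes transform of the semicircle law (root with Im m * Im z > 0)."""
    z = complex(z)
    root = np.sqrt(z * z - 4)
    m = (-z + root) / 2
    return m if m.imag * z.imag > 0 else (-z - root) / 2

def divided_difference(f, zs):
    """Recursive divided difference f[z_1, ..., z_l] for pairwise distinct points."""
    if len(zs) == 1:
        return f(zs[0])
    upper = divided_difference(f, zs[1:])
    lower = divided_difference(f, zs[:-1])
    return (upper - lower) / (zs[-1] - zs[0])

rng = np.random.default_rng(2)
n = 500
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)

z1, z2 = 0.3 + 0.2j, -0.5 + 0.3j
eye = np.eye(n)
g1 = np.linalg.inv(w - z1 * eye)
g2 = np.linalg.inv(w - z2 * eye)

lhs = np.trace(g1 @ g2) / n                   # normalised trace of G(z1) G(z2)
rhs = divided_difference(m_sc, [z1, z2])      # m[z1, z2]
err = abs(lhs - rhs)
print(f"|<G1 G2> - m[z1, z2]| = {err:.4f}")
```

The recursion is only meant for distinct points; coinciding spectral parameters would require derivatives of m instead.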
The main result of this section is the following hierarchy of master inequalities.
Proposition 3.2 (Hierarchy of master inequalities).
Fix $k\ge 1$ , and assume that equations (3.2) and (3.3) have been established uniformly in $\boldsymbol A$ and $\boldsymbol z$ with $N\eta \rho \ge N^{\epsilon }$ for all $ K\le 2k$ . Then it holds that
with the definition
where the sum is taken over an arbitrary number of nonnegative integers $k_i$ , with $k_i\ge 1$ for $i\ge 3$ , under the condition that their sum does not exceed k (in the case of only one nonzero $k_1$ , the second factor and product in equation (3.10) are understood to be one and $\Phi _0=1$ ).
This hierarchy has the structure that each $\Psi ^{\mathrm {av}/\mathrm {iso}}_k$ is estimated partly by $\psi $s with an index higher than k, which potentially is uncontrollable even if the coefficient of the higher-order terms is small (recall that $1/(N\eta )$ and $1/(N\eta \rho )$ are small quantities). Thus the hierarchy must be complemented by another set of inequalities that estimate higher-indexed $\Psi $s with smaller-indexed ones even at the expense of a large constant. The success of this scheme eventually depends on the relative size of these small and large constants, so it is very delicate. We prove the following reduction inequalities to estimate the $\psi _l^{\mathrm {av}/\mathrm {iso}}$ terms with $k+1\le l\le 2k$ in equations (3.8) and (3.9) by $\psi $s with indices smaller than or equal to k.
Lemma 3.3 (Reduction lemma).
Fix $1\le j\le k$ , and assume that equations (3.2) and (3.3) have been established uniformly for $K\le 2k$ . Then it holds that
and for even k also that
The rest of the present section is structured as follows: in Section 3.1, we prove equation (3.8), and in Section 3.2, we prove equation (3.9). Then, in Section 3.3, we prove Lemma 3.3 and conclude the proof of Theorem 2.2. Before starting the main proof, we collect some trivial estimates between Hilbert-Schmidt and operator norms using matrix Hölder inequalities.
Lemma 3.4. For $N\times N$ matrices $B_1,\ldots ,B_k$ and $k\ge 2$ , it holds that
and
In the sequel, we often drop the indices from $G,A$ ; hence we write $(GA)^k$ for $G_1A_1\ldots G_kA_k$ and assume without loss of generality that $A_i=A_i^{\ast }$ and . We also introduce the convention in this paper that matrices denoted by capital A letters are always traceless.
3.1 Proof of averaged estimate given by equation (3.8) in Proposition 3.2
We now identify the leading contribution of . For any matrix-valued function $f(W)$, we define the second moment renormalisation, denoted by underlining, as
in terms of the directional derivative $\partial _{\widetilde W}$ in the direction of an independent GUE matrix $\widetilde W$. The motivation for the second moment renormalisation is that by Gaussian integration by parts, it holds that $\operatorname {\mathbf {E}} W f(W)=\widetilde {\operatorname {\mathbf {E}}}\widetilde W (\partial _{\widetilde W} f)(W)$ whenever W is a Gaussian random matrix of zero mean and $\widetilde W$ is an independent copy of W. In particular, it holds that $\operatorname {\mathbf {E}}\underline {Wf(W)}=0$ whenever W is a GUE matrix, while $\operatorname {\mathbf {E}}\underline {Wf(W)}$ is small but nonzero for GOE or non-Gaussian matrices. By concentration and universality, we expect that to leading order $Wf(W)$ may be approximated by $\widetilde {\operatorname {\mathbf {E}}} \widetilde W(\partial _{\widetilde W}f)(W)$. Here the directional derivative $\partial _{\widetilde W}f$ should be understood as
In our application, the function $f(W)$ is always a (product of) matrix resolvents $G(z)=(W-z)^{-1}$ and possibly deterministic matrices $A_i$. This time, we view the resolvent as a function of W, $G(W)= (W-z)^{-1}$ for any fixed z. By the resolvent identity, it follows that
while the expectation of a product of GUEmatrices acts as an averaged trace in the sense
where I denotes the identity matrix and $(\Delta ^{ab})_{cd}:=\delta _{ac}\delta _{bd}$ . Therefore, for instance, we have the identities
Finally, we want to comment on the choice of renormalising with respect to an independent GUE rather than a GOE matrix. This is purely a matter of convenience, and we could equally have chosen the GOE renormalisation. Indeed, we have
and therefore, for instance,
which is a negligible difference. Our formulas below will be slightly simpler with our choice in equation (3.15), even though now $\operatorname {\mathbf {E}}\underline {W f(W)}$ is not exactly zero even for $W \sim \mathrm {GOE}$.
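Two elementary facts behind the renormalisation can be verified numerically: the directional derivative of the resolvent, $\partial _{V} G = -GVG$, and the averaging identity $\frac {1}{N}\sum _{a,b}\Delta ^{ab} B \Delta ^{ba} = \langle B\rangle I$, which is what the expectation $\widetilde {\operatorname {\mathbf {E}}}\widetilde W B\widetilde W$ over a GUE matrix with $\operatorname {\mathbf {E}}|\widetilde w_{ab}|^2=1/N$ reduces to. The sketch below is only an illustration under these assumptions; the matrix size, the direction `v`, the test matrix `b` and the step size `t` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 6
x = rng.standard_normal((n, n))
w = (x + x.T) / np.sqrt(2 * n)            # small real symmetric Wigner matrix
v = rng.standard_normal((n, n))
v = (v + v.T) / 2                          # symmetric direction of differentiation
z = 0.2 + 0.5j
eye = np.eye(n)

def resolvent(mat):
    """G = (mat - z)^{-1} for the fixed spectral parameter z above."""
    return np.linalg.inv(mat - z * eye)

# Central finite difference versus the closed form -G V G.
t = 1e-6
finite_diff = (resolvent(w + t * v) - resolvent(w - t * v)) / (2 * t)
g = resolvent(w)
deriv_err = np.max(np.abs(finite_diff + g @ v @ g))

# Sum over matrix units: (1/N) sum_{a,b} Delta^{ab} B Delta^{ba} = <B> I.
b = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
acc = np.zeros((n, n), dtype=complex)
for row in range(n):
    for col in range(n):
        delta = np.zeros((n, n))
        delta[row, col] = 1.0              # the matrix unit Delta^{ab}
        acc += delta @ b @ delta.T         # Delta^{ba} is the transpose
avg_err = np.max(np.abs(acc / n - np.trace(b) / n * eye))

print(f"derivative check: {deriv_err:.2e}, averaging check: {avg_err:.2e}")
```

Combining the two facts gives, for instance, that the renormalisation of $WG$ subtracts $-\langle G\rangle G$, which is the mechanism producing the self-consistent equation for m.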
Lemma 3.5. We have
where $\mathcal {E}_1^{\mathrm {av}}=0$ and
for $k\ge 3$ .
Proof. We start with the expansion
due to
where for $k=1$ the first two terms in the right-hand side of equation (3.19) are not present. In the second step, we extended the underline renormalisation to the entire product $\underline {WG_1A_1G_2\cdots G_kA_k}$ at the expense of generating additional terms collected in the summation; this identity can be obtained directly from the definition given by equation (3.15). Note that in the first line of equation (3.19), we moved the term coming from of equation (3.20) to the left-hand side, causing the error $\mathcal {O}_{\prec }(\psi _0^{\mathrm {av}}/(N\eta ))$. For $k\ge 2$, using Lemmas 3.1 and 3.4, we estimate the second term in the second line of equation (3.19) by
For the first term in the second line of equation (3.19), we distinguish the cases $k=2$ and $k\ge 3$ . In the former, we write
where we used Lemma 3.4 to estimate
In case $k\ge 3$ , we estimate
Note that the leading deterministic term of was simply estimated as
From equation (3.24), we write , where the second term can simply be estimated as , due to Lemma 3.4, and included in the error term. Collecting all other error terms from equations (3.21) and (3.24) and recalling $\psi _j^{\mathrm {av}/\mathrm {iso}}\ge 1\gtrsim \sqrt {\rho /(N\eta )}$ for all j, we obtain equation (3.17) with the definition of $\mathcal {E}_k$ from equation (3.18).
Lemma 3.5 reduces understanding the local law to the underlined term in equation (3.19) since $\mathcal {E}_k^{\mathrm {av}}$ will be treated as an error term. For the underlined term, we use a cumulant expansion when calculating the high moment for any fixed integer p. Here we will again make a notational simplification, ignoring different indices in G, A and m; in particular, we may write
by choosing $G=G(\overline {z_i})$ for half of the factors.
We set $\partial _{ab}:= \partial /\partial w_{ab}$ as the derivative with respect to the $(a,b)$ entry of W: that is, we consider $w_{ab}$ and $w_{ba}$ as independent variables in the following cumulant expansion (such an expansion was first used in the random matrix context in [Reference Khorunzhy, Khoruzhenko and Pastur35] and later revived in [Reference He and Knowles33, Reference Lee and Schnelli39]):
Technically, we use a truncated version of the expansion above; see, for example, [Reference Erdős, Krüger and Schröder26, Reference He and Knowles33]. We thus compute$^{7}$
recalling Assumption 1 for the diagonal and off-diagonal cumulants. The summation runs over all indices $a,b\in [N]$ . The second cumulant calculation in equation (3.27) used the fact that by definition of the underline renormalisation the $\partial _{ba}$ derivative in the first line may not act on its own $(GA)^k$ .
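The mechanism behind the cumulant expansion is easiest to see for a single real random variable, where it reads $\operatorname {\mathbf {E}} w f(w)=\sum _{k\ge 0}\frac {\kappa _{k+1}}{k!}\operatorname {\mathbf {E}} f^{(k)}(w)$ with $\kappa _j$ the jth cumulant of w. The following Python sketch checks this scalar identity exactly for a polynomial f, in which case the sum terminates. The two-point distribution, the polynomial and all function names are hypothetical choices made for this illustration; the matrix version used in the text, with its separate treatment of $w_{ab}$ and $w_{ba}$ , is not reproduced here.

```python
from fractions import Fraction
from math import comb, factorial


def moments(values, probs, n_max):
    """Moments m_0, ..., m_{n_max} of a finitely supported distribution."""
    return [sum(p * v**n for v, p in zip(values, probs)) for n in range(n_max + 1)]


def cumulants(m):
    """Cumulants kappa_1, ..., from moments via the moment-cumulant recursion."""
    kappa = [Fraction(0)] * len(m)
    for n in range(1, len(m)):
        kappa[n] = m[n] - sum(
            comb(n - 1, k - 1) * kappa[k] * m[n - k] for k in range(1, n)
        )
    return kappa


def derivative(coeffs):
    """Coefficients of the derivative of a polynomial given by its coefficients."""
    return [j * c for j, c in enumerate(coeffs)][1:]


def expect_poly(coeffs, m):
    """Expectation of a polynomial in w, given the moments of w."""
    return sum(c * m[j] for j, c in enumerate(coeffs))


# Centred two-point distribution: w = -1 with prob. 2/3, w = 2 with prob. 1/3.
values = [Fraction(-1), Fraction(2)]
probs = [Fraction(2, 3), Fraction(1, 3)]
f = [Fraction(1), Fraction(2), Fraction(0), Fraction(3)]  # f(w) = 1 + 2w + 3w^3

deg = len(f) - 1
m = moments(values, probs, deg + 1)
kappa = cumulants(m)

lhs = expect_poly([Fraction(0)] + f, m)  # E[w f(w)]
rhs = Fraction(0)
g = f
for k in range(deg + 1):
    rhs += kappa[k + 1] / factorial(k) * expect_poly(g, m)  # kappa_{k+1}/k! E f^{(k)}
    g = derivative(g)
```

For non-polynomial f, such as products of resolvent entries, the series does not terminate, which is why a truncated version with a controlled remainder is used.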
For the first term of equation (3.27), we use due to equation (3.16) with $\widetilde W=\Delta ^{ab}$ so that using $G^t=G$ , we can perform the summation and obtain
from Lemma 3.1, estimating the deterministic leading term of by as in equation (3.25). The first prefactor in the right-hand side of equation (3.28) is already written as the square of the target size $N^{k/2-1}\sqrt {\rho /(N\eta )}$ for ; see equation (2.5).
For the second term of equation (3.27), we estimate
recalling that $G=G^t$ since W is real symmetric.$^{8}$
For the second line of equation (3.27), we define the set of multi-indices $\boldsymbol l = (l_1, l_2, \ldots , l_n)$ with arbitrary length n, denoted by and total size $k=\sum _i l_i$ as
Note that the set $\mathcal {I}_k^{\mathrm {d}}$ is a finite set with cardinality depending only on $k,p$ . We distribute the derivatives according to the product rule to estimate
where for the multiset J, we define and set
Here, for the multiset $J\subset \mathcal {I}_k^{\mathrm {d}}$ , we defined its cardinality by $|J|$ and set . Along the product rule, the multi-index $\boldsymbol l$ encodes how the first factor $[(GA)^k]_{aa}$ in equation (3.30) is differentiated, while each element $\boldsymbol j\in J$ is a multi-index that encodes how another factor is differentiated. Note that $|J|$ is the number of such factors affected by derivatives; the remaining $p-1-|J|$ factors are untouched.
For the third line of equation (3.27), we similarly define the appropriate index set that is needed to encode the product rule$^{9}$
Note that in addition to the multi-index $\boldsymbol l$ encoding the distribution of the derivatives after the Leibniz rule similarly to the previous diagonal case, the second element $\boldsymbol \alpha $ of the new type of indices also keeps track of whether, after the differentiations, the corresponding factor is evaluated at $ab, ba, aa$ or $bb$ . While a single $\partial _{ab}$ or $\partial _{ba}$ acting on results in an off-diagonal term of the form $[(GA)^kG]_{ab}$ or $[(GA)^kG]_{ba}$ , a second derivative also produces diagonal terms. The derivative action on the first factor $[(GA)^k]_{ba} $ in the third line of equation (3.27) produces diagonal factors already after one derivative. The restriction in equation (3.31) that the number of $aa$ - and $bb$ -type diagonal elements must coincide comes from a simple counting of diagonal indices along derivatives: when an additional $\partial _{ab}$ hits an off-diagonal term, then either one $aa$ and one $bb$ diagonal are created or none. Similarly, when an additional $\partial _{ab}$ hits a diagonal $aa$ term, then one diagonal $aa$ remains, along with a new off-diagonal $ab$ . In any case, the difference between the $aa$ and $bb$ diagonals is unchanged.
Armed with this notation, similarly to equation (3.30), we estimate
where for the multiset $J\subset \mathcal {I}_k^{\mathrm {od}}$ , we define and set
Note that equation (3.33) is an overestimate: not all indices $(\boldsymbol j,\boldsymbol \beta )$ indicated in equation (3.34) can actually occur after the Leibniz rule.
Lemma 3.6. For any $k\ge 1$ , it holds that
By combining Lemma 3.5 and equations (3.27), (3.28), (3.30) and (3.33) with Lemma 3.6 and using a simple Hölder inequality, we obtain, for any fixed $\xi>0$ , that
where we used the $\Xi _k^{\mathrm {d}}$ term to add back the $a=b$ part of the summation in equation (3.33) compared to equation (3.27). By taking p large enough and $\xi $ arbitrarily small and using the definition of $\prec $ and the fact that the bound given by equation (3.36) holds uniformly in the spectral parameters and the deterministic matrices, we conclude the proof of equation (3.8).
Proof of Lemma 3.6.
The proof repeatedly uses equation (3.3) in the form
with $\boldsymbol e_b$ being the $b$ th coordinate vector, where we estimated the deterministic leading term $m^k(A^k)_{ab}$ by using equation (3.14). Recalling the normalisation , the best available bound on is ; however, this can be substantially improved under a summation over the index b:
Using equations (3.37) and (3.38) for each entry of equations (3.31) and (3.34), we obtain the following naive (or a priori) estimates on $\Xi _k^{\mathrm {d/od}}$
where we defined
Note that $\Omega _k\le \Phi _k$ just by choosing $k_1=k_2=0$ in the definition of $\Phi _k$ , equation (3.10), and thus $\Omega _k/\sqrt {N}\lesssim \Phi _k \sqrt {\rho /(N\eta )}$ since $1\lesssim \rho /\eta $ . Hence equation (3.35) follows trivially from equation (3.40) for $\Xi _k^{\mathrm {d}}$ and $\Xi _k^{\mathrm {od}}$ whenever or , respectively: that is, when the exponent of N in equation (3.40) is nonpositive.
In the rest of the proof, we consider the remaining diagonal D1 and off-diagonal cases O1–O3 that we will define below. The cases are organised according to the quantity that captures by how many factors of $N^{1/2}$ the naive estimate given by equation (3.40) exceeds the target in equation (3.35) when all $\Phi $ s and $\psi $ s are set to be order one. Within case O1, we further differentiate whether an off-diagonal index pair $ab$ or $ba$ appears at least once in the tuple $\boldsymbol \alpha $ or in one of the tuples $\boldsymbol \beta $ . Within case O2, we distinguish according to the length of and as follows:

D1

O1

O1a $ab\vee ba \in \boldsymbol \alpha \cup \bigcup _{({\boldsymbol {j}},\boldsymbol \beta )\in J} \boldsymbol \beta $

O1b and : that is, and


O2

O2a ,

O2b , ,

O2c , , $l_1\ge 1$ ,

O2d , , $l_1= 0$ .


O3
The list of four cases above is exhaustive since by definition, and the subcases of O2 are obviously exhaustive. Within case O1, either some off-diagonal element appears in $\boldsymbol \alpha $ or some $\boldsymbol \beta $ (hence we are in case O1a), or the number of elements in $\boldsymbol \alpha $ and all $\boldsymbol \beta $ is even; compare to the constraint on the number of diagonal elements in equation (3.32). The latter case is only possible if , , which is case O1b (note that implies , and is impossible as it would imply , the number of elements in $\boldsymbol \alpha $ , is odd).
Now we give the estimates for each case separately. For case D1, using the restriction in the summation in equation (3.33) to get , we estimate
where we used the first inequalities of equations (3.37) and (3.38) for the $(GA)^k$ and one of the $(GA)^kG$ factors and the second inequality of equation (3.37) for the remaining factors, and in the last step, we used equation (3.39) and $\psi _k^{\mathrm {iso}}\sqrt {\rho /\eta }\gtrsim 1$ . Finally, we use Young’s inequality . This confirms equation (3.35) in case D1.
For the off-diagonal cases, we will use the following so-called Ward improvements:

I1 Averaging over a or b in gains a factor of $\sqrt {\rho /(N\eta )}$ compared to equation (3.37).

I2 Averaging over a in gains a factor of $\sqrt {\rho /(N\eta )}$ compared to equation (3.38),
at the expense of replacing a factor of $(1+\psi _k^{\mathrm {iso}}\sqrt {\rho /(N\eta )})$ in the definition of $\Omega _k$ by a factor of $(1+\psi ^{\mathrm {iso}}_{2k}/\sqrt {N\eta \rho })^{1/2}$ . These latter replacements necessitate changing $\Omega _k$ to the larger $\Phi _k$ as a main control parameter in the estimates after Ward improvements. Indeed, I1 and I2 follow directly from equation (3.6) of Lemma 3.1 and , more precisely
where the first step in each case followed from a Schwarz inequality and summing up the indices explicitly. This improvement is essentially equivalent to using the Ward identity $GG^*= \Im G/\eta $ in equation (3.43).
Now we collect these gains over the naive bound given in equation (3.40) for each case. Note that whenever a factor $\sqrt {\rho /(N\eta )}$ is gained, the additional $1/\sqrt {N}$ is freed up along the second inequality in equation (3.40) that can be used to compensate the positive $N$ -powers.
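The Ward identity $GG^*=\Im G/\eta $ , with $\Im G=(G-G^*)/(2\mathrm {i})$ and $\eta =\Im z$ , follows from the resolvent identity and can be confirmed numerically. The Python sketch below does this for a $2\times 2$ real symmetric matrix; the matrix entries, the spectral parameter and the helper names are arbitrary choices for this illustration.

```python
def resolvent(W, z):
    """Resolvent (W - z)^{-1} of a 2x2 matrix W via the explicit inverse formula."""
    a, b = W[0][0] - z, W[0][1]
    c, d = W[1][0], W[1][1] - z
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def adjoint(M):
    """Conjugate transpose of a 2x2 complex matrix."""
    return [[M[j][i].conjugate() for j in range(2)] for i in range(2)]


def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


W = [[0.3, -1.1], [-1.1, 0.7]]  # real symmetric test matrix
z = 0.2 + 0.05j                 # spectral parameter with eta = Im z = 0.05
eta = z.imag

G = resolvent(W, z)
Gs = adjoint(G)
lhs = matmul(G, Gs)                                   # G G^*
im_G = [[(G[i][j] - Gs[i][j]) / 2j for j in range(2)] for i in range(2)]
rhs = [[im_G[i][j] / eta for j in range(2)] for i in range(2)]  # Im G / eta

# Largest entrywise deviation between the two sides of the Ward identity.
ward_error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The identity turns a sum of squared resolvent entries over one index into a single diagonal entry of $\Im G$ divided by $\eta $ , which is the source of the gain after a Schwarz inequality.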
For case O3, we have and estimate all but the first two $(\boldsymbol j,\boldsymbol \beta )$ factors in equation (3.34) trivially, using the last inequality in equation (3.37) to obtain
For the last two factors, we use the first inequality in equation (3.37) and then estimate as
where in the second step, we performed a Schwarz inequality for the double $a, b$ summation and used the last bound in equations (3.43), (3.39) and $1\lesssim \psi _k^{\mathrm {iso}}\sqrt { \rho /\eta }$ . Thus, we conclude
In case O2a, there exists some ${\boldsymbol {j}}$ with (recall that ). By estimating the remaining $J$ -terms trivially by equation (3.37), we obtain
for some $j_1+j_2=k$ and double indices $\beta _1,\beta _2 \in \{ aa, bb, ab, ba\}$ . Here, in the second step, we assumed without loss of generality $j_1\ge 1$ (the case $j_2\ge 1$ being completely analogous) and used the first inequality in equation (3.37) for and the second inequality in equation (3.37) for . Finally, in the last step, we performed an $a,b$ Schwarz inequality, using the last bound in equations (3.43) and (3.39).
In case O2b, we have for all ${\boldsymbol {j}}$ since implies , and we estimate all but two $J$ -factors trivially by the last inequality in equation (3.37), the other two $J$ -factors (which are necessarily off-diagonal) by the first inequality in equation (3.37), the $l_1$ factor by the last inequality in equation (3.37) and the $l_2$ factor by the first inequality in equation (3.38) (note that $l_2\ge 1$ ) to obtain
where the last step used equation (3.39) and $\psi _k^{\mathrm {iso}} \sqrt {\rho /\eta }\gtrsim 1$ .
In case O2c, we use the first inequalities of equations (3.37) and (3.38) for the $l_1,l_2$ terms (since $l_1,l_2\ge 1$ ) and the first inequality of equation (3.37) for the $(GA)^kG$ factor to obtain
by equation (3.39).
In case O2d, we write the single $ G$ diagonal as
and use isotropic resummation for the leading m term into the $\boldsymbol 1=(1,1,\ldots )$ vector of norm
, that is,
and estimate
using the first inequalities of equations (3.37) and (3.38).
In case O1a, we use either I1 or I2, depending on whether the off-diagonal matrix is of the form $(GA)^lG$ or $(GA)^l$ , to gain one factor of $\sqrt {\rho /(N\eta )}$ in either case and conclude equation (3.35).
Finally, we consider case O1b, where there is no off-diagonal element to perform a Ward improvement, but for which, using equation (3.39), we estimate
for any exponents with $k_1+k_2=k_3+k_4=k$ . Here, in case $k_4>0$ , we used the second inequalities of equations (3.37) and (3.38) for the $k_2,k_4$ factors and the first inequality of equation (3.37) for the $k_1,k_3$ factors. The case $k_4=0$ is handled similarly, with the same result, by estimating $[(GA)^{k_3}G]_{aa}$ instead of $[(GA)^{k_4}G]_{bb}$ using the first inequality of equation (3.37).
3.2 Proof of the isotropic estimate given by equation (3.9) in Proposition 3.2
First we state the isotropic version of Lemma 3.5:
Lemma 3.7. For any deterministic unit vectors $\boldsymbol {x}, \boldsymbol {y}$ and $k\ge 0$ , we have
where $\mathcal {E}_0^{\mathrm {iso}}=0$ and for $k\ge 1$
Proof. From equation (3.20) applied to the first factor $G=G_1$ , similarly to equation (3.19), we obtain
where we used the definition in equation (3.3) for the first term and the definition in equation (3.15). An estimate analogous to equation (3.21) handles the sum and is incorporated in equation (3.53). This concludes the proof together with Lemma 3.1 and .
Exactly as in equation (3.27), we perform a cumulant expansion
recalling Assumption 1 for the diagonal and off-diagonal cumulants. In fact, the formula in equation (3.55) is identical to equation (3.27) for $k+1$ instead of k if the last $A=A_{k+1}$ in the product $(GA)^{k+1}= G_1 A_1G_2 A_2\ldots G_{k+1} A_{k+1}$ is chosen specifically $A_{k+1}= \boldsymbol {y}\boldsymbol {x}^*$ .
For the first line of equation (3.55), after performing the derivative, we can also perform the summations and estimate the resulting isotropic resolvent chains by using the last inequality of equation (3.37) as well as Lemma 3.1 to obtain
For the second line of equation (3.55), we estimate
For the third and fourth lines of equation (3.55), we distribute the derivatives according to the product rule to estimate (with the absolute value inside the summation to address both diagonal and off-diagonal terms)
where
and the summation in equation (3.57) is performed over all $\boldsymbol j=(j_0,\ldots ,j_n) \in \mathbf {N}_0^n$ with $j_0\ge 0$ , $j_1,\ldots ,j_n\ge 1$ and . Recall that $\sum {\boldsymbol j}=j_0+ j_1+ j_2+\ldots +j_n$ .
Lemma 3.8. For any admissible $\boldsymbol j$ in the summation of equation (3.57), it holds that
By combining Lemmas 3.7 and 3.8 and equations (3.56), (3.57) and (3.58), we obtain
concluding the proof of equation (3.9).
Proof of Lemma 3.8.
We recall the notations $\Omega _k,\Phi _k$ from equations (3.10) and (3.41). For a naive bound, we estimate all but the first factor trivially in equation (3.58) with
Note that the estimate is independent of the number of derivatives. For the first factor in equation (3.58), we estimate, after performing the derivatives, all but the last $[(GA)^{k_i}G]$ factor (involving $\boldsymbol y$ ) trivially by equation (3.37) as
By combining equations (3.61) and (3.62) and the Schwarz inequality
we conclude
which implies equation (3.59) in the case when $\sum \boldsymbol j\ge n+2$ using that $\Omega _k\le \Phi _k$ and $\rho /\eta \gtrsim 1$ . It thus only remains to consider the cases $\sum \boldsymbol j=n$ and $\sum \boldsymbol j=n+1$ .
If $\sum \boldsymbol j=n$ , then $n\ge 2$ and $j_0=0$ , $j_1=j_2=\cdots =1$ . By estimating the $j_2,j_3,\ldots $ factors in equation (3.58) using equation (3.61), we then bound
using and $\Omega _k\le \Phi _k$ , $1\lesssim \rho /\eta $ in the last step.
Finally, if $\sum \boldsymbol j=n+1$ , then $n\ge 1$ by admissibility and either $j_0=0$ or $j_1=1$ . In the first case, we estimate the $j_2,j_3,\ldots $ factors in equation (3.58) using equation (3.61) and all but the first $[(GA)^jG]_{\boldsymbol {x}\cdot }$ in the $j_1$ factor after differentiation trivially to obtain
again using a Schwarz inequality. Finally, in the $j_1=1$ case, we estimate two $j_0$ factor using equation (3.62), the $j_2,j_3,\ldots $ factors trivially and to bound
where we used the trivial bound for the in order to estimate the remaining terms by a Schwarz inequality. This completes the proof of the lemma.
3.3 Reduction inequalities and bootstrap
In this section, we prove the reduction inequalities in Lemma 3.3 and conclude the proof of our main result Theorem 2.2 showing that $\psi _k^{\mathrm {av}/\mathrm {iso}}\lesssim 1$ for any $k\ge 0$ .
Proof of Lemma 3.3.
The proof of this lemma is very similar to [Reference Cipolloni, Erdős and Schröder18, Lemma 3.6]; we thus present only the proof in the averaged case. Additionally, we only prove the case when k is even; if k is odd, the proof is completely analogous.
Define $T=T_k:=A(GA)^{k/2-1}$ , write $(GA)^{2k} = GTGTGTGT$ , and use the spectral theorem for these four intermediate resolvents. Then, using that $|m_i|\lesssim 1$ and that
, after a Schwarz inequality in the third line, we conclude that
We remark that to bound
in terms of $\psi _k^{\mathrm {av}}$ , we used (ii) of Lemma 3.1 together with $G^*(z) = G(\bar z)$ .
We are now ready to conclude the proof of our main result.
Proof of Theorem 2.2.
The proof repeatedly uses a simple argument called iteration. By this, we mean the following observation: whenever we know that $X\prec x$ implies
for some constants $B\ge N^{\delta }$ , $A,C>0$ and exponent $0<\alpha <1$ , and we know that $X\prec N^D$ initially (here $\delta , \alpha $ and D are $N$ -independent positive constants; other quantities may depend on N), then we also know that $X\prec x$ implies
The proof is simply to iterate equation (3.68) finitely many times (depending only on $\delta , \alpha $ and D). The fact that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec N^D$ follows by a simple norm bound on the resolvents and A, so the condition $X\prec N^D$ is always satisfied in our applications.
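The iteration argument can be illustrated with deterministic numbers. The Python sketch below assumes that the implication in equation (3.68) has the self-improving form $X\prec A + x/B + x^{1-\alpha }C^{\alpha }$ and that the conclusion is $X\prec A+C$ ; both forms are assumptions of this sketch, as are the numerical values of $N, \delta , D, \alpha , A$ and C and the function name. Starting from the crude bound $N^D$ , the map $x\mapsto A+x/B+x^{1-\alpha }C^{\alpha }$ is applied repeatedly.

```python
def iterate_bound(x0, A, B, C, alpha, steps):
    """Apply x -> A + x/B + x^(1-alpha) * C^alpha repeatedly, starting from x0.

    This is an assumed model of the self-improving bound: each step feeds the
    current bound x back into the right-hand side.
    """
    x = x0
    history = [x]
    for _ in range(steps):
        x = A + x / B + x ** (1 - alpha) * C ** alpha
        history.append(x)
    return history


N = 10**6
delta, D, alpha = 0.1, 10, 0.5   # hypothetical N-independent constants
A, C = 1.0, 2.0                  # hypothetical control parameters
B = N ** delta                   # B >= N^delta

history = iterate_bound(float(N) ** D, A, B, C, alpha, steps=500)
final = history[-1]
```

In this example the bound decreases from $N^{D}$ to a value comparable with $A+C$ . In the stochastic domination sense, an $N^{\xi }$ tolerance is allowed, so the iteration can be stopped after a number of steps that depends only on $\delta , \alpha $ and D.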
By the standard single resolvent local laws in equation (2.4), we know that $\psi _0^{\mathrm {av}}=\psi _0^{\mathrm {iso}}=1$ . Using the master inequalities in Proposition 3.2 and the reduction bounds from Lemma 3.3, in the first step, we will show that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ for any $k\ge 1$ as an a priori bound. Then, in the second step, we feed this bound into the tandem of the master inequalities and the reduction bounds to improve the estimate to $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ . The first step is the critical stage of the proof; here we need to show that our bounds are sufficiently strong to close the hierarchy of our estimates to yield a better bound on $\Psi _k^{\mathrm {av}/\mathrm {iso}}$ than the trivial $\Psi _k^{\mathrm {av}/\mathrm {iso}}\le N^{k/2}\eta ^{-k-1}$ estimate obtained by using the norm bounds and . Once some improvement is achieved, it can be relatively easily iterated.
The proof of $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4} $ proceeds by a step-two induction: we first prove that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4} $ for $k=1,2$ and then show that if $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4} $ holds for all $n\le k-2$ , for some $k\ge 4$ , then it also holds for $\Psi _{k-1}^{\mathrm {av}/\mathrm {iso}}$ and $\Psi _{k}^{\mathrm {av}/\mathrm {iso}}$ .
Using equations (3.8)–(3.9), we have
for $k=1$ , using
Similarly, for $k=2$ , estimating explicitly
by Schwarz inequalities and plugging it into equations (3.8)–(3.9), we have
In these estimates, we frequently used that $\psi _k^{\mathrm {av}/\mathrm {iso}}\ge 1$ , $\rho \lesssim 1$ , $\rho /N\eta \le 1$ and $N\eta \rho \ge 1$ to simplify the formulas.
By equations (3.70)–(3.71), using iteration for the sum $\Psi _1^{\mathrm {av}}+\Psi _1^{\mathrm {iso}}$ , we readily conclude
Note that since equation (3.72) holds uniformly in the hidden parameters $A, z, \boldsymbol {x}, \boldsymbol {y}$ in $\Psi _1^{\mathrm {av}/\mathrm {iso}}$ , this bound serves as an upper bound on $\psi _1^{\mathrm {av}}+\psi _1^{\mathrm {iso}}$ (in the sequel, we will frequently use an already proven upper bound on $\Psi _k$ as an effective upper bound on $\psi _k$ in the next steps without explicitly mentioning it). Next, using this upper bound together with an iteration for $\Psi _2^{\mathrm {av}}+\Psi _2^{\mathrm {iso}}$ , we have from equation (3.71)
again after several simplifications by Young’s inequality and the basic inequalities $\psi _k^{\mathrm {av}/\mathrm {iso}}\ge 1$ , $\rho \lesssim 1$ and $N\eta \rho \ge 1$ .
We now apply the reduction inequalities from Lemma 3.3 in the form
where the first inequality was already inserted into the right-hand side of equation (3.12) to get the second inequality in equation (3.74).
Then, inserting equations (3.74) and (3.72) into equation (3.73) and using iteration, we conclude
which together with equation (3.72) implies
We now proceed with a step-two induction on k. The initial step of the induction is equation (3.76). Fix an even $k\ge 4$ , and assume that $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4}$ holds for any $n\le k-2$ . In this case, by substituting this bound for $\psi _n^{\mathrm {av}/\mathrm {iso}}$ whenever possible, for any even $l\le k$ , we have the following upper bounds on $\Phi _l$ and $\Phi _{l-1}$ :
We now plug equation (3.77) into equations (3.8) and (3.9) and, again using the bound $\Psi _n^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-n/4}$ , $n\le k-2$ , whenever possible, get
and
By iteration for $ \Psi _{k-1}^{\mathrm {av}}+ \Psi _{k-1}^{\mathrm {iso}}$ from equation (3.78), we thus get
where we used that $\mathfrak {E}_{k-2}\le \mathfrak {E}_{k-1}$ . Then using iteration for $ \Psi _{k}^{\mathrm {av}}+ \Psi _{k}^{\mathrm {iso}}$ from equation (3.79), we have
where we used that $\mathfrak {E}_{k-1}\le \mathfrak {E}_k$ .
We will now use the reduction inequalities from Lemma 3.3 in the following form:
and
for any $j\le l/2$ , where $l\le k-2$ is even. In the last step, we also used the last line of equation (3.82) to estimate $\psi _{2(l-2j)}^{\mathrm {av}}$ . Then by plugging equation (3.83) into equation (3.77), we readily conclude that
for any $r\le k$ .
Plugging equation (3.84) into equations (3.80) and (3.81) and using iteration, we conclude
We will now additionally use that by equation (3.12) for any $r\in \{k-1,k\}$ , we have
and that
for any $2\le j\le k-1$ .
Plugging these bounds, together with equation (3.82) for $j=k-1$ and $j=k$ , into equation (3.85), and using iteration first for $ \Psi _{k-1}^{\mathrm {av}}+\Psi _{k-1}^{\mathrm {iso}}$ and then for $ \Psi _{k}^{\mathrm {av}}+\Psi _{k}^{\mathrm {iso}}$ , we conclude that $\Psi _{k-1}^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-(k-1)/4}$ and that $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ . This completes the step-two induction and hence the first and pivotal step of the proof.
In the second step, we improve the bounds $ \Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ to $ \Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ for all k. Recall that the definition of stochastic domination given by equation (1.8) involved an arbitrarily small but fixed exponent $\xi $ . Now we fix this $\xi $ , a large exponent D, and fix an upper threshold K for the indices. Our goal is to prove that
where the supremum over all indicated parameters is meant in the sense described below equation (3.3).
Now we distinguish two cases in the supremum over the collection of spectral parameters $\boldsymbol z$ in equation (3.87). In the regime where $\rho =\rho (\boldsymbol z) \ge N^{-\xi /K}$ , the bounds $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec \rho ^{-k/4}$ , already proven for all k, imply equation (3.87). Hence we can work in the opposite regime where $\rho < N^{-\xi /K}$ , and from now on, we restrict the supremum in equation (3.87) to this parameter regime. By plugging this bound into the master inequalities in Proposition 3.2 and noticing that $\Phi _k \le 1 + \rho ^{-k/4} (N\eta \rho )^{-1/4}$ , we directly conclude that
for any $k\ge 0$ . Now we can use this improved inequality by again plugging it into the master inequalities to achieve
and so on. Recalling the assumption that $N\eta \rho \ge N^{\epsilon }$ and recalling that $\rho \gtrsim \eta ^{1/2}\ge N^{-1/3}$ , we need to iterate this process finitely many times (depending on k, $\xi , K, \epsilon $ ) to also achieve $\Psi _k^{\mathrm {av}/\mathrm {iso}}\prec 1$ in the second regime. This concludes the proof of the theorem.
4 Stochastic eigenstate equation and proof of Theorem 2.8
Armed with the new local law (Theorem 2.2) and its direct corollary on the eigenvector overlaps (Theorem 2.6), the rest of the proof of Theorem 2.8 is very similar to the proof of [Reference Cipolloni, Erdős and Schröder17, Theorem 2.2], which is presented in [Reference Cipolloni, Erdős and Schröder17, Sections 3 and 4]. For this reason, we only explain the differences and refer to [Reference Cipolloni, Erdős and Schröder17] for a fully detailed proof. We mention that the proof in [Reference Cipolloni, Erdős and Schröder17] relies heavily on the theory of the stochastic eigenstate equation initiated in [Reference Bourgade and Yau10] and then further developed in [Reference Bourgade, Yau and Yin12, Reference Marcinek43].
Similarly to [Reference Cipolloni, Erdős and Schröder17, Sections 3–4], we present the proof only in the real case (the complex case is completely analogous and so omitted). We will prove Theorem 2.8 dynamically: that is, we consider the Dyson Brownian motion (DBM) with initial condition W and show that the overlaps of the eigenvectors have Gaussian fluctuations after a time t slightly bigger than $N^{-1}$ . With a separate argument in Appendix B, we show that the (small) Gaussian component added along the DBM flow can be removed at the price of a negligible error.
More precisely, we consider the matrix flow
where $\widetilde {B}_t$ is a standard real symmetric matrix Brownian motion (see, for example, [Reference Bourgade and Yau10, Definition 2.1]). We denote the resolvent of $W_t$ by $G=G_t(z):=(W_t-z)^{-1}$ , for $z\in \mathbf {C}\setminus \mathbf {R}$ . It is well known that in the limit $N\to \infty $ , the resolvent $G_t(z)$ becomes approximately deterministic and that its deterministic approximation is given by the scalar matrix $m_t\cdot I$ . The function $m_t=m_t(z)$ is the unique solution of the complex Burgers equation
with initial condition $m(z)=m_{\mathrm {sc}}(z)$ being the Stieltjes transform of the semicircular law. Denote $\rho _t=\rho _t(z):=\pi ^{-1}\Im m_t(z)$ ; then it is easy to see that $\rho _t(x+\mathrm {i} 0)$ is a rescaling of $\rho _0=\rho _{\mathrm {sc}}$ by a factor $1+t$ . In fact, $W_t$ is a Wigner matrix itself, with normalisation $\operatorname {\mathbf {E}}  (W_t)_{ab}^2 = N^{-1}(1+t)$ and with a Gaussian component.
Denote by $\lambda _1(t)\le \lambda _2(t)\le \dots \le \lambda _N(t)$ the eigenvalues of $W_t$ , and let $\{{\boldsymbol u}_i(t)\}_{i\in [N]}$ be the corresponding eigenvectors. Then it is known [Reference Bourgade and Yau10, Theorem 2.3] that $\lambda _i=\lambda _i(t)$ , ${\boldsymbol u}_i={\boldsymbol u}_i(t)$ are the unique strong solutions of the following system of stochastic differential equations:
where $B_t=(B_{ij})_{i,j\in [N]}$ is a standard real symmetric matrix Brownian motion (see, for example, [Reference Bourgade and Yau10, Definition 2.1]).
Note that the flow for the diagonal overlaps , by equation (4.4), naturally also depends on the off-diagonal overlap . Hence, even if we are only interested in diagonal overlaps, our analysis must also handle off-diagonal overlaps. In particular, this implies that there is no closed differential equation for only diagonal or only off-diagonal overlaps. However, in [Reference Bourgade, Yau and Yin12], Bourgade, Yau and Yin proved that the perfect matching observable $f_{{\boldsymbol \lambda },t}$ , which is presented in equation (4.6) below, satisfies a parabolic PDE (see equation (4.10) below). We now describe how the observable $f_{{\boldsymbol \lambda },t}$ is constructed.
4.1 Perfect matching observables
Without loss of generality for the rest of the paper, we assume that A is traceless, $\langle A\rangle =0$ : that is, $A=\mathring {A}$ . We introduce the shorthand notation for the eigenvector overlaps
To compute the moments, we will consider monomials of eigenvector overlaps of the form $\prod _k p_{i_k j_k}$ , where each index occurs an even number of times. We start by introducing a particle picture and a certain graph that encode such monomials: each particle on the set of integers $[N]$ corresponds to two occurrences of an index i in the monomial product. This particle picture was introduced in [Reference Bourgade and Yau10] and heavily used in [Reference Bourgade, Yau and Yin12, Reference Marcinek43]. Each particle configuration is encoded by a function ${\boldsymbol \eta }:[N] \to \mathbf {N}_0$ , where $\eta _j:={\boldsymbol \eta }(j)$ denotes the number of particles at the site j and $n({\boldsymbol \eta }):=\sum _j \eta _j= n$ is the total number of particles. We denote the space of $n$ -particle configurations by $\Omega ^n$ . Moreover, for any index pair $i\ne j\in [N]$ , we define ${\boldsymbol \eta }^{ij}$ to be the configuration obtained by moving a particle from site i to site j; if there is no particle in i, then ${\boldsymbol \eta }^{ij}:={\boldsymbol \eta }$ .
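The particle picture can be sketched in a few lines of Python. The representation below, a tuple of occupation numbers with sites labelled from 0 rather than 1, and the names `configuration_from_monomial` and `move_particle` are hypothetical choices made for this illustration.

```python
def configuration_from_monomial(index_pairs, N):
    """Build the configuration eta encoding a monomial prod_k p_{i_k j_k}.

    Each index must occur an even number of times; a particle at site i
    stands for two occurrences of the index i (sites are labelled 0..N-1).
    """
    counts = [0] * N
    for i, j in index_pairs:
        counts[i] += 1
        counts[j] += 1
    if any(c % 2 for c in counts):
        raise ValueError("every index must occur an even number of times")
    return tuple(c // 2 for c in counts)


def move_particle(eta, i, j):
    """Return eta^{ij}: one particle moved from site i to site j.

    If there is no particle at site i, the configuration is returned unchanged.
    """
    if eta[i] == 0:
        return tuple(eta)
    new = list(eta)
    new[i] -= 1
    new[j] += 1
    return tuple(new)


# The monomial p_{00} p_{13} p_{13} on N = 5 sites has n = 3 particles.
eta = configuration_from_monomial([(0, 0), (1, 3), (1, 3)], N=5)
moved = move_particle(eta, 1, 2)      # move a particle from site 1 to site 2
unchanged = move_particle(eta, 2, 0)  # site 2 is empty, so nothing happens
```

Moving a particle never changes the total number of particles, which is the combinatorial reason why the particle number is conserved by any dynamics built from the maps ${\boldsymbol \eta }\mapsto {\boldsymbol \eta }^{ij}$ .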
We now define the perfect matching observable (introduced in [Reference Bourgade, Yau and Yin12]) for any given configuration ${\boldsymbol \eta }$ :
with n being the number of particles in the configuration ${\boldsymbol \eta }$ . Here $\mathcal {G}_{\boldsymbol \eta }$ denotes the set of perfect matchings on the complete graph with vertex set
and
where $e=\{(i_1,a_1),(i_2,a_2)\}\in \mathcal {V}_{\boldsymbol \eta }^2$ , and $\mathcal {E}(G)$ denotes the edges of G. Note that in equation (4.6), we took the conditioning on the entire flow of eigenvalues, ${\boldsymbol \lambda } =\{\boldsymbol \lambda (t)\}_{t\in [0,T]}$ for some fixed $T>0$ . From now on, we will always assume that $T\ll 1$ (even if not stated explicitly).
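The set of perfect matchings can be enumerated explicitly for small configurations. The Python sketch below assumes that the vertex set consists of pairs $(i,a)$ with $1\le a\le 2\eta _i$ , so that every particle at site i contributes two vertices; this form of $\mathcal {V}_{\boldsymbol \eta }$ is an assumption of the sketch, and the function names are hypothetical. Sites are labelled from 0.

```python
def vertex_set(eta):
    """Assumed vertex set: pairs (i, a) with 1 <= a <= 2 * eta_i."""
    return [(i, a) for i, eta_i in enumerate(eta) for a in range(1, 2 * eta_i + 1)]


def perfect_matchings(vertices):
    """Yield every perfect matching of the complete graph on `vertices`.

    A matching is returned as a list of edges; the first vertex is paired with
    each possible partner and the remaining vertices are matched recursively.
    """
    if not vertices:
        yield []
        return
    first, rest = vertices[0], vertices[1:]
    for idx, partner in enumerate(rest):
        remaining = rest[:idx] + rest[idx + 1:]
        for matching in perfect_matchings(remaining):
            yield [(first, partner)] + matching


def double_factorial(m):
    """m!! = m (m - 2) (m - 4) ... for a positive integer m."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result


eta = (2, 0, 1)                      # n = 3 particles on 3 sites
V = vertex_set(eta)                  # 2n = 6 vertices
matchings = list(perfect_matchings(V))
```

A complete graph on $2n$ vertices has $(2n-1)!!$ perfect matchings, so the sum over $\mathcal {G}_{\boldsymbol \eta }$ has a number of terms depending only on n and not on N.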
We always assume that the entire eigenvalue trajectory $\{\boldsymbol \lambda (t)\}_{t\in [0,T]}$ satisfies the usual rigidity estimate asserting that the eigenvalues are very close to the deterministic quantiles of the semicircle law with very high probability. To formalise it, we define
for any $\xi>0$ , where $\widehat {i}:=i\wedge (N+1-i)$ . Here $\gamma _i(t)$ denote the quantiles of $\rho _t$ , defined by
where $\rho _t(x)= \frac {1}{2(1+t)\pi }\sqrt {(4(1+t)^2-x^2)_+}$ is the semicircle law corresponding to $W_t$ . Note that $|\gamma _i(t)-\gamma _i(s)|\lesssim |t-s|$ for any bulk index i and any $t,s\ge 0$ .
The well-known rigidity estimate (see, for example, [Reference Erdős, Knowles, Yau and Yin24, Theorem 7.6] or [Reference Erdős, Yau and Yin29]) asserts that
for any (small) $\xi>0$ and (large) $D>0$ . This was proven for any fixed t, for example, in [Reference Erdős, Knowles, Yau and Yin24, Theorem 7.6] or [Reference Erdős, Yau and Yin29]; the extension to all t follows by a grid argument together with the fact that ${\boldsymbol \lambda }(t)$ is stochastically $1/2$ -Hölder in t, which follows by Weyl’s inequality
with $s\le t$ and $U_1,U_2$ being independent GUE/GOE matrices that are also independent of W.
By [Reference Bourgade, Yau and Yin12, Theorem 2.6], we know that the perfect matching observable $f_{{\boldsymbol \lambda },t}$ is a solution of the following parabolic discrete PDE
where
Note that the number of particles $n=n({\boldsymbol \eta })$ is preserved under the flow of equation (4.10). The eigenvalue trajectories are fixed in this proof; hence we will often omit ${\boldsymbol \lambda }$ from the notation: for example, we will use $f_t=f_{{\boldsymbol \lambda }, t}$ , and so on.
The main technical input in the proof of Theorem 2.8 is the following result (compare to [Reference Cipolloni, Erdős and Schröder17, Proposition 3.2]):
Proposition 4.1. For any $n\in \mathbf {N}$ , there exists $c(n)>0$ such that for any $\epsilon>0$ , and for any $T\ge N^{-1+\epsilon }$ , it holds
with very high probability, where the supremum is taken over configurations ${\boldsymbol \eta } \in \Omega ^n $ supported in the bulk: that is, such that $\eta _i=0$ for $i\notin [\delta N, (1-\delta ) N]$ , with $\delta>0$ from Theorem 2.8. The implicit constant in equation (4.13) depends on n, $\epsilon $ , $\delta $ .
We are now ready to prove Theorem 2.8.
Proof of Theorem 2.8.
Fix $i\in [\delta N,(1-\delta ) N]$ . Then the convergence in equation (2.9) follows immediately from equation (4.13), choosing ${\boldsymbol \eta }$ to be the configuration with $\eta _i=n$ and all other $\eta _j=0$ , together with a standard application of the Green function comparison theorem (GFT), relating the eigenvectors/eigenvalues of $W_T$ to those of W; see Appendix B, where we recall the GFT argument for completeness. We refer the interested reader to [Reference Cipolloni, Erdős and Schröder17, Proof of Theorem 2.2] for a more detailed proof.
4.2 DBM analysis
Since the current DBM analysis of equation (4.10) heavily relies on [Reference Cipolloni, Erdős and Schröder17, Section 4], before starting it, we introduce an equivalent representation of equation (4.6) used in [Reference Cipolloni, Erdős and Schröder17] (which itself is based on the particle representation from [Reference Marcinek43]).
Fix $n\in \mathbf {N}$ , and consider configurations ${\boldsymbol \eta }\in \Omega ^n$ : that is, such that $\sum _j\eta _j=n$ . We now give an equivalent representation of equations (4.10) and (4.11) that is defined on the $2n$-dimensional lattice $[N]^{2n}$ instead of configurations of n particles (see [Reference Cipolloni, Erdős and Schröder17, Section 4.1] for a more detailed description). Let ${\boldsymbol x}\in [N]^{2n}$ , and define the configuration space
where
for all $i\in \mathbf {N}$ .
The correspondence between these two representations is given by
Note that ${\boldsymbol x}$ uniquely determines ${\boldsymbol \eta }$ , but ${\boldsymbol \eta }$ determines only the coordinates of ${\boldsymbol x}$ as a multiset and not their ordering. Let $\phi \colon \Lambda ^n\to \Omega ^n$ , $\phi ({\boldsymbol x})={\boldsymbol \eta }$ be the projection from the ${\boldsymbol x}$-configuration space to the ${\boldsymbol \eta }$-configuration space using equation (4.16). We will then always consider functions g on $[N]^{2n}$ that are pullbacks of some function f on $\Omega ^n$ , $g= f\circ \phi $ : that is, they correspond to functions on the configurations
In particular, g is supported on $\Lambda ^n$ , and it is equivariant under permutation of the arguments: that is, it depends on ${\boldsymbol x}$ only as a multiset. We thus consider the observable
where $ f_{{\boldsymbol \lambda },t}$ was defined in equation (4.6).
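The relation between the two representations can be made concrete with a small sketch. It assumes the doubled-coordinate convention of [Reference Cipolloni, Erdős and Schröder17, Section 4.1]: $\Lambda ^n$ consists of those ${\boldsymbol x}\in [N]^{2n}$ in which every site appears an even number of times, and $\eta _i$ is half the number of coordinates of ${\boldsymbol x}$ equal to i. The helper names `site_counts`, `in_lambda`, `phi` and `lift` are hypothetical and serve only this illustration.

```python
from collections import Counter


def site_counts(x, n_sites):
    """Return (n_1(x), ..., n_N(x)): how many coordinates of x sit at each site."""
    counts = Counter(x)
    return tuple(counts.get(i, 0) for i in range(1, n_sites + 1))


def in_lambda(x, n_sites):
    """Assumed membership test: all entries lie in [N] and every site count is even."""
    if any(not (1 <= a <= n_sites) for a in x):
        return False
    return all(c % 2 == 0 for c in site_counts(x, n_sites))


def phi(x, n_sites):
    """Projection x -> eta with eta_i = n_i(x) / 2 (assumed convention)."""
    if not in_lambda(x, n_sites):
        raise ValueError("x is not in the configuration space")
    return tuple(c // 2 for c in site_counts(x, n_sites))


def lift(eta):
    """One representative x of eta: each site i is repeated 2 * eta_i times, in order."""
    x = []
    for i, particles in enumerate(eta, start=1):
        x.extend([i] * (2 * particles))
    return tuple(x)


if __name__ == "__main__":
    eta = (1, 0, 2, 0)               # n = 3 particles on N = 4 sites
    x = lift(eta)                    # (1, 1, 3, 3, 3, 3)
    print(x, phi(x, 4))
    # Reordering x does not change eta: g = f(phi(x)) sees x only as a multiset.
    print(phi((3, 1, 3, 3, 1, 3), 4))
```

In this picture the configuration with $\eta _i=n$ and all other $\eta _j=0$ , used in the proof of Theorem 2.8, corresponds to the single point ${\boldsymbol x}=(i,\dots ,i)$ , while a generic ${\boldsymbol \eta }$ has several preimages that differ only by a permutation of the coordinates.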
Using the ${\boldsymbol x}$ representation space, we can now write the flow of equations (4.10) and (4.11) as follows:
where
with ${\boldsymbol e}_a(c)=\delta _{ac}$ , $a,c\in [2n]$ . This flow is a map of functions defined on $\Lambda ^n\subset [N]^{2n}$ , and it preserves equivariance.
We now define the scalar product and the natural measure on $\Lambda ^n$ :
as well as the norm on $L^p(\Lambda ^n)$ :
By [Reference Marcinek43, Appendix A.2], it follows that the operator $\mathcal {L}=\mathcal {L}(t)$ is symmetric with respect to the measure $\pi $ , and it is a negative operator on $L^2(\Lambda ^n)$ with Dirichlet form
Let $\mathcal {U}(s,t)$ be the semigroup associated to $\mathcal {L}$ : that is, for any $0\le s\le t$ , it holds
4.2.1 Short-range approximation
Most of our DBM analysis will be completely local; hence we will introduce a short-range approximation $h_t$ (see its definition in equation (4.26) below) of $g_t$ that will be exponentially small when evaluated on points ${\boldsymbol x}$ that are not fully supported in the bulk.
Recall the definition of the quantiles $\gamma _i(0)$ from equation (4.9). Then we define the sets
which correspond to indices and spectral range in the bulk, respectively. From now on, we fix a point ${\boldsymbol y}\in \mathcal {J}$ and an N-dependent parameter K such that $1\ll K\le \sqrt {N}$ . Next, we define the averaging operator as a simple multiplication operator by a ‘smooth’ cutoff function:
with . Additionally, fix an integer $\ell $ with $1\ll \ell \ll K$ , and define the short-range coefficients
where $c