
A set of 2-recurrence whose perfect squares do not form a set of measurable recurrence

Published online by Cambridge University Press:  04 September 2023

Department of Applied Mathematics and Statistics, Colorado School of Mines, Golden, Colorado, USA


We say that $S\subseteq \mathbb Z$ is a set of k-recurrence if for every measure-preserving transformation T of a probability measure space $(X,\mu )$ and every $A\subseteq X$ with $\mu (A)>0$, there is an $n\in S$ such that $\mu (A\cap T^{-n} A\cap T^{-2n}A\cap \cdots \cap T^{-kn}A)>0$. A set of $1$-recurrence is called a set of measurable recurrence. Answering a question of Frantzikinakis, Lesigne, and Wierdl [Sets of k-recurrence but not (k+1)-recurrence. Ann. Inst. Fourier (Grenoble) 56(4) (2006), 839–849], we construct a set of $2$-recurrence S with the property that $\{n^2:n\in S\}$ is not a set of measurable recurrence.

This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
© The Author(s), 2023. Published by Cambridge University Press

1 Background and motivation

A probability measure-preserving system (or MPS) is a quadruple $(X,\mathcal B,\mu ,T)$ where $(X,\mathcal B,\mu )$ is a probability measure space and $T:X\to X$ is an invertible transformation preserving $\mu $ , meaning $\mu (T^{-1}A)=\mu (A)$ for every measurable set $A\subseteq X$ .

We say that $S\subseteq \mathbb Z$ is a set of measurable recurrence if for every MPS $(X,\mathcal B,\mu ,T)$ and every $A\subseteq X$ having $\mu (A)>0$ , there is an $n\in S$ such that $\mu (A\cap T^{-n}A)>0$ .

For a fixed $k\in \mathbb N$ , we say S is a set of k-recurrence if under these hypotheses, there is an $n\in S$ such that $\mu (\bigcap _{j=0}^{k} T^{-jn}A)>0$ ; in this terminology, a set of measurable recurrence is a set of $1$ -recurrence.

Finally, $S\subseteq \mathbb Z$ is a set of Bohr recurrence if for all $d\in \mathbb N$ , every $\boldsymbol {\alpha } \in \mathbb T^d$ , and all $\varepsilon>0$ , there is an $n\in S$ such that $\|n\boldsymbol {\alpha }\|<\varepsilon $ (see §3 for definitions and notation).

Frantzikinakis, Lesigne, and Wierdl [Reference Frantzikinakis, Lesigne and Wierdl10] proved that if $k\in \mathbb N$ and $S\subseteq \mathbb Z$ is a set of k-recurrence, then $S^{\wedge k}:=\{n^k: n\in S\}$ is a set of Bohr recurrence. They ask (the remarks following [Reference Frantzikinakis, Lesigne and Wierdl10, Proposition 2.2]) whether this conclusion can be strengthened to ‘ $S^{\wedge k}$ is a set of measurable recurrence,’ and the subsequent articles [Reference Frantzikinakis7, Reference Frantzikinakis8] reiterate ([Reference Frantzikinakis8, Problem 5] of the current version at arXiv:1103.3808) this question. Our main result, Theorem 1.1, provides a negative answer for the case $k=2$ . For $k\geq 3$ , the question remains open. A related question in [Reference Frantzikinakis7] asks whether a set S which is a set of k-recurrence for every k must have the property that $S^{\wedge 2}$ is a set of measurable recurrence. We discuss how our construction relates to these questions in §16.

Theorem 1.1. There is a set $S\subseteq \mathbb Z$ which is a set of $2$ -recurrence such that $S^{\wedge 2}$ is not a set of measurable recurrence.

Reflecting on the known examples of sets of Bohr recurrence which are not sets of measurable recurrence, Frantzikinakis [Reference Frantzikinakis8] predicts that an example of a set of $2$ -recurrence S where $S^{\wedge 2}$ is not a set of measurable recurrence will be rather complicated. Our example is indeed complicated: while built from well-known constituents using standard methods, the proof that it is a set of $2$ -recurrence uses several reductions—from general measure-preserving systems to totally ergodic systems to nilsystems to affine systems to Kronecker systems. The final reduction combines explicit computations of multiple ergodic averages in 2-step affine systems with classical estimates for three term arithmetic progressions in terms of Fourier coefficients.

1.1 Outline of the article

Our approach is similar to Kriz’s construction [Reference Kříž18] proving that there is a set of topological recurrence which is not a set of measurable recurrence. Very roughly, our example S in Theorem 1.1 is $\{n:n^2 \in R\}$ , where R is Kriz’s example. While this description is not quite correct, it may help those familiar with [Reference Kříž18], [Reference Griesmer16] or [Reference Griesmer15] understand our construction.

The overall proof of Theorem 1.1 is presented at the end of §2. We outline its components here. Section 2 begins by collecting standard facts about the following finite approximations to recurrence properties.

Definition 1.2. Let $S\subseteq \mathbb Z$ and $k\in \mathbb N$ . We say that S is $(\delta ,k)$ -recurrent if for every MPS $(X,\mathcal B,\mu ,T)$ and every $A\subseteq X$ with $\mu (A)>\delta $ , we have $A\cap T^{-n}A\cap \cdots \cap T^{-kn}A\neq \varnothing $ for some $n\in S$ .

We say that S is $(\delta ,k)$ -non-recurrent if there is an MPS $(X,\mathcal B,\mu ,T)$ and $A\subseteq X$ with $\mu (A)>\delta $ such that $A\cap T^{-n}A\cap \cdots \cap T^{-kn}A=\varnothing $ for all $n\in S$ .

We say S is $\delta $ -non-recurrent if it is $(\delta ,1)$ -non-recurrent, meaning there is an MPS $(X,\mathcal B,\mu ,T)$ and $A\subseteq X$ with $\mu (A)>\delta $ such that $A\cap T^{-n}A=\varnothing $ for all $n\in S$ .

Remark 1.3. The condition $A\cap T^{-n}A\cap \cdots \cap T^{-kn}A\neq \varnothing $ in the definition of $(\delta ,k)$ -recurrent may be replaced with $\mu (A\cap T^{-n}A\cap \cdots \cap T^{-kn}A)>0$ ; cf. Lemma 15.1.

Lemma 2.1 says that if $S_1, S_2\subseteq \mathbb Z$ are finite and are $\delta _1$ -non-recurrent and $\delta _2$ -non-recurrent, respectively, then for all sufficiently large m, $S_1\cup mS_2$ is $2\delta _1\delta _2$ -non-recurrent. Thus, if $S_1^{\wedge 2}$ and $S_2^{\wedge 2}$ are $\delta _1$ -non-recurrent and $\delta _2$ -non-recurrent, respectively, then $(S_1 \cup mS_2)^{\wedge 2}$ is $2\delta _1\delta _2$ -non-recurrent for all sufficiently large m, as $(S_1 \cup mS_2)^{\wedge 2} = S_1^{\wedge 2} \cup m^2S_2^{\wedge 2}$ .
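The set identity $(S_1 \cup mS_2)^{\wedge 2} = S_1^{\wedge 2} \cup m^2S_2^{\wedge 2}$ used here can be checked on small examples. The following is a minimal sketch; the sets `S1`, `S2` and the factor `m` are hypothetical values chosen only for illustration, and the helper names are not from the article.

```python
def squares(s):
    """Return S^{^2} = {n^2 : n in S} for a finite set of integers."""
    return {n * n for n in s}


def dilate(m, s):
    """Return mS = {m*n : n in S}."""
    return {m * n for n in s}


# Hypothetical finite sets and dilation factor, chosen only for illustration.
S1 = {1, 2, 5}
S2 = {3, 4}
m = 7

lhs = squares(S1 | dilate(m, S2))               # (S1 ∪ mS2)^{^2}
rhs = squares(S1) | dilate(m * m, squares(S2))  # S1^{^2} ∪ m^2 S2^{^2}
```

Both sides agree because $(mn)^2 = m^2n^2$ for each $n\in S_2$.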

Lemma 2.3 says that $S\subseteq \mathbb Z$ is $\delta $ -non-recurrent if and only if for all $\delta '< \delta $ and all finite subsets $S'\subseteq S$ , $S'$ is $\delta '$ -non-recurrent. Likewise, if $S\subseteq \mathbb Z$ is $(\eta ,2)$ -recurrent, then for all $\eta '>\eta $ , there is a finite subset $S'\subseteq S$ which is $(\eta ',2)$ -recurrent.

The proof of Theorem 1.1 is given at the end of §2; it explains in detail how finite approximations are assembled to form a $2$ -recurrent set whose perfect squares do not form a set of measurable recurrence. This reduces the problem to proving Lemma 2.4, which states that the required finite approximations exist. These approximations are based on Bohr–Hamming balls, which we introduce in §3. Bohr–Hamming balls were used in [Reference Griesmer15, Reference Kříž18] to construct sets with prescribed recurrence properties. Fixing $\delta <\tfrac 12$ and $\eta>0$ , Lemmas 3.4 and 3.5 show that there is a Bohr–Hamming ball $BH$ which is $\delta $ -non-recurrent, while $\sqrt {BH}:=\{n\in \mathbb N: n^2\in BH\}$ is $(\eta ,2)$ -recurrent.

The proof of Lemma 3.5 occupies §§4–15. It is proved by estimating multiple ergodic averages of the form

(1.1) $$ \begin{align} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N g(n^2\boldsymbol{\beta}) \int f\cdot f\circ T^n \cdot f\circ T^{2n}\, d\mu, \end{align} $$

where $(X,\mathcal B,\mu ,T)$ is a measure-preserving system, $f:X\to [0,1]$ has $\int f\, d\mu>\delta $ for some prescribed $\delta>0$ , $\boldsymbol {\beta }\in \mathbb T^r$ for some $r\in \mathbb N$ , and $g:\mathbb T^r\to [0,1]$ is Riemann integrable. Under certain hypotheses on g, we will prove the limit in equation (1.1) is positive; this is the inequality in equation (4.7) in the proof of Lemma 3.5. In §4, we show how the general case may be reduced to that where T is totally ergodic. The remainder of the article, outlined in §5.1, is dedicated to analyzing the limit in equation (1.1) when T is totally ergodic. Section 8 shows that the totally ergodic case can be further reduced to the study of standard $2$ -step Weyl systems, and §§9–13 are dedicated to simplifying and estimating equation (1.1) for these systems.

Readers familiar with the theory of characteristic factors (especially [Reference Frantzikinakis6]) may find it most profitable to read §§2, 3, 5, and 8 in detail, and skim §4.

2 Constructing the example from finite approximations

We first require some standard facts about the properties mentioned in Definition 1.2. The following is [Reference Griesmer16, Lemma 3.6]; it is essentially [Reference Kříž18, Lemma 3.2]. Similar lemmas appear, often unnamed, in the variations on Kriz’s example [Reference Forrest5, Reference McCutcheon, Petersen and Salama20, Reference McCutcheon21, Reference Weiss25].

Lemma 2.1. Let $S_1, S_2\subseteq \mathbb N$ be finite. If $S_1$ and $S_2$ are $\delta $ -non-recurrent and $\eta $ -non-recurrent, respectively, then for all sufficiently large $m\in \mathbb N$ , $S_1\cup mS_2$ is $2\delta \eta $ -non-recurrent.

Lemma 2.2. Let $m\in \mathbb Z$ and $\delta \geq 0$ . If $S\subseteq \mathbb Z$ is $(\delta ,2)$ -recurrent, then $mS$ is also $(\delta ,2)$ -recurrent.

Proof. Fix $m\in \mathbb Z$ and let $S\subseteq \mathbb Z$ be a $(\delta ,2)$ -recurrent set. Let $(X,\mathcal B,\mu ,T)$ be an MPS, with $A\subseteq X$ having $\mu (A)>\delta $ . Consider the MPS $(X,\mathcal B,\mu ,T^m)$ . Since $\mu (A)>\delta $ , there exists $n\in S$ such that $\mu (A\cap (T^{m})^{-n}A \cap (T^{m})^{-2n}A)>0$ , meaning $\mu (A\cap T^{-mn}A\cap T^{-2(mn)}A)>0$ . Since $mn\in mS$ , this proves $mS$ is $(\delta ,2)$ -recurrent.

Our proof of Lemma 2.4 uses the following compactness properties for recurrence.

Lemma 2.3. Let $k\in \mathbb N$ and $\delta \geq 0$ . If $\delta '>\delta $ and every finite subset of S is $(\delta ',k)$ -non-recurrent, then S is $(\delta ,k)$ -non-recurrent.

Consequently, if S is $(\delta ,k)$ -recurrent, then for all $\delta '>\delta $ , there is a finite $S'\subseteq S$ which is $(\delta ',k)$ -recurrent.

We prove Lemma 2.3 in §15. A special case, which is easily adapted to prove the general case, appears in [Reference Forrest5, Ch. 2].

Theorem 1.1 is proved by combining the following lemma with the others in this section.

Lemma 2.4. For all $\delta>0$ and $\eta <1/2$ , there exists $S\subseteq \mathbb Z$ which is $(\delta ,2)$ -recurrent such that $S^{\wedge 2}$ is $\eta $ -non-recurrent.

By Lemma 2.3, we can take S to be finite in Lemma 2.4.

Lemmas 3.4 and 3.5 will prove Lemma 2.4; the proof of Lemma 3.5 forms the majority of this article.

Proof of Theorem 1.1

Let $\delta <\delta '<\tfrac 12$ . We will construct an increasing sequence of finite sets $S_1\subseteq S_2\subseteq \cdots $ so that $S_n$ is $({1}/{n},2)$ -recurrent, and $S_n^{\wedge 2}$ is $\delta '$ -non-recurrent. Setting $S:=\bigcup _{n=1}^\infty S_n$ , we get that S is a set of $2$ -recurrence, while every finite subset of $S^{\wedge 2}$ is $\delta '$ -non-recurrent. Lemma 2.3 then implies $S^{\wedge 2}$ is $\delta $ -non-recurrent.

To define $S_1$ , we apply Lemma 2.4 to find an $S_1\subseteq \mathbb Z$ which is $(1,2)$ -recurrent, while $S_1^{\wedge 2}$ is $\delta _1$ -non-recurrent for some $\delta _1>\delta '$ . We define the remaining $S_n$ inductively: suppose $n\in \mathbb N$ and that $S_n$ has been chosen to be $({1}/{n},2)$ -recurrent, while $S_n^{\wedge 2}$ is $\delta _n$ -non-recurrent for some $\delta _n>\delta '$ . Let $\eta <\tfrac 12$ so that $2\eta \delta _n>\delta '$ . We will find $S_{n+1}\supset S_n$ so that $S_{n+1}$ is $({1}/({n+1}),2)$ -recurrent and $S_{n+1}^{\wedge 2}$ is $2\eta \delta _n$ -non-recurrent. To do so, apply Lemma 2.4 to find a finite $R \subseteq \mathbb Z$ which is $({1}/({n+1}),2)$ -recurrent such that $R^{\wedge 2}$ is $\eta $ -non-recurrent. By Lemma 2.1, choose $m\in \mathbb N$ so that $(S_n^{\wedge 2})\cup m^2(R^{\wedge 2})$ is $2\eta \delta _n$ -non-recurrent. Now $S_{n+1}:= S_n \cup mR$ is the desired set: $mR$ is $({1}/({n+1}),2)$ -recurrent, by Lemma 2.2, while $S_{n+1}^{\wedge 2}= (S_n^{\wedge 2})\cup m^2(R^{\wedge 2})$ . Since $2\eta \delta _n> \delta '$ , this completes the inductive step of the construction.
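The inductive step needs, at each stage, some $\eta <\tfrac 12$ with $2\eta \delta _n>\delta '$. The sketch below shows one way such parameters could be chosen; the midpoint rule and the starting values are assumptions made for illustration, not the article's choice. It only tracks the numerical bookkeeping, not the sets $S_n$ themselves.

```python
from fractions import Fraction


def next_parameters(delta_n, delta_prime):
    """Given delta_n > delta_prime > 0, return (eta, delta_next) with
    eta < 1/2 and delta_next = 2 * eta * delta_n > delta_prime."""
    # Hypothetical choice: aim for the midpoint between delta_prime and delta_n.
    delta_next = (delta_n + delta_prime) / 2
    eta = delta_next / (2 * delta_n)
    return eta, delta_next


delta_prime = Fraction(2, 5)   # illustrative value of delta' < 1/2
delta_n = Fraction(9, 20)      # illustrative value of delta_1 in (delta', 1/2)
history = []                   # triples (delta_n, eta, delta_{n+1})
for _ in range(10):
    eta, delta_next = next_parameters(delta_n, delta_prime)
    history.append((delta_n, eta, delta_next))
    delta_n = delta_next
```

With this rule, $\eta = (\delta _n+\delta ')/(4\delta _n)<\tfrac 12$ exactly when $\delta '<\delta _n$, so the induction can continue indefinitely.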

3 Approximate Hamming balls in $\mathbb T^r$ and Bohr–Hamming balls in $\mathbb Z$

Let $\mathbb T$ denote the group $\mathbb R/\mathbb Z$ with the usual topology. For $x\in \mathbb T$ , let $\tilde {x}$ denote the unique element of $[0,1)$ such that $x = \tilde {x}+\mathbb Z$ and define $\|x\|:=\min \{|\tilde {x}-n|: n\in \mathbb Z\}$ . For $r\in \mathbb N$ and $\mathbf x = (x_1,\ldots , x_r)\in \mathbb T^r$ , let $\|\mathbf x\|:=\max _{j\leq r} \|x_j\|$ .

For $\varepsilon>0$ , $r\in \mathbb N$ , and $\mathbf x=(x_1,\ldots ,x_r)\in \mathbb T^r$ , let

$$ \begin{align*} w_\varepsilon(\mathbf x):= |\{j: \|x_j\|\geq \varepsilon\}|. \end{align*} $$

So $w_\varepsilon (\mathbf x)$ is the number of coordinates of $\mathbf x$ differing from $0$ by at least $\varepsilon $ .

Definition 3.1. For $k< r\in \mathbb N$ , $\mathbf y\in \mathbb T^r$ , and $\varepsilon>0$ , we define the approximate Hamming ball of radius $(k,\varepsilon )$ around $\mathbf y$ as

$$ \begin{align*} \operatorname{Hamm}(\mathbf y; k,\varepsilon):=\{\mathbf x\in \mathbb T^r: w_{\varepsilon}(\mathbf y - \mathbf x)\leq k\}. \end{align*} $$

So $\operatorname {Hamm}(\mathbf y; k,\varepsilon )$ is the set of $\mathbf x=(x_1,\ldots ,x_r)\in \mathbb T^r$ , where at most k coordinates $x_i$ differ from $y_i$ by at least $\varepsilon $ .
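As a concrete illustration of these definitions, here is a minimal sketch that computes $\|x\|$, $w_\varepsilon $, and membership in an approximate Hamming ball. Points of $\mathbb T$ are represented by rational representatives (an assumption made so the arithmetic is exact), and the function names are hypothetical.

```python
from fractions import Fraction
import math


def torus_norm(x):
    """Distance from x (a representative of a point of T = R/Z) to the nearest integer."""
    frac = x - math.floor(x)        # representative in [0, 1)
    return min(frac, 1 - frac)


def w_eps(x, eps):
    """Number of coordinates of x in T^r at distance >= eps from 0."""
    return sum(1 for xj in x if torus_norm(xj) >= eps)


def in_hamming_ball(x, y, k, eps):
    """True if x lies in Hamm(y; k, eps), i.e. w_eps(y - x) <= k."""
    diff = [yj - xj for xj, yj in zip(x, y)]
    return w_eps(diff, eps) <= k


# Illustrative data in T^3 with k = 1 and eps = 1/10.
y = (Fraction(1, 2), Fraction(1, 2), Fraction(1, 2))
x_near = (Fraction(1, 2), Fraction(11, 20), Fraction(0))   # one far coordinate
x_far = (Fraction(0), Fraction(0), Fraction(1, 2))         # two far coordinates
eps = Fraction(1, 10)
```

Here `in_hamming_ball(x_near, y, 1, eps)` holds while `in_hamming_ball(x_far, y, 1, eps)` does not.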

If Z is a topological abelian group, we say $\alpha \in Z$ generates Z if the cyclic subgroup $\{n\alpha :n\in \mathbb Z\}$ is dense in Z. In other words, $\alpha $ generates Z if Z is the smallest closed subgroup containing $\alpha $ .

The group rotation system $(Z,\mathcal B, m_Z,R_{\alpha })$ , where $\mathcal B$ is the Borel $\sigma $ -algebra on Z and $m_Z$ is Haar measure on Z, is given by $R_{\alpha }z=z+\alpha $ .

Definition 3.2. If $U=\operatorname {Hamm}(\mathbf y; k,\varepsilon ) \subseteq \mathbb T^r$ is an approximate Hamming ball and $\boldsymbol {\beta }\in \mathbb T^r$ , the corresponding Bohr–Hamming ball of radius $(k,\varepsilon )$ is

$$ \begin{align*} BH(\boldsymbol{\beta},\mathbf y;k,\varepsilon):=\{n\in \mathbb Z:n\boldsymbol{\beta}\in U\}. \end{align*} $$

If $\boldsymbol {\beta }$ generates $\mathbb T^r$ , we say that the corresponding Bohr–Hamming ball is proper.
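The following sketch lists the elements of a Bohr–Hamming ball in a finite window. To keep the arithmetic exact it uses a rational $\boldsymbol {\beta }$, which is an assumption for illustration only: a rational $\boldsymbol {\beta }$ does not generate $\mathbb T^r$, so this ball is not proper. All names and parameter values are hypothetical.

```python
from fractions import Fraction
import math


def torus_norm(x):
    """Distance from x to the nearest integer."""
    frac = x - math.floor(x)
    return min(frac, 1 - frac)


def in_bohr_hamming_ball(n, beta, y, k, eps):
    """True if n*beta lies in Hamm(y; k, eps)."""
    far = sum(1 for bj, yj in zip(beta, y) if torus_norm(yj - n * bj) >= eps)
    return far <= k


# Illustrative (non-generating) beta in T^3, center y = (1/2, 1/2, 1/2).
beta = (Fraction(1, 7), Fraction(2, 11), Fraction(3, 13))
y = (Fraction(1, 2),) * 3
k, eps = 1, Fraction(1, 5)
window = range(-50, 51)
ball = {n for n in window if in_bohr_hamming_ball(n, beta, y, k, eps)}
```

Since $-\mathbf y=\mathbf y$ in $\mathbb T^r$ for this center, the resulting set is symmetric about $0$, and $0$ itself is excluded whenever $k<r$ and $\varepsilon \leq \tfrac 12$.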

We write m for Haar probability measure on $\mathbb T^r$ . Lemmas 3.3 and 3.4 here are implicit in [Reference Kříž18] and proved explicitly in [Reference Griesmer15].

Lemma 3.3. Let $k\in \mathbb N$ and $\eta <\tfrac 12$ . For all sufficiently large $r\in \mathbb N$ , there is an $\varepsilon>0$ and $E\subseteq \mathbb T^r$ with $m(E)>\eta $ such that $E\cap (E+U)=\varnothing $ , where $U=\operatorname {Hamm}(\mathbf y;k,\varepsilon )$ , with $\mathbf y = (\tfrac 12,\ldots , \tfrac 12)\in \mathbb T^r$ .

Lemma 3.3 is a consequence of [Reference Griesmer15, Lemma 7.1]. To derive the former from the latter, note that [Reference Griesmer15, Lemma 7.1] (in the case $p=2$ there) provides sets E, $E'\subseteq \mathbb T^r$ with $m(E)>\eta $ , an approximate Hamming ball U around $0_{\mathbb T^r}$ with radius $(k,\varepsilon )$ for some $\varepsilon>0$ , such that $E+U\subseteq E'$ and $E'+ (\tfrac 12,\ldots ,\tfrac 12)$ is disjoint from $E'$ .

Lemma 3.4. Let $k\in \mathbb N$ and $\eta <\tfrac 12$ . For all sufficiently large $r\in \mathbb N$ , there is an $\varepsilon>0$ such that for all $\boldsymbol {\beta }\in \mathbb T^r$ , the Bohr–Hamming ball $BH(\boldsymbol {\beta },\mathbf y;k,\varepsilon )$ is $\eta $ -non-recurrent, where $\mathbf y = (\tfrac 12,\ldots ,\tfrac 12)\in \mathbb T^r$ .

Proof. Let $\eta <\tfrac 12$ and choose r large enough to find the E and U provided by Lemma 3.3, with $m(E)>\eta $ . Let $(X,\mathcal B,\mu ,T) = (\mathbb T^r,\mathcal B,m,R_{\boldsymbol {\beta }})$ be the group rotation on $\mathbb T^r$ determined by $\boldsymbol {\beta }$ . For $n\in BH(\boldsymbol {\beta },\mathbf y;k,\varepsilon )$ , we have $R_{\boldsymbol {\beta }}^n E \subseteq E+U$ , so $E\cap R_{\boldsymbol {\beta }}^n E=\varnothing .$ Since $R_{\boldsymbol {\beta }}$ is invertible, this means $E\cap R_{\boldsymbol {\beta }}^{-n}E =\varnothing $ , as well.

For $S\subseteq \mathbb Z$ , let $\sqrt {S}:=\{n\in \mathbb Z:n^2 \in S\}$ .
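For a finite set S, the set $\sqrt {S}$ can be computed directly. The sketch below is only an illustration of the definition; the function names and the sample set are hypothetical.

```python
import math


def sqrt_set(s):
    """Return sqrt(S) = {n in Z : n^2 in S} for a finite set S of integers."""
    roots = set()
    for value in s:
        if value < 0:
            continue                      # negative integers are never squares
        root = math.isqrt(value)
        if root * root == value:
            roots.update({root, -root})   # both signs square to value
    return roots


def squares(s):
    """Return S^{^2} = {n^2 : n in S}."""
    return {n * n for n in s}


sample = {-4, 0, 1, 4, 5, 9, 10}   # illustrative set
roots = sqrt_set(sample)
```

By construction, `squares(sqrt_set(S))` is always contained in `S`, which is the inclusion $(\sqrt {S})^{\wedge 2}\subseteq S$.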

Lemma 3.5. For all $\delta>0$ , there exists $k_0\in \mathbb N$ such that for every $r\in \mathbb N$ , every proper Bohr–Hamming ball $BH:=BH(\boldsymbol {\beta },\mathbf y; k, \varepsilon )$ with $k\geq k_0$ , $\varepsilon>0$ , and $\mathbf y\in \mathbb T^r$ , $\sqrt {BH}$ is $(\delta ,2)$ -recurrent.

Lemma 3.5 is proved using multiple ergodic averages and characteristic factors. The main argument is given in §4, using several reductions developed in §§4–14.

Proof of Lemma 2.4

Let $\delta>0$ and $\eta <\tfrac 12$ . Choose k large enough to satisfy the conclusion of Lemma 3.5. With this k, choose $r>k$ and $\varepsilon $ small enough to satisfy the conclusion of Lemma 3.4. Let $\boldsymbol {\beta }\in \mathbb T^r$ be generating and let $BH=BH(\boldsymbol {\beta },\mathbf y;k,\varepsilon )$ , where $\mathbf y=(\tfrac 12,\ldots ,\tfrac 12)\in \mathbb T^r$ , so that $BH$ is $\eta $ -non-recurrent. Finally, let $S=\sqrt {BH}$ , so that S is $(\delta ,2)$ -recurrent, by Lemma 3.5. Since $S^{\wedge 2}\subseteq BH$ , we get that $S^{\wedge 2}$ is $\eta $ -non-recurrent, as desired.

3.1 Cylinders and Fourier coefficients

Here we define constituents of approximate Hamming balls.

Definition 3.6. Given $r\in \mathbb N$ , $I\subseteq \{1,\ldots ,r\}$ , $\eta>0$ , and $\mathbf y\in \mathbb T^r$ , define the $\eta $ -cylinder determined by I around $\mathbf y$ to be

$$ \begin{align*} V_{I,\mathbf y,\eta}:=\{\mathbf x\in \mathbb T^r : \|x_i-y_i\|<\eta \text{ for all } i \in I\},\end{align*} $$

so that

(3.1) $$ \begin{align} U:=\operatorname{Hamm}(\mathbf y;k,\eta) = \bigcup_{\substack{I\subseteq \{1,\ldots, r\}\\ |I| = r-k}} V_{I,\mathbf y, \eta}. \end{align} $$

We say that $g:\mathbb T^r\to \mathbb R$ is a cylinder function subordinate to U if $g={m(V)}^{-1}1_V$ , where V is one of the cylinders $V_{I,\mathbf y,\eta }$ in equation (3.1). Note that each cylinder function subordinate to U is supported on U.

Let $\mathcal S^1$ denote the circle group $\{z\in \mathbb C:|z|=1\}$ with the usual topology and the group operation of complex multiplication. If Z is a compact abelian group with Haar probability measure m, $\widehat {Z}$ denotes its Pontryagin dual, meaning $\widehat {Z}$ is the group of continuous homomorphisms $\chi :Z\to \mathcal S^1$ ; such homomorphisms are called characters of Z. Given $f:Z\to \mathbb C$ , its Fourier transform is $\hat {f}:\widehat {Z}\to \mathbb C$ given by $\hat {f}(\chi )=\int f \overline {\chi }\, dm$ .

For $s\in Z$ , let $f_s$ be the translate of f defined by $f_s(x):=f(x+s)$ . Then $\widehat {f_s}(\chi )=\chi (s)\hat {f}(\chi )$ for each $\chi \in \widehat {Z}$ .

As usual, for $f, g: Z\to \mathbb C$ , $f*g$ denotes their convolution, defined as $f*g(x):=\int f(t)g(x-t)\, dm(t)$ . We will use the standard identity $\widehat {f*g}=\hat {f}\hat {g}$ (the Fourier transform turns convolution into pointwise multiplication).
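The translation and convolution identities can be checked numerically on a finite model. The sketch below takes $Z=\mathbb Z/N\mathbb Z$ with normalized counting measure, an assumption made for illustration (the article works with general compact abelian groups); the functions `f`, `g` and the shift `s` are arbitrary test data.

```python
import cmath

N = 12  # hypothetical modulus: Z = Z/NZ with normalized counting measure


def e(t):
    """e(t) = exp(2*pi*i*t)."""
    return cmath.exp(2j * cmath.pi * t)


def fourier(f):
    """fhat[j] = (1/N) * sum_x f[x] * conj(chi_j(x)), where chi_j(x) = e(j*x/N)."""
    return [sum(f[x] * e(-j * x / N) for x in range(N)) / N for j in range(N)]


def translate(f, s):
    """f_s(x) = f(x + s)."""
    return [f[(x + s) % N] for x in range(N)]


def convolve(f, g):
    """(f*g)(x) = integral of f(t) g(x - t) dm(t)."""
    return [sum(f[t] * g[(x - t) % N] for t in range(N)) / N for x in range(N)]


f = [(x * x) % 5 - 1.0 for x in range(N)]
g = [1.0 if x < 3 else 0.0 for x in range(N)]
s = 5

# Largest deviation from  (f_s)^ (chi_j) = chi_j(s) * fhat(chi_j).
translate_error = max(
    abs(a - e(j * s / N) * b)
    for j, (a, b) in enumerate(zip(fourier(translate(f, s)), fourier(f)))
)
# Largest deviation from  (f*g)^ = fhat * ghat.
convolution_error = max(
    abs(a - b * c)
    for a, b, c in zip(fourier(convolve(f, g)), fourier(f), fourier(g))
)
```

Both error terms should vanish up to floating-point rounding.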

Letting $\|f\|:=(\int |f|^2\, dm)^{1/2}$ denote the $L^2(m)$ norm of f, we have the standard Plancherel identity in equation (3.2), which leads to the subsequent lemma,

(3.2) $$ \begin{align} \sum_{\chi\in \widehat{Z}} |\hat{f}(\chi)|^2 = \|f\|^2. \end{align} $$

Lemma 3.7. Let Z be a compact abelian group with Haar probability measure m and $f\in L^2(m)$ . If $\|f\|\leq 1$ and $|\hat {f}(\chi _1)|,\ldots , |\hat {f}(\chi _k)|$ are the k largest values of $|\hat {f}|$ , then $|\hat {f}(\chi )|< k^{-1/2}$ for all $\chi \in \widehat {Z}\setminus \{\chi _1,\ldots ,\chi _k\}$ .

Proof. Let $S_1 = \{\chi _1,\ldots , \chi _k\}$ be the set of characters attaining the k largest values of $|\hat {f}|$ , let $S_2 = \widehat {Z}\setminus S_1$ , and let $c=\max \{|\hat {f}(\chi )|:\chi \in S_2\}$ . By definition, we have $|\hat {f}(\chi )|\geq c$ for all $\chi \in S_1$ .

We split the left-hand side of equation (3.2) into sums over $\chi \in S_1$ and $\chi \in S_2$ , then subtract the sum over $S_1$ to get

$$ \begin{align*}\sum_{\chi \in S_2} |\hat{f}(\chi)|^2 = \|f\|^2 - \sum_{\chi\in S_1} |\hat{f}(\chi)|^2.\end{align*} $$

Since $|\hat {f}(\chi )|\geq c$ for all $\chi \in S_1$ , the right-hand side is bounded above by $\|f\|^2 - kc^2$ . Since $c\leq |\hat {f}(\chi )|$ for at least one $\chi \in S_2$ , the left-hand side above is bounded below by $c^2$ . So

$$ \begin{align*} c^2\leq \sum_{\chi \in S_2} |\hat{f}(\chi)|^2 = \|f\|^2 - \sum_{\chi\in S_1} |\hat{f}(\chi)|^2 \leq 1-kc^2, \end{align*} $$

which implies $c^2\leq 1-kc^2$ . Solving, we get $c\leq (1+k)^{-1/2}$ . This means $|\hat {f}(\chi )|< k^{-1/2}$ for all $\chi \in S_2$ .
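The bound $c\leq (1+k)^{-1/2}$ can be observed numerically on a finite model. The sketch below again assumes $Z=\mathbb Z/N\mathbb Z$ with normalized counting measure and an arbitrary test function normalized to $\|f\|=1$; these choices are illustrative and not taken from the article.

```python
import cmath
import math

N = 16  # hypothetical modulus: Z = Z/NZ with normalized counting measure m


def fourier(f):
    """fhat[j] = (1/N) * sum_x f[x] * exp(-2*pi*i*j*x/N)."""
    return [
        sum(f[x] * cmath.exp(-2j * cmath.pi * j * x / N) for x in range(N)) / N
        for j in range(N)
    ]


def l2_norm(f):
    """L^2(m) norm with respect to normalized counting measure."""
    return math.sqrt(sum(abs(v) ** 2 for v in f) / N)


raw = [((x * x + 3 * x) % 7) - 3 for x in range(N)]   # arbitrary test data
scale = l2_norm(raw)
f = [v / scale for v in raw]                          # now ||f|| = 1

# |fhat| sorted in decreasing order; magnitudes[k] is the largest value
# outside the top k, i.e. the quantity c from the proof.
magnitudes = sorted((abs(c) for c in fourier(f)), reverse=True)
plancherel_sum = sum(a * a for a in magnitudes)


def tail_bound_holds(k):
    """Check c <= (1 + k)^(-1/2), allowing for floating-point rounding."""
    return magnitudes[k] <= (1 + k) ** -0.5 + 1e-12
```

The Plancherel sum equals $\|f\|^2=1$ here, and the tail bound holds for every $k<N$.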

Remark 3.8. The exact form of the inequality in Lemma 3.7 is not important; we only need $\sup _{\chi \in \widehat {Z}\setminus \{\chi _1,\ldots ,\chi _k\}} |\hat {f}(\chi )|\leq c(k)$ , where $c(k)\to 0$ as $k\to \infty $ , uniformly for $\|f\|\leq 1$ .

Much of the proof of Lemma 3.5 is contained in Lemma 3.9. The actual application requires a technical generalization (Lemma 12.2).

Lemma 3.9. Fix $k<r\in \mathbb N$ , and let $U\subseteq \mathbb T^r$ be an approximate Hamming ball of radius $(k,\eta )$ with $\eta>0$ .

  (i) Let $\chi _1,\ldots ,\chi _k\in \widehat {\mathbb T}^r$ be non-trivial. Then there is a cylinder function g subordinate to U such that for all $s\in \mathbb T^r$ , we have

    $$ \begin{align*} \widehat{{g}_{s}}(\chi_j)=0 \quad \text{for each } j\leq k. \end{align*} $$
  (ii) If $f\in L^2(m_{\mathbb T^r})$ with $\|f\|\leq 1$ , there is a cylinder function g subordinate to U so that

    $$ \begin{align*} |\widehat{f*g}(\chi)|< k^{-1/2} \quad \text{for all } \chi\in\widehat{\mathbb T}^r. \end{align*} $$

Proof. (i) Assuming k, r, $\chi _j$ , and U are as in the statement, we may write $\chi _j$ as

(3.3) $$ \begin{align} \chi_j(x_1,\ldots,x_r)=e\bigg( \sum_{l=1}^{r} n^{(j)}_lx_l\bigg), \end{align} $$

where $e(t):=\exp (2\pi i t)$ and $n^{(j)}_l\in \mathbb Z$ . Non-triviality of $\chi _j$ means that for each j, at least one of the $n^{(j)}_l$ is non-zero. So choose one such index $l_j$ for each $j\leq k$ and let $I=\{1,\ldots ,r\}\setminus \{l_1,\ldots ,l_k\}$ . In case some of the $l_j$ repeat, remove additional elements from I so that $|I|=r-k$ .

Writing U as $\operatorname {Hamm}(\mathbf y;k,\eta )$ , let $V = V_{I,\mathbf y,\eta }=\{\mathbf x\in \mathbb T^r:\|y_l-x_l\|<\eta \text { for all } l \in I\}$ , so that $V\subseteq U$ . Let $g:={m(V)}^{-1}1_V$ , so that g is a cylinder function subordinate to U, and let $j\leq k$ . To prove that $\hat {g}(\chi _j)=0$ , note that g does not depend on any of the coordinates $x_{l_j}$ , so we can simplify the right-hand side of equation (3.3) as $e(\sum _{\substack {l=1 \\ l\neq l_j}}^{r} n_{l}^{(j)}x_l)e(n_{l_j}^{(j)} x_{l_j})$ and write $\hat {g}(\chi _j)=\int g \overline {\chi }_j \, dm$ as

$$ \begin{align*} \int_{\mathbb T^{r-1}} g(x_1,\ldots,x_r) e\bigg(-\sum_{\substack{l=1 \\ l\neq l_j}}^{r} n_{l}^{(j)}x_l\bigg)\, dx_1\ldots \, dx_{l_j-1}\, dx_{l_j+1}\, \ldots \, dx_{r} \, \int_{\mathbb T} e(-n_{l_j}^{(j)} x_{l_j})\, dx_{l_j}. \end{align*} $$

Since $\int e(-n_{l_j}^{(j)} x_{l_j})\, dx_{l_j}=0$ , we conclude that $\hat {g}(\chi _j)=0$ for each j. To complete the proof of part (i), we observe that $\widehat {g_{s}}(\chi )=\chi (s)\hat {g}(\chi )$ for each $\chi $ .

To prove part (ii), assume $f:\mathbb T^r\to \mathbb C$ has $\|f\|\leq 1$ , and let $|\hat {f}(\chi _1)|, \ldots , |\hat {f}(\chi _k)|$ be the k largest values of $|\hat {f}|$ . By part (i), choose a cylinder function g subordinate to U so that $\hat {g}(\chi _j)=0$ for these $\chi _j$ . Then $|\hat {f}(\chi )|< k^{-1/2}$ for all other $\chi $ , by Lemma 3.7. Note that $|\hat {g}(\chi )|\leq 1$ for all $\chi \in \widehat {\mathbb T}^r$ , since $\int |g|\, dm =1$ . We therefore have $\widehat {f*g}(\chi _j)=\hat {f}(\chi _j)\hat {g}(\chi _j)=0$ for $j=1,\ldots ,k$ , while $|\widehat {f*g}(\chi )|\leq |\hat {f}(\chi )|<k^{-1/2}$ for all other $\chi $ .
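The choice of the index set I in part (i) can be illustrated with a short computation. The sketch below assumes $0<\eta <\tfrac 12$, uses 0-based coordinate indices, and evaluates $\hat {g}$ through the closed form obtained by integrating one coordinate at a time; the function names and sample frequencies are hypothetical.

```python
import cmath
import math


def choose_index_set(freqs, r, k):
    """freqs: at most k non-zero integer vectors of length r (frequencies of chi_j).
    Return I with |I| = r - k that omits one non-zero index l_j of each vector."""
    removed = set()
    for n in freqs:
        removed.add(next(l for l, nl in enumerate(n) if nl != 0))
    for l in range(r):            # pad when some of the l_j repeat
        if len(removed) == k:
            break
        removed.add(l)
    return set(range(r)) - removed


def cylinder_hat(n, I, y, eta):
    """Fourier coefficient of g = 1_V / m(V), V = V_{I,y,eta}, at
    chi_n(x) = e(sum_l n_l x_l). Assumes 0 < eta < 1/2."""
    value = 1.0 + 0.0j
    for l, nl in enumerate(n):
        if nl == 0:
            continue                       # this coordinate contributes a factor 1
        if l not in I:
            return 0j                      # integral of e(-n_l x) over T vanishes
        t = 2 * math.pi * nl * eta
        value *= cmath.exp(-2j * math.pi * nl * y[l]) * math.sin(t) / t
    return value


r, k, eta = 5, 2, 0.1
y = (0.5,) * r
freqs = [(0, 2, 0, 0, 0), (1, 0, -3, 0, 0)]   # illustrative non-trivial characters
I = choose_index_set(freqs, r, k)
```

For these data the coefficients `cylinder_hat(n, I, y, eta)` vanish for both sample frequencies, while the trivial character gives the value 1.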

4 Multiple ergodic averages

Some of our reductions use facts from the general theory of nilsystems, mainly contained in [Reference Frantzikinakis6, Reference Frantzikinakis and Kra9]. Readers who want a general introduction to the theory can consult [Reference Host and Kra17].

If $(X,\mathcal B,\mu ,T)$ is an MPS and f is a bounded function on X, let

$$ \begin{align*} L_3(f,T):=\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int f\cdot T^n f\cdot T^{2n}f\, d\mu. \end{align*} $$

The existence of this limit was established in [Reference Furstenberg11, §3].

In this section, we prove Lemma 3.5 using Lemma 4.4, which estimates variants of $L_3(f,T)$ . In §5.1, we state a more convenient form of Lemma 4.4 and outline its proof.

We will use the following known result, which follows by combining a special case of [Reference Bergelson, Host, McCutcheon and Parreau3, Theorem 2.1] with the multidimensional Szemerédi theorem [Reference Furstenberg and Katznelson13].

Theorem 4.1. For all $\delta>0$ , there exists $c(\delta )>0$ such that for every MPS $(X,\mathcal B,\mu ,T)$ and every $f:X\to [0,1]$ with $\int f\, d\mu> \delta $ , we have

(4.1) $$ \begin{align} L_3(f,T)> c(\delta). \end{align} $$

Definition 4.2. We say that $\mathbf X=(X,\mathcal B,\mu ,T)$ is ergodic if $\mu (A\triangle T^{-1}A)=0$ implies $\mu (A)=0$ or $\mu (A)=1$ for every $A\in \mathcal B$ . We say that $\mathbf X$ is totally ergodic if for every $m\in \mathbb N$ , the system $(X,\mathcal B,\mu ,T^m)$ is ergodic.

Remark 4.3. When determining whether a set is a set of k-recurrence, we may restrict our attention to ergodic MPSs where $\mu $ is a regular Borel measure on a compact metric space X; cf. [Reference Einsiedler and Ward4, §§7.2.2 and 7.2.3].

When we say a sequence $(b_n)_{n\in \mathbb N}$ of natural numbers has linear growth, we mean that it is strictly increasing and $\limsup _{n\to \infty } b_n/n < \infty $ . Note that a strictly increasing sequence has linear growth if and only if the set of terms $B=\{b_n:n\in \mathbb N\}$ satisfies $\liminf _{N\to \infty } |B\cap [1,N]|/N>0$ . Enumerating the positive elements of $\sqrt {BH}$ in increasing order, where $BH$ is a proper Bohr–Hamming ball, always results in a sequence of linear growth. To see this, write $\sqrt {BH}$ as $\{n\in \mathbb Z:n^2\boldsymbol {\beta }\in U\}$ for some approximate Hamming ball $U\subseteq \mathbb T^r$ and generator $\boldsymbol {\beta }\in \mathbb T^r$ . Then,

$$ \begin{align*}\lim_{N\to\infty}\frac{|\sqrt{BH} \cap [1,\ldots N]|}{N} = \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N 1_U(n^2\boldsymbol{\beta}) = m(U),\end{align*} $$

by Weyl’s theorem on uniform distribution of polynomials (see Lemma 10.3). Since $n = |\sqrt {BH}\cap [1,\ldots ,b_n]|$ , this implies $b_n/n$ is bounded. Likewise, if g is a cylinder function subordinate to U (Definition 3.6), then enumerating $\{n\in \mathbb N: g(n^2\boldsymbol {\beta })>0\}$ in increasing order results in a sequence of linear growth.
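The density statement can be observed numerically. The sketch below is a floating-point experiment under assumed parameters: $r=2$, $k=1$, $\varepsilon =0.1$, $\mathbf y=(\tfrac 12,\tfrac 12)$, and $\boldsymbol {\beta }=(\sqrt 2,\sqrt 3)$, which generates $\mathbb T^2$ because $1,\sqrt 2,\sqrt 3$ are linearly independent over $\mathbb Q$. For this ball, $m(U)=1-(1-2\varepsilon )^2$. It is a sanity check, not a proof.

```python
import math


def torus_norm(x):
    """Distance from the real number x to the nearest integer."""
    frac = x % 1.0
    return min(frac, 1.0 - frac)


def in_hamming_ball(x, y, k, eps):
    """True if at most k coordinates of x differ from y by at least eps."""
    far = sum(1 for xj, yj in zip(x, y) if torus_norm(yj - xj) >= eps)
    return far <= k


beta = (math.sqrt(2), math.sqrt(3))   # assumed generator of T^2
y = (0.5, 0.5)
k, eps = 1, 0.1
N = 20000                             # illustrative cutoff

# Count n in [1, N] with n^2 * beta in U, i.e. the positive part of sqrt(BH).
count = sum(
    1 for n in range(1, N + 1)
    if in_hamming_ball([n * n * b for b in beta], y, k, eps)
)
density = count / N
haar_measure = 1 - (1 - 2 * eps) ** 2   # m(U) for r = 2, k = 1
```

The empirical density should be close to `haar_measure`, consistent with the displayed limit.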

The next lemma says that $L_3(f,T)$ can be approximated by averaging over elements of $\sqrt {BH}$ , provided $\mathbf X$ is totally ergodic and $BH$ is a proper Bohr–Hamming ball of radius $(k,\eta )$ with k sufficiently large. In passing to the general case, we need to consider $\sqrt {BH}/\ell :=\{n\in \mathbb Z: \ell n \in \sqrt {BH}\}$ .

Lemma 4.4. For all $\varepsilon>0$ , there is a $k\in \mathbb N$ such that for every totally ergodic MPS $(X,\mathcal B,\mu ,T)$ , every $f:X\to [0,1]$ , every proper Bohr–Hamming ball $BH$ of radius $(k,\eta )$ ( $\eta>0$ ), and all $\ell \in \mathbb N$ , there is a sequence $b_n\in \sqrt {BH}/\ell $ having linear growth such that

(4.2) $$ \begin{align} \lim_{N\to \infty} \bigg|\frac{1}{N}\sum_{n=1}^N \int f\cdot T^{b_n}f \cdot T^{2b_n}f\, d\mu - L_3(f,T)\bigg|<\varepsilon \|f\|^2. \end{align} $$

Consequently, if $\int f\, d\mu>\delta $ and k is sufficiently large (depending only on $\delta $ ), we have

(4.3) $$ \begin{align} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int f\cdot T^{b_n}f \cdot T^{2b_n}f\, d\mu> c(\delta)/2, \end{align} $$

where $c(\delta )$ is defined in Theorem 4.1.

Lemma 5.1 is a convenient reformulation of Lemma 4.4. In §5.1, we outline its proof, which occupies the majority of this article.

Remark 4.5. We do not know whether the condition ‘totally ergodic’ can be replaced with ‘ergodic’ in Lemma 4.4. The main obstruction to this replacement is our lack of a convenient representation of ergodic, but not totally ergodic, 2-step affine nilsystems.

4.1 Factors and extensions

If $\mathbf X = (X,\mathcal B,\mu ,T)$ and $\mathbf Y = (Y,\mathcal D,\nu ,S)$ are MPSs, we say that $\mathbf Y$ is a factor of $\mathbf X$ if there is a measurable $\pi : X\to Y$ intertwining S and T, meaning

$$ \begin{align*} \pi(Tx) = S\pi(x) \quad \text{for } \mu\text{-almost every (a.e.) } x\in X, \end{align*} $$

and $\mu (\pi ^{-1}D) = \nu (D)$ for all $D\in \mathcal D$ . Strictly speaking, the factor is the pair $(\pi , \mathbf Y)$ , and we refer to ‘the factor $\pi :\mathbf X\to \mathbf Y$ ’.

If $\pi :\mathbf X\to \mathbf Y$ is a factor and $f\in L^2(\mu )$ is equal $\mu $ -almost everywhere to a function of the form $g\circ \pi $ , with $g\in L^2(\nu )$ , we say that f is $\mathbf Y$ -measurable. This is equivalent to saying that f is $\pi ^{-1}(\mathcal D)$ -measurable (modulo $\mu $ ). We denote by $P_{\mathbf Y}$ the orthogonal projection from $L^2(\mu )$ to the space of $\pi ^{-1}(\mathcal D)$ -measurable functions. Given $f\in L^2(\mu )$ , we identify $P_{\mathbf Y}f$ with $\tilde {f}\in L^2(\nu )$ satisfying $P_{\mathbf Y} f = \tilde {f}\circ \pi $ .

We repeatedly use, without comment, the fact that $P_{\mathbf Y}$ is a positive operator preserving integration with respect to $\mu $ . In other words, if $f(x)\geq 0$ for $\mu $ -a.e. x, then $P_{\mathbf Y}f(x)\geq 0$ for $\mu $ -a.e. x, and $\int f\, d\mu = \int P_{\mathbf Y} f\, d\mu $ . Consequently, $\sup f \geq \tilde {f}(y)\geq \inf f$ for $\nu $ -a.e. y and $\int \tilde {f}\, d\nu = \int f\, d\mu $ .

Remark 4.6. When $\pi : \mathbf X\to \mathbf Y$ is a factor, we say that $\mathbf X$ is an extension of $\mathbf Y$ . If we wish to prove an inequality on ergodic averages for a system $\mathbf Y$ , it suffices to prove that inequality for an extension $\pi :\mathbf X\to \mathbf Y$ , since the integrals $\int f_0\cdot S^af_1\cdot S^{b}f_2\, d\nu $ can be written as $\int h_0\cdot T^a h_1\cdot T^{b}h_2\, d\mu $ , where $h_i = f_i\circ \pi $ . This observation will be used in §14.

4.2 Reducing to total ergodicity

The next lemma is used to deduce Lemma 3.5 from Lemma 4.4 and Theorem 4.1. Part (i) is a special case of [Reference Bergelson, Host and Kra2, Corollary 4.6], and part (ii) is an immediate consequence of part (i). Here ‘ $\mathbf Y$ is an inverse limit of ergodic nilsystems’ means that for all $f\in L^\infty (\nu )$ and $\varepsilon>0$ , there is a factor $\pi :\mathbf Y\to \mathbf Z$ , where $\mathbf Z=(Z,\mathcal Z,\eta ,R)$ is an ergodic nilsystem and $\|f-P_{\mathbf Z}f\|_{L^1(\nu )}<\varepsilon $ .

Lemma 4.7. Let $\mathbf X=(X,\mathcal B,\mu ,T)$ be an ergodic measure-preserving system. There is a factor $\pi :\mathbf X\to \mathbf Y=(Y,\mathcal D,\nu ,S)$ which is an inverse limit of ergodic nilsystems such that:

  1. (i) for all $f_i\in L^\infty (\mu )$ , letting $\tilde {f}_i\circ \pi =P_{\mathbf Y}f_i$ , we have

    $$ \begin{align*} \lim_{N\to \infty} \frac{1}{N}\sum_{n=1}^N \bigg| \int f_0\cdot T^n f_1\cdot T^{2n}f_2 \, d\mu - \int \tilde{f}_0\cdot S^n \tilde{f}_1\cdot S^{2n}\tilde f_2\, d\nu\bigg|=0; \end{align*} $$
  2. (ii) if $(b_n)_{n\in \mathbb N}$ is a sequence of linear growth, then

    $$ \begin{align*} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \bigg|\int f_0\cdot T^{b_n}f_1 \cdot T^{2b_n}f_2 \, d\mu - \int \tilde{f}_0\cdot S^{b_n}\tilde{f}_1 \cdot S^{2b_n}\tilde{f}_2\, d\nu\bigg|=0. \end{align*} $$

To derive part (ii) from part (i), note that

$$ \begin{align*} &\frac{1}{N}\sum_{n=1}^N \bigg|\int f_0\cdot T^{b_n}f_1 \cdot T^{2b_n}f_2 \, d\mu - \int \tilde{f}_0\cdot S^{b_n}\tilde{f}_1 \cdot S^{2b_n}\tilde{f}_2\, d\nu \bigg| \\ &\quad\leq \frac{b_N}{N} \cdot \frac{1}{b_N}\sum_{n=1}^{b_N}\bigg| \int f_0\cdot T^n f_1\cdot T^{2n}f_2 \, d\mu - \int \tilde{f}_0\cdot S^n \tilde{f}_1\cdot S^{2n}\tilde f_2\, d\nu\bigg|\\ &\quad\underset{N\to\infty}{\longrightarrow} 0, \end{align*} $$

since ${b_N}/{N}$ is bounded.
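The comparison used above is elementary: if $(b_n)$ is strictly increasing (an assumption we make explicit here; it holds when $(b_n)$ enumerates a set) and the terms are non-negative, then the sum over $b_1,\ldots ,b_N$ is at most the sum over all indices up to $b_N$ . The following Python sketch checks this on toy data. The function name `subsequence_average_bound` and the sequences `a` and `b` are hypothetical and serve only as an illustration.

```python
def subsequence_average_bound(a, b):
    """Return (lhs, rhs) for the bound used to pass from part (i) to part (ii).

    a: list of non-negative numbers, a[m - 1] playing the role of the m-th
       absolute difference in part (i) (hypothetical data).
    b: strictly increasing positive integers b_1 < ... < b_N, b_N <= len(a).
    """
    N = len(b)
    b_N = b[-1]
    # Average of the terms along the subsequence b_1, ..., b_N.
    lhs = sum(a[m - 1] for m in b) / N
    # (b_N / N) times the full average over 1, ..., b_N.
    rhs = (b_N / N) * (sum(a[:b_N]) / b_N)
    return lhs, rhs


# Toy data: a bounded non-negative sequence and a sequence of linear growth.
a = [abs(((7 * m) % 11) - 5) / 5 for m in range(1, 3001)]
b = [3 * n + (n % 2) for n in range(1, 1001)]  # b_n <= 3n + 1
lhs, rhs = subsequence_average_bound(a, b)
```

Since `b[-1] / len(b)` stays bounded for a sequence of linear growth, the right-hand side tends to $0$ whenever the full averages do.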

We get the next result by combining the definition of ‘inverse limit’ with the fact that for every ergodic nilsystem $(Y,\mathcal D,\nu ,S)$ , there is an $\ell \in \mathbb N$ such that the ergodic components of $(Y,\mathcal D,\nu ,S^\ell )$ are totally ergodic; see [Reference Frantzikinakis6, Proposition 2.1] for justification.

Lemma 4.8. If $(X,\mathcal B,\mu ,T)$ is an inverse limit of ergodic nilsystems, $f:X\to [0,1]$ , and $\varepsilon>0$ , there is a factor $\mathbf Y = (Y,\mathcal D,\nu ,S)$ and $\ell \in \mathbb N$ such that:

  1. (i) $\|f-P_{\mathbf Y}f\|_{L^1(\mu )}<\varepsilon $ ;

  2. (ii) the ergodic components of $(Y,\mathcal D,\nu ,S^\ell )$ are totally ergodic.

Notation 4.9. When Y is the phase space of an ergodic nilsystem where the ergodic components of $(Y,\mathcal D,\nu ,S^\ell )$ are totally ergodic, we will enumerate its connected components as $Y_1,\ldots ,Y_M$ , and write $\nu _i:=M\nu |_{Y_i}$ , so that each $\nu _i$ is a probability measure. Each $\mathbf Y_i:=(Y_i,\mathcal D_i,\nu _i,S^{\ell })$ is an ergodic component of $(Y,\mathcal D,\nu ,S^\ell )$ . If $\mathbf X$ is an extension of $\mathbf Y$ with factor map $\pi :X\to Y$ , we let $X_i=\pi ^{-1}(Y_i)$ , $\mu _i:= M\mu |_{X_i}$ , $\mathcal B_i:=\{B\cap X_i:B\in \mathcal B\}$ , and $\mathbf X_i=(X_i,\mathcal B_i,\mu _i,T^{\ell })$ . It is easy to verify that $\mathbf Y_i$ is a factor of $\mathbf X_i$ with factor map $\pi |_{X_i}$ .

Remark 4.10. Here we identify a technical difficulty common in multiple recurrence arguments. Readers familiar with the use of Markov’s inequality to overcome this difficulty may skip to the proof of Lemma 3.5.

Our proof of Lemma 3.5 starts with an ergodic, but not necessarily totally ergodic, MPS $\mathbf X=(X,\mathcal B,\mu ,T)$ . By Lemma 4.7, it suffices to prove the lemma in the special case where $\mathbf X$ is an inverse limit of ergodic nilsystems, so we assume $\mathbf X$ is such an inverse limit. We then consider $f:X\to [0,1]$ with $\int f\, d\mu>\delta $ . The goal is to find an $\ell \in \mathbb N$ and a sequence $(b_n)$ of elements of $\sqrt {BH}/\ell $ satisfying equation (4.7). The main difficulty arises when trying to exploit the structure of nilsystems: Lemma 4.4 requires total ergodicity, so we fix $\varepsilon>0$ and choose a factor $\pi :\mathbf X\to \mathbf Y$ where $\mathbf Y$ is an ergodic nilsystem satisfying parts (i) and (ii) in Lemma 4.8. We choose $\ell $ so that the ergodic components of $(Y,\mathcal D,\nu ,S^\ell )$ are totally ergodic, and we enumerate these components as $\mathbf Y_i = (Y_i, \mathcal D_i, \nu _i,S^\ell )$ , ${i=1,\ldots ,M}$ . With Notation 4.9 defined above, let $\tilde {f}\circ \pi =P_{\mathbf Y}f$ and $\tilde {f}_i=\tilde {f}|_{Y_i}$ . Lemma 4.4 allows us to choose, for each ergodic component $\mathbf Y_i$ where $\int \tilde {f}_i\, d\nu _i>\delta /2$ , a sequence $b_n^{(i)}\in \sqrt {BH}/\ell $ having linear growth, such that

(4.4) $$ \begin{align}\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int \tilde{f}_i\cdot S^{\ell b_n^{(i)}} \tilde{f}_i \cdot S^{2\ell b_n^{(i)}}\tilde{f}_i\, d\nu_i>c(\delta/2)/2. \end{align} $$

The choice of $b_{n}^{(i)}$ depends on $\mathbf Y_i$ , so equation (4.4) implies only that

(4.5) $$ \begin{align} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int \tilde{f}\cdot S^{\ell b_n^{(i)}} \tilde{f} \cdot S^{2\ell b_n^{(i)}}\tilde{f}\, d\nu> \frac{1}{M}\frac{c(\delta/2)}{2}. \end{align} $$

If M is large, then $\|f-P_{\mathbf Y}f\|_{L^1(\mu )}$ may be large compared with $({1}/{M}){c(\delta /2)}/{2}$ , and equation (4.5) will not immediately imply equation (4.7). To overcome this obstacle, we want to find an i where equation (4.4) holds and $({1}/{M})\|f_i-P_{\mathbf Y}f_i\|_{L^1(\mu )}$ is sufficiently small to make $\int \tilde {f}_i\cdot S^{\ell a}\tilde {f}_i \cdot S^{\ell b}\tilde {f}_i\, d\nu _i$ close to $\int f_i\cdot T^{\ell a}f_i\cdot T^{\ell b} f_i\, d\mu _i$ for all $a, b$ . Such an i is provided by two straightforward applications of Markov’s inequality outlined in §15.3.

Before proving Lemma 3.5, we recall its statement: for all $\delta>0$ , there is a $k_0\in \mathbb N$ such that for every proper Bohr–Hamming ball $BH:=BH(\boldsymbol {\beta },\boldsymbol {y}; k, \eta )$ with $k\geq k_0$ , $\eta>0$ , and $\mathbf y\in \mathbb T^r$ , $\sqrt {BH}$ is $(\delta ,2)$ -recurrent.

Proof of Lemma 3.5, assuming Lemma 4.4

Let $\delta>0$ and choose $k_0\in \mathbb N$ so that for all $k\geq k_0$ , the inequality in equation (4.3) holds in Lemma 4.4 with $c(\delta /2)$ in place of $c(\delta )$ . Let $BH$ be a proper Bohr–Hamming ball with radius $(k,\eta )$ for some $k\geq k_0$ and $\eta>0$ . It suffices to prove that for every MPS $(X,\mathcal B,\mu ,T)$ with $A\subseteq X$ having $\mu (A)>\delta $ ,

(4.6) $$ \begin{align} \mu(A\cap T^{-n}A\cap T^{-2n}A)> 0 \quad \text{for some } n\in \sqrt{BH}. \end{align} $$

By Remark 4.3, we need only consider ergodic MPSs. We will prove that if $\mathbf X$ is ergodic and $f:X\to [0,1]$ has $\int f\, d\mu>\delta $ , then there is a sequence of elements $b_n\in \sqrt {BH}$ with linear growth such that

(4.7) $$ \begin{align} \liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int f\cdot T^{b_n}f\cdot T^{2b_n}f\, d\mu>0. \end{align} $$

The special case of equation (4.7) where $f=1_A$ implies equation (4.6), as the integral then simplifies to $\mu (A\cap T^{-b_n}A\cap T^{-2b_n}A)$ .

By part (ii) of Lemma 4.7, it suffices to prove equation (4.7) when $\mathbf X=(X,\mathcal B,\mu ,T)$ is an inverse limit of ergodic nilsystems. We now fix such an $\mathbf X$ , and $f:X\to [0,1]$ with $\int f\, d\mu>\delta $ .

Let $\varepsilon = ({\delta }/{24})c({\delta }/{2})$ , and let $\pi :\mathbf X\to \mathbf Y$ be the factor provided by Lemma 4.8 for this $\varepsilon $ , with $\ell \in \mathbb N$ chosen so that the ergodic components $\mathbf Y_i$ of $(Y,\mathcal D,\nu ,S^{\ell })$ are totally ergodic. Let M be the number of ergodic components (we can take $\ell = M$ , but we do not need this fact) so that $\nu (Y_i)=1/M$ for each i.

Let $X_i=\pi ^{-1}(Y_i)$ and let $f_i=1_{X_i}f$ , so that the $X_i$ partition X into sets of measure $1/M$ , and $\sum _i \int f_i \, d\mu = \int f \, d\mu>\delta $ . Observe that $P_{\mathbf Y}f_i$ is supported on $X_i$ and $\int P_{\mathbf Y}f_i\, d\mu = \int f_i\, d\mu $ for each i.

Setting $\mathbf Y_{i}:=(Y_i,\mathcal D_i,\nu _i,S^{\ell })$ , where $\nu _i:=M\nu |_{Y_i}$ , we get that $\mathbf{Y}_{i}$ is a totally ergodic MPS. Likewise, $\mathbf X_{i}:= (X_i, \mathcal B_i, \mu _i,T^{\ell })$ , with $\mu _i:=M\mu |_{X_i}$ is an MPS (possibly not ergodic), with $\pi |_{X_i}:X_i\to Y_i$ a factor map. To prove equation (4.7), we will find a sequence $b_n$ of elements of $\sqrt {BH}/\ell $ having linear growth and $i\leq M$ with

(4.8) $$ \begin{align} \liminf_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int f_i\cdot T^{\ell b_n}f_i\cdot T^{2\ell b_n} f_i\, d\mu> 0. \end{align} $$

We claim that there is an i such that

(4.9) $$ \begin{align} \int f_i\, d\mu &> \frac{\delta}{2M} \quad\text{and } \end{align} $$
(4.10) $$ \begin{align} \|f_i - P_{\mathbf Y}f_i\|_{L^1(\mu)}&<\frac{c(\delta/2)}{12M}. \end{align} $$

This i is provided by Lemmas 15.5 and 15.6: setting

$$ \begin{align*}I:=\bigg\{i:\int f_i\, d\mu>\frac{\delta}{2M}\bigg\}, \quad J:=\bigg\{i: \int |f_i - P_{\mathbf Y}f_i|\, d\mu < \frac{c(\delta/2)}{12M}\bigg\},\end{align*} $$

we get $|I|> M\delta /2$ and $|J| > M(1-12\varepsilon/c(\delta/2)) = M(1-\delta/2)$ . Thus $|I|+|J|>M$ , implying $I\cap J$ is non-empty.
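The counting step above can be checked numerically. The sketch below uses hypothetical toy data: `int_f[i]` stands in for $\int f_i\, d\mu $ (each at most $1/M$ , with total exceeding $\delta $ ) and `err[i]` stands in for $\|f_i-P_{\mathbf Y}f_i\|_{L^1(\mu )}$ (with total less than $\varepsilon $ ). The values of `delta`, `c`, and `M` are arbitrary choices, and `c` plays the role of $c(\delta /2)$ .

```python
def good_indices(int_f, err, delta, c, M):
    """Return the index sets I and J from the proof (indices 0, ..., M-1)."""
    I = {i for i in range(M) if int_f[i] > delta / (2 * M)}
    J = {i for i in range(M) if err[i] < c / (12 * M)}
    return I, J


# Hypothetical parameters; c stands in for c(delta/2).
delta, c, M = 0.2, 0.1, 50
eps = (delta / 24) * c

# Toy values with 0 <= int_f[i] <= 1/M and sum(int_f) = 0.49 > delta.
int_f = [((37 * i) % 50) / 50 / M for i in range(M)]
# Toy errors with total 0.9 * eps < eps, concentrated on three components.
err = [0.3 * eps if i < 3 else 0.0 for i in range(M)]

assert sum(int_f) > delta and sum(err) < eps  # hypotheses of the claim

I, J = good_indices(int_f, err, delta, c, M)
common = I & J
```

In this toy example the three components carrying most of the error are excluded from `J`, yet `I` and `J` still overlap, as the bound $|I|+|J|>M$ predicts.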

Fix i satisfying inequalities (4.9) and (4.10). Note that inequality (4.9) and the definition of $\nu _i$ , $\mu _i$ , and $\tilde {f}_i$ imply

(4.11) $$ \begin{align} \int \tilde{f}_i\, d\nu_i> \delta/2. \end{align} $$

Since $(Y_i,\mathcal D_i,\nu _i,S^{\ell })$ is totally ergodic, we may apply Lemma 4.4 to choose a sequence of elements $b_n\in \sqrt {BH}/\ell $ having linear growth and satisfying

(4.12) $$ \begin{align} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int \tilde{f}_i\cdot S^{\ell b_n}\tilde{f}_i \cdot S^{2\ell b_n} \tilde{f}_i\, d\nu_i> c(\delta/2)/2. \end{align} $$

Inequality (4.10), the bounds $\|f_i\|_{\infty }\leq 1$ , $\|P_{\mathbf Y}f_{i}\|_{\infty }\leq 1$ , and Lemma 15.7 imply

(4.13) $$ \begin{align} \bigg|\int f_i\cdot T^{\ell a} f_i \cdot T^{\ell b} f_i\, d\mu_i - \int P_{\mathbf Y_i}f_i\cdot T^{\ell a} P_{\mathbf Y_i}f_i \cdot T^{\ell b} P_{\mathbf Y_i}f_i\, d\mu_i\bigg|< \frac{1}{4}c(\delta/2) \end{align} $$

for each $a,b\in \mathbb N$ . Recalling the definition of $\mu _i$ and $\nu _i$ , we see that for all sufficiently large N,

$$ \begin{align*} &\frac{1}{N} \sum_{n=1}^{N} \int f_i\cdot T^{\ell b_n} f_i \cdot T^{2\ell b_n} f_i\, d\mu\\ &\quad> \frac{1}{N}\sum_{n=1}^{N} \int \tilde{f}_i \cdot S^{\ell b_n} \tilde{f}_i \cdot S^{2\ell b_n} \tilde{f}_i\, d\nu - \frac{c(\delta/2)}{4M} \quad \text{by inequality}\ ({4.13})\\ &\quad> \frac{c(\delta/2)}{2M} - \frac{c(\delta/2)}{4M} \qquad\qquad\qquad\qquad\qquad\quad \ \text{ by inequality}\ ({4.12}) \\ &\quad= \frac{c(\delta/2)}{4M}. \end{align*} $$

The above inequalities imply equation (4.8). Since $f\geq f_i$ pointwise and we chose $b_n\in \sqrt {BH}/\ell $ , this implies equation (4.7) and completes the proof of Lemma 3.5.

5 Reformulation of Lemma 4.4

5.1 Reformulation

Lemma 4.4 is an immediate consequence of the following reformulation. This version allows us to apply the theory of characteristic factors.

Lemma 5.1. Let $k<r\in \mathbb N$ , $\ell \in \mathbb N$ , let $\boldsymbol {\beta }\in \mathbb T^r$ be generating, and let $U\subseteq \mathbb T^r$ be an approximate Hamming ball of radius $(k,\eta )$ for some $\eta>0$ . For every totally ergodic MPS $(X,\mathcal B,\mu ,T)$ , and every measurable $f:X\to [0,1]$ , there is a cylinder function $g={m(V)}^{-1}1_V$ subordinate to U such that

(5.1) $$ \begin{align} \lim_{N\to \infty} \bigg|\frac{1}{N}\sum_{n=1}^N g(n^2\ell^2\boldsymbol{\beta})\int f\cdot T^{n} f \cdot T^{2n} f\, d\mu - L_3(f,T)\bigg|<2k^{-1/2}\|f\|^2. \end{align} $$

While U does not depend on f in Lemma 5.1, the choice of g to satisfy equation (5.1) does depend on f.

We prove Lemma 5.1 in §14. The derivation of Lemma 4.4 from Lemma 5.1 is an instance of the following general principle: if $a_n$ is a bounded sequence, $B\subseteq \mathbb N$ is enumerated as $\{b_1<b_2<\ldots \}$ , and $d(B):=\lim _{N\to \infty } ({|B\cap \{1,\ldots ,N\}|}/{N})>0$ , then

$$ \begin{align*} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N a_{b_n} = \lim_{N\to\infty} \frac{1}{Nd(B)}\sum_{n=1}^N1_{B}(n)a_n \end{align*} $$

provided the limit on the right exists. Note that $(b_n)_{n\in \mathbb N}$ has linear growth if $d(B)>0$ .
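The principle can be illustrated numerically. The following Python sketch makes several assumptions of our own: we take $r=1$ , $\ell =1$ , $\beta =\sqrt {2}$ , the interval $[0.25,0.5)$ as a stand-in for the cylinder V, and a hypothetical bounded sequence `a`. It compares the average of $a_{b_n}$ over the first $K=|B\cap \{1,\ldots ,N\}|$ elements of B with the weighted average $({1}/{Nm(V)})\sum _{n\leq N}1_B(n)a_n$ , and also reports the empirical density $K/N$ .

```python
import math


def compare_averages(a, beta, lo, hi, N):
    """Compare the average along B with the density-weighted average.

    B = {n <= N : n^2 * beta mod 1 lies in [lo, hi)}; the interval [lo, hi)
    is a hypothetical stand-in for the cylinder V, so m(V) = hi - lo.
    """
    B = [n for n in range(1, N + 1) if lo <= (n * n * beta) % 1.0 < hi]
    K = len(B)
    total = sum(a(n) for n in B)
    density = K / N                     # empirical density of B in {1..N}
    along_B = total / K                 # (1/K) * sum of a_{b_n}, n <= K
    weighted = total / (N * (hi - lo))  # (1/(N m(V))) * sum of 1_B(n) a_n
    return density, along_B, weighted


def a(n):
    # A bounded sequence with values in [0, 1], chosen only for illustration.
    return math.cos(2 * math.pi * n * math.sqrt(3)) ** 2


density, along_B, weighted = compare_averages(a, math.sqrt(2), 0.25, 0.5, 50_000)
```

The two averages differ only by the factor $K/(Nm(V))$ , which tends to $1$ when $d(B)=m(V)$ .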

We will apply this principle with $a_n = \int f\cdot T^{n} f \cdot T^{2n} f\, d\mu $ and ${B=\{n:n^2\ell ^2\boldsymbol {\beta }\in V\}}$ , where V is a cylinder contained in U. Then, $g={m(V)}^{-1}1_V$ is a cylinder function subordinate to U, and $g(n^2\ell ^2\boldsymbol {\beta })={d(B)}^{-1}1_B(n)$ . The equation $d(B)=m(V)$ follows from Weyl’s theorem on uniform distribution (cf. §10). Note that this B is contained in $\sqrt {BH}/\ell $ , where $BH$ is the Bohr–Hamming ball corresponding to U, with frequency $\boldsymbol {\beta }$ .

Remark 5.2. The exact form of the bound in equation (5.1) is not important in the following. The only relevant property is that the coefficient of $\|f\|^2$ tends to $0$ as $k\to \infty $ .

5.2 Outline of a special case of Lemma 5.1

This outline highlights the key steps in our proof while avoiding some complications.

We begin with an arbitrary totally ergodic measure-preserving system $\mathbf X=(X,\mathcal B,\mu ,T)$ , $f: X\to [0,1]$ , and $k\in \mathbb N$ . We let $r>k$ , $\eta>0$ , and fix an approximate Hamming ball $U=\operatorname {Hamm}(\mathbf y;k,\eta )\subseteq \mathbb T^r$ and a generator $\boldsymbol {\beta } \in \mathbb T^r$ . We want to find a cylinder function g subordinate to U so that

(5.2) $$ \begin{align} A_N(f,g):= \frac{1}{N}\sum_{n=1}^N g(n^2\boldsymbol{\beta})\int f\cdot T^nf\cdot T^{2n}f\, d\mu \end{align} $$

satisfies $\lim _{N\to \infty } |A_N(f,g)-L_3(f,T)|<2k^{-1/2}\|f\|^2$ .

In §§6–8, we will reduce to the case where $\mathbf X$ is a standard 2-step Weyl system. This means that $(X,\mathcal B,\mu ,T)$ can be realized with $X=\mathbb T^d\times \mathbb T^d$ , $d\in \mathbb N$ , $\mu =$ Haar probability measure on $\mathbb T^d\times \mathbb T^d$ , and T is given by $T(x, y)=( x+\boldsymbol {\alpha },y+ x)$ , for some generator $\boldsymbol {\alpha }\in \mathbb T^d$ . The orbits of T can be computed explicitly: $T^n(x,y)=(x+n\boldsymbol {\alpha }, y+nx+\tbinom {n}{2}\boldsymbol {\alpha })$ . This reduction relies on the theory of characteristic factors, especially [Reference Frantzikinakis6, Theorem B].
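The orbit formula for T is a purely algebraic identity, so it can be checked with exact rational arithmetic. The Python sketch below does this for $d=2$ . The helper names `T` and `closed_form` and the rational values of $\boldsymbol {\alpha }$ , x, and y are hypothetical. A rational $\boldsymbol {\alpha }$ is not a generator of $\mathbb T^d$ , but the identity does not depend on that.

```python
from fractions import Fraction
from math import comb


def T(point, alpha):
    """One step of T(x, y) = (x + alpha, y + x) on T^d x T^d, mod 1."""
    x, y = point
    new_x = tuple((xi + ai) % 1 for xi, ai in zip(x, alpha))
    new_y = tuple((yi + xi) % 1 for yi, xi in zip(y, x))
    return new_x, new_y


def closed_form(point, alpha, n):
    """The claimed value of T^n(x, y): (x + n*alpha, y + n*x + C(n,2)*alpha)."""
    x, y = point
    new_x = tuple((xi + n * ai) % 1 for xi, ai in zip(x, alpha))
    new_y = tuple((yi + n * xi + comb(n, 2) * ai) % 1
                  for yi, xi, ai in zip(y, x, alpha))
    return new_x, new_y


# Hypothetical rational data with d = 2, so that comparisons are exact.
alpha = (Fraction(1, 7), Fraction(3, 11))
start = ((Fraction(2, 5), Fraction(1, 3)), (Fraction(5, 9), Fraction(1, 4)))

# Iterate T and compare with the closed form at every step n = 1, ..., 25.
point = start
matches = []
for n in range(1, 26):
    point = T(point, alpha)
    matches.append(point == closed_form(start, alpha, n))
```

The agreement at each step mirrors the inductive step $\tbinom {n}{2}+n=\tbinom {n+1}{2}$ .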

To simplify this outline, we assume $r=d$ and $\boldsymbol {\beta } = \boldsymbol {\alpha }$ . We write functions on $\mathbb T^d\times \mathbb T^d$ with variables displayed as $f(x,y)$ , where $x, y\in \mathbb T^d$ . Writing $m\times m$ for Haar probability measure on $\mathbb T^d\times \mathbb T^d$ , we write $\int f\, dm\times m$ as $\int f(x,y)\, dx\, dy$ , or $\int f\binom {x}{y}\, dx\, dy$ to save space. With these assumptions, the averages in equation (5.2) become

$$ \begin{align*} B_N := \frac{1}{N}\sum_{n=1}^N g(n^2\boldsymbol{\alpha}) \int_{\mathbb T^d\times \mathbb T^d} f\,\Big(\begin{matrix} x\\[-2pt]y \end{matrix}\Big) \, f \begin{pmatrix} x+n\boldsymbol{\alpha}\\[-1pt]y+nx+\tbinom{n}{2}\boldsymbol{\alpha} \end{pmatrix} f\begin{pmatrix} x+2n\boldsymbol{\alpha} \\[-1pt] y+2nx + \tbinom{2n}{2}\boldsymbol{\alpha} \end{pmatrix} \, dx\, dy. \end{align*} $$

Proposition 13.1 provides an explicit formula for $\lim _{N\to \infty } B_N$ . Under the present assumptions, it says

(5.3) $$ \begin{align} &\lim_{N\to\infty} B_N\nonumber\\ &\quad= \int_{(\mathbb T^d)^4} f(x,y)f(x+s,y+t) \bigg(\int_{\mathbb T^d} f(x+2s,y+2t+2w)g(w) \, dw\bigg) \, ds\, dt \, dx\, dy. \end{align} $$

Write I for the right-hand side above, and define

$$ \begin{align*} f*_2 g(x,y):= \int f(x,y+2w) g(w)\, dw. \end{align*} $$

Using Lemma 12.2 (a generalization of Lemma 3.9), we choose a cylinder function g subordinate to U such that $|\widehat {f*_2 g}(\chi ,\psi )|<k^{-1/2}$ for all $(\chi ,\psi )\in \widehat {\mathbb T}^d\times \widehat {\mathbb T}^d$ with $\psi $ non-trivial. We set $f'(x):=\int f(x,y)\, dy$ and $J':= \int \int f'(x)f'(x+s)f'(x+2s)\, dx\, ds$ . By Lemma 11.1, the bound on $\widehat {f*_2 g}$ will imply

(5.4) $$ \begin{align} |I - J'| < k^{-1/2}\|f\|^2. \end{align} $$

We can also prove (directly, or using Theorem 7.1) that $L_3(f,T)=J'$ . Combining equation (5.4) with equation (5.3), we then have equation (5.1), completing the outline of this special case. The factor $2$ on the right-hand side of equation (5.1) accounts for the reduction to Weyl systems.

In the general case, we must compute $\lim _{N\to \infty } A_N(f,g)$ for $d\neq r$ and $\boldsymbol {\beta }\neq \boldsymbol {\alpha }$ . The integral $\int f(x+2s,y+2t +2w) g(w)\, dw$ in equation (5.3) will then be replaced by an integral over an affine joining of $\mathbb T^d$ with $\mathbb T^r$ (Definition 9.4), but the computation in this case is not substantially different from the outline above.

5.3 Iterated integral notation

When all variables are displayed and there is no chance of confusion, we may omit all but one of the integral signs and the subscripts indicating the domain of integration. So the integral in equation (5.3) may be written as

$$ \begin{align*} \int f(x,y)f(x+s,y+t) f(x+2s,y+2t+2w)g(w) \, dw\, ds \, dt\, dx\, dy. \end{align*} $$

6 Eigenvalues and ergodicity of products

An eigenfunction of an MPS $\mathbf X=(X,\mathcal B,\mu ,T)$ with eigenvalue $\lambda \in \mathbb C$ is an $f\in L^2(\mu )$ satisfying $\|f\|\neq 0$ and $f\circ T=\lambda f$ . Since $\int |f\circ T|\, d\mu = \int |f|\, d\mu $ , we have $|\lambda |=1$ . Consequently $|f\circ T|=|f|$ , that is, $|f|$ is T-invariant, so if $\mathbf X$ is ergodic, we get that $|f|$ is equal $\mu $ -almost everywhere to a constant. We say an eigenvalue $\lambda $ of $\mathbf X$ is non-trivial if $\lambda \neq 1$ . Note that the eigenfunctions of $\mathbf X$ are the eigenvectors of the unitary operator $U_T:L^2(\mu )\to L^2(\mu )$ defined by $U_T f = f\circ T$ .

Given two MPSs $\mathbf X = (X,\mathcal B,\mu ,T)$ and $\mathbf Y = (Y,\mathcal D,\nu ,S)$ , we form the product system $\mathbf X\times \mathbf Y=(X\times Y, \mathcal B\otimes \mathcal D,\mu \times \nu , T\times S)$ . For $f\in L^2(\mu )$ and $g\in L^2(\nu )$ , we write $f\otimes g$ for the function defined by $f\otimes g(x,y)=f(x)g(y)$ .

We need some standard consequences of the following, which is the specialization of [Reference Furstenberg12, Lemma 4.17, p. 91] to the case where $\mathcal H=L^2(\mu )$ , $\mathcal H'=L^2(\nu )$ for MPSs $\mathbf X$ and $\mathbf Y$ as above, with unitary operators $Uf:=f\circ T$ and $U'g:=g\circ S$ .

Lemma 6.1. Let $\mathbf X$ and $\mathbf Y$ be measure-preserving systems as above, and let $\mathbf X\times \mathbf Y$ be the product system. Let $h\in L^2(\mu \times \nu )$ be an eigenfunction of $\mathbf X\times \mathbf Y$ with eigenvalue $\lambda $ , meaning $h\circ (T\times S)=\lambda h$ . Then $h = \sum c_n f_n\otimes g_n$ , where $f_n\circ T=\lambda _nf_n$ , $g_n\circ S = \lambda _n'g_n$ , $\lambda _n\lambda _n' = \lambda $ , and the sequences $\{f_n\}$ , $\{g_n\}$ are orthonormal in $L^2(\mu )$ and $L^2(\nu )$ , respectively.

To deduce Lemma 6.1 from [Reference Furstenberg12, Lemma 4.17], note that if $\mu $ and $\nu $ are measure spaces, $L^2(\mu \times \nu )$ is isomorphic to the tensor product $L^2(\mu )\otimes L^2(\nu )$ , and the obvious isomorphism identifies $U_{T\times S}$ with $U_T\otimes U_S$ .

The next lemma is a well-known consequence of Lemma 6.1; we omit its proof.

Lemma 6.2. If $\mathbf X$ and $\mathbf Y$ are ergodic MPSs, the product system $\mathbf X\times \mathbf Y$ is ergodic if and only if $\mathbf X$ and $\mathbf Y$ have no non-trivial eigenvalues in common.
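A finite toy model illustrates Lemma 6.2. We take $\mathbf X$ and $\mathbf Y$ to be the rotations $x\mapsto x+1$ on $\mathbb Z/m$ and $\mathbb Z/n$ with normalized counting measure (our choice of example, not one used elsewhere in this paper). Their eigenvalues are the mth and nth roots of unity, and on a finite set with uniform measure a system is ergodic exactly when it consists of a single orbit. The Python sketch below, with hypothetical function names, compares the two sides of the equivalence. Eigenvalues $e(j/m)$ are represented by the fractions $j/m$ .

```python
from fractions import Fraction


def product_is_ergodic(m, n):
    """Check whether (x, y) -> (x + 1, y + 1) on Z/m x Z/n has one orbit."""
    seen = set()
    p = (0, 0)
    while p not in seen:
        seen.add(p)
        p = ((p[0] + 1) % m, (p[1] + 1) % n)
    return len(seen) == m * n


def common_nontrivial_eigenvalues(m, n):
    """Common eigenvalues e(t), t != 0, of the two rotations, as fractions t."""
    ev_m = {Fraction(j, m) for j in range(1, m)}
    ev_n = {Fraction(k, n) for k in range(1, n)}
    return ev_m & ev_n


# Compare both sides of Lemma 6.2 for small m and n.
agreement = all(
    product_is_ergodic(m, n) == (len(common_nontrivial_eigenvalues(m, n)) == 0)
    for m in range(1, 13)
    for n in range(1, 13)
)
```

For instance, $m=4$ and $n=6$ share the eigenvalue $e(1/2)=-1$ , and the product splits into two orbits of length $12$ .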

Another immediate consequence of Lemma 6.1 is the following lemma.

Lemma 6.3. If $\mathbf X=(X,\mathcal B,\mu ,T)$ and $\mathbf Y=(Y,\mathcal D,\nu ,S)$ are MPSs such that $\mathbf X\times \mathbf Y$ is ergodic and $g\in L^2(\nu )$ is orthogonal to every eigenfunction of $\mathbf Y$ , then for every $f\in L^2(\mu )$ , $f\otimes g$ is orthogonal to every eigenfunction of $\mathbf X\times \mathbf Y$ .

7 Eigenfunctions and the Kronecker factor

Every ergodic MPS $\mathbf X$ has a factor $\pi :\mathbf X\to \mathbf Z$ where $\mathbf Z=(Z,\mathcal Z,m,R)$ is a compact abelian group rotation such that every eigenfunction of $\mathbf X$ is $\pi ^{-1}(\mathcal Z)$ -measurable. This factor is called the Kronecker factor of $\mathbf X$ , and we write $\int _Z f(s)\, ds$ (or sometimes just $\int f(s)\,ds$ ) to abbreviate $\int f(s)\, dm(s)$ .

The following result is proved in [Reference Furstenberg11, §3]; we use the notation $L_3$ introduced in §4.

Theorem 7.1. If $\mathbf X=(X,\mathcal B,\mu ,T)$ is an ergodic MPS with Kronecker factor $\pi :\mathbf X\to \mathbf Z$ , $\mathbf Z=(Z,\mathcal Z,m,R)$ , and $f_i:X\to [0,1]$ , then

$$ \begin{align*}\lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N f_1(T^n x) f_2(T^{2n} x) = \int_Z \tilde{f}_1(\pi(x)+s)\tilde{f}_2(\pi(x)+2s)\, ds, \quad (\text{in} \ L^2(\mu)),\end{align*} $$

where $\tilde {f}_i\in L^\infty (m)$ satisfies $\tilde {f}_i\circ \pi =P_{\mathbf Z}f_i$ . Furthermore,

(7.1) $$ \begin{align} L_3(f,T) = \int_Z\int_Z \tilde{f}(z)\tilde{f}(z+s)\tilde{f}(z+2s)\, dz\, ds \quad \text{for all } f\in L^\infty(\mu). \end{align} $$

7.1 Kronecker factor of a standard 2-step Weyl system

A standard $2$ -step Weyl system is an MPS of the form $\mathbf Y = (Y, \mathcal B, m,S)$ , where $Y=\mathbb T^d\times \mathbb T^d$ , $d\in \mathbb N$ , and $S:Y\to Y$ is defined as $S(x,y)=(x+\alpha ,y+x)$ , for some fixed $\alpha =(\alpha _1,\ldots ,\alpha _d)$ generating $\mathbb T^d$ . There is an explicit formula for the orbits of S:

(7.2) $$ \begin{align} S^n(x,y)=(x+n\alpha, y + nx + \tbinom{n}{2}\alpha), \end{align} $$

which may be verified by induction. Ergodicity of $\mathbf Y$ is equivalent to $\mathbf \alpha $ generating $\mathbb T^d$ . For $d=1$ , this follows from [Reference Furstenberg12, Proposition 3.11, p. 67], and the general case follows from a nearly identical proof. Also explained in [Reference Furstenberg12] is the Kronecker factor of $\mathbf Y$ : the eigenfunctions of $\mathbf Y$ are exactly the functions $\chi $ on Y defined by

$$ \begin{align*} \chi((x_1,\ldots,x_d),(y_1,\ldots,y_d)):=\exp(2\pi i (n_1x_1+\cdots + n_dx_d)) \end{align*} $$

for some $n_j\in \mathbb Z$ , so the group of eigenvalues of $\mathbf Y$ is $\{\exp (2\pi i (n_1\alpha _1+\cdots +n_d\alpha _d)) : n_j\in \mathbb Z\}$ . Thus, the Kronecker factor of $\mathbf Y$ is obtained by setting $Z=\mathbb T^d$ and letting $\pi :\mathbb T^d\times \mathbb T^d \to \mathbb T^d$ be the projection onto the first coordinate. Since the span of the eigenfunctions of $\mathbf Y$ consists solely of those functions depending on the first coordinate, the orthogonal projection $P_{\mathbf Z}f(x,y)$ can be written as $(P_{\mathbf Z}f)(x,y):=\int f(x,y)\, dy$ . Combining this with Theorem 7.1, we have the following observation.

Observation 7.2. The Kronecker factor $(Z,\mathcal Z,m,R)$ of a standard 2-step Weyl system $(\mathbb T^d \times \mathbb T^d,\mathcal D,\mu ,S)$ is spanned by functions of the form $f(x,y)=g(x)$ (i.e. functions depending on only the first coordinate), and for all bounded $f:\mathbb T^d\times \mathbb T^d\to \mathbb C$ , we have

$$ \begin{align*} \lim_{N\to\infty} \frac{1}{N}\sum_{n=1}^N \int f\cdot S^n f\cdot S^{2n}f \, d\mu = \int_{\mathbb T^d}\int_{\mathbb T^d} f'(x)f'(x+s)f'(x+2s)\, dx\, ds, \end{align*} $$

where $f':\mathbb T^d\to \mathbb C$ is defined as $f'(x):=\int f(x,y)\, dy$ .

8 Reduction to Weyl systems

The next lemma is one key step in the proof of Lemma 5.1. Its proof is similar to the proof of [Reference Ackelsberg, Bergelson and Best1, Lemma 8.1].

Lemma 8.1. Let $\mathbf X = (X,\mathcal B,\mu ,T)$ be a totally ergodic MPS and $f:X\to [0,1]$ . For all $\varepsilon>0$ , there is a factor $\pi :\mathbf X\to \mathbf Y$ such that:

  1. (i) $\mathbf Y$ is a factor of a standard 2-step Weyl system;

  2. (ii) setting $\tilde {f}\circ \pi =P_{\mathbf Y}f$ , we have

    $$ \begin{align*}\lim_{N\to\infty} \bigg|\frac{1}{N}\sum_{n=1}^N \bigg( g(n^2 \boldsymbol{\beta})\int f \cdot T^{n}f\cdot T^{2n}f\, d\mu - g(n^2\boldsymbol{\beta}) \int \tilde{f}\cdot S^{n}\tilde{f}\cdot S^{2n}\tilde{f}\, d\nu\bigg)\bigg|<\varepsilon\end{align*} $$
    for every continuous $g:\mathbb T^r\to [0,1]$ and every $\boldsymbol {\beta }\in \mathbb T^r$ , for all $r\in \mathbb N$ .

If we assume $\boldsymbol {\beta }$ generates $\mathbb T^r$ , then item (ii) holds for every Riemann integrable $g:\mathbb T^r\to [0,1]$ .

We prove Lemma 8.1 at the end of this section. Most of the proof is contained in the next lemma, an application of [Reference Frantzikinakis6, Theorem B]. It concerns the maximal $2$ -step affine factor $\mathbf A_2$ of an ergodic MPS $\mathbf X$ ; see [Reference Frantzikinakis6] for discussion and exposition. Additionally, we use the standard fact that the Kronecker factor of $\mathbf X$ is a factor of $\mathbf A_2$ .

If $\mathbf X$ is an MPS, we write $\mathcal E(\mathbf X)$ for the group of eigenvalues of $\mathbf X$ (see §6). We continue to write $e(t)$ for $\exp (2\pi i t)$ , and we use the notation $P_{\mathbf Y}$ introduced in §4.1.

Lemma 8.2. Let $\mathbf X=(X,\mathcal B,\mu ,T)$ be an ergodic measure-preserving system with maximal 2-step affine factor $\mathbf A_2$ and let $\beta \in [0,1)$ . Then $\mathbf A_2$ is characteristic for the averages

(8.1) $$ \begin{align} B_N(f_1,f_2):=\frac{1}{N}\sum_{n=1}^N e(n^2\beta) \cdot T^n f_1 \cdot T^{2n}f_2, \end{align} $$

meaning that

(8.2) $$ \begin{align} \lim_{N\to\infty} B_N(f_1,f_2) = \lim_{N\to \infty} B_N(P_{\mathbf A_2}f_1, P_{\mathbf A_2} f_2) \end{align} $$

in $L^2(\mu )$