
Decomposition of multicorrelation sequences and joint ergodicity

Published online by Cambridge University Press:  04 May 2023

Departamento de Ingeniería Matemática and Centro de Modelamiento Matemático, Universidad de Chile & IRL 2807 - CNRS, Beauchef 851, Santiago, Chile (e-mail:
Department of Mathematics, The Ohio State University, 231 West 18th Avenue, Columbus, OH 43210-1174, USA
Department of Mathematics, Aristotle University of Thessaloniki, Thessaloniki 54124, Greece (e-mail:
Department of Mathematics, Virginia Tech, 225 Stanger Street, Blacksburg, VA 24061, USA (e-mail:


We show that, under finitely many ergodicity assumptions, any multicorrelation sequence defined by invertible measure-preserving $\mathbb {Z}^d$ -actions with multivariable integer polynomial iterates is the sum of a nilsequence and a nullsequence, extending a recent result of the second author. To this end, we develop a new seminorm bound estimate for multiple averages by improving the results in a previous work of the first, third, and fourth authors. We also use this approach to obtain new criteria for joint ergodicity of multiple averages with multivariable polynomial iterates on ${\mathbb Z}^{d}$ -systems.

Original Article
This is an Open Access article, distributed under the terms of the Creative Commons Attribution licence, which permits unrestricted re-use, distribution and reproduction, provided the original article is properly cited.
© The Author(s), 2023. Published by Cambridge University Press

1 Introduction

1.1 Decomposition of multicorrelation sequences

The structure and limiting behavior of (averages of) multicorrelation sequences, that is, sequences of the form

$$ \begin{align*} (n_{1},\ldots,n_{k})\mapsto \int_X f_0\cdot T_1^{n_1}f_1\cdots T_k^{n_k}f_k\,d\mu, \end{align*} $$

where $k\in \mathbb {N}, T_1,\ldots ,T_k\colon X\to X$ are invertible and commuting (that is, $T_iT_j=T_jT_i$ for all $i,j$ ) measure-preserving transformations on a probability space $(X,\mathcal {B},\mu )$ , $f_0,\ldots ,f_k\in L^\infty (\mu )$ and $n_1,\ldots ,n_k\in \mathbb {Z},$ is a central topic in ergodic theory. (We say that T preserves $\mu $ if $\mu (T^{-1}A)=\mu (A)$ for all $A\in \mathcal {B}.$ The tuple $(X,\mathcal {B},\mu ,T_1,\ldots ,T_k)$ is a (measure-preserving) system.) For $k=1$ , Herglotz–Bochner’s theorem implies that the sequence $\int _X f_0 \cdot T_1^nf_1 \, d\mu $ is given by the Fourier coefficients of some finite complex measure $\sigma $ on ${\mathbb T}$ (see [Reference Khintchine22, Reference Koopman and von Neumann23]). More specifically, decomposing $\sigma $ into the sum of its atomic part, $\sigma _a$ , and continuous part, $\sigma _c$ , we get

$$ \begin{align*} \int_X\! f_0 \cdot T_1^nf_1\, d\mu \!=\!\int_{{\mathbb T}} e^{2\pi i nx} \, d\sigma(x) \!=\!\int_{{\mathbb T}}\kern-1pt e^{2\pi i n x}\, d\sigma_a(x)+\kern-1.2pt\int_{{\mathbb T}} e^{2\pi i n x}\, d\sigma_c(x) \!=\! \psi(n)+\nu(n),\end{align*} $$

where $(\psi (n))$ is an almost periodic sequence (that is, there exists a compact abelian group G, a continuous function $\phi : G \rightarrow {\mathbb C}$ , and $a \in G$ such that $\psi (n)=\phi (a^n), n\in \mathbb {N}$ ) and $(\nu (n))$ is a nullsequence, that is,

(1) $$ \begin{align} \lim_{N-M \to \infty} \frac{1}{N-M} \sum_{n=M}^{N-1} |\nu(n)|^2=0. \end{align} $$

More generally, after Furstenberg’s celebrated ergodic theoretic proof of Szemerédi’s theorem [Reference Furstenberg15], for a single transformation T and iterates of the form $in, 1\leq i\leq k,$ there has been a particular interest in the study of the corresponding multicorrelation sequences

(2) $$ \begin{align} \alpha(n)=\int_X f_0\cdot T^nf_1\cdots T^{kn}f_k\,d\mu. \end{align} $$

For T ergodic (that is, every T-invariant set in $\mathcal {B}$ has measure in $\{0,1\}$ ), Bergelson, Host, and Kra [Reference Bergelson, Host and Kra3] showed that the sequence $(\alpha (n))$ in equation (2) admits a decomposition of the form $\alpha (n)=\phi (n)+\nu (n),$ where $(\phi (n))$ is a uniform limit of k-step nilsequences (see §3.1 for the definition) and $(\nu (n))$ satisfies equation (1). (Note that k is the number of linear iterates that appear in equation (2).) Leibman, in [Reference Leibman28] for ergodic systems and [Reference Leibman29] for general ones, extended the result of Bergelson, Host, and Kra to polynomial iterates, meaning that in equation (2), instead of $n,\ldots , kn,$ we have $p_1(n),\ldots ,p_k(n),$ for some $p_1,\ldots ,p_k\in \mathbb {Z}[x].$

For $d\in {\mathbb N}$ , we say that a tuple $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^d})$ is a ${\mathbb Z}^d$ -measure-preserving system (or a ${\mathbb Z}^d$ -system) if $(X,\mathcal {B},\mu )$ is a probability space and $T_{n}\colon X\to X, n \in {\mathbb Z}^d,$ are measure-preserving transformations on X such that $T_{(0,\ldots ,0)}=\mathrm {id}$ and $T_{m}\circ T_{n}=T_{m+n}$ for all $m,n\in {\mathbb Z}^{d}$ . Notice here that we use the notation $T_n$ to stress the fact that T is a $\mathbb {Z}^d$ -action. If T is generated by the $\mathbb {Z}$ -actions $T_1,\ldots ,T_d$ and $p_i=(p_{i,1},\ldots ,p_{i,d}),$ we have $T_{p_i(n)}=\prod _{j=1}^d T_j^{p_{i,j}(n)}.$ It is natural to ask whether splitting results still hold for systems with commuting transformations.
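The identity $T_{p_i(n)}=\prod _{j=1}^d T_j^{p_{i,j}(n)}$ can be illustrated on a toy system. The following is a minimal sketch, not taken from the paper: the space $X=(\mathbb {Z}/5\mathbb {Z})^2$ with the uniform measure, the two shifts `T1`, `T2`, and the polynomial `p` are all hypothetical choices made only for illustration.

```python
from itertools import product

M = 5  # hypothetical modulus: X = (Z/5Z)^2 with the uniform measure

def T1(x):
    """Shift the first coordinate by 1 (a bijection of X, hence measure preserving)."""
    return ((x[0] + 1) % M, x[1])

def T2(x):
    """Shift the second coordinate by 2; it commutes with T1."""
    return (x[0], (x[1] + 2) % M)

def T1_inv(x):
    return ((x[0] - 1) % M, x[1])

def T2_inv(x):
    return (x[0], (x[1] - 2) % M)

def power(S, S_inv, e, x):
    """Apply S^e to x, allowing negative exponents."""
    step = S if e >= 0 else S_inv
    for _ in range(abs(e)):
        x = step(x)
    return x

def T(n, x):
    """The Z^2-action generated by T1 and T2: T_n = T1^{n_1} T2^{n_2}."""
    return power(T2, T2_inv, n[1], power(T1, T1_inv, n[0], x))

def p(n):
    """A sample polynomial p: Z -> Z^2, p(n) = (n^2, n)."""
    return (n * n, n)

X = list(product(range(M), repeat=2))

# The two defining properties of a Z^2-action, checked on a few indices.
identity_ok = all(T((0, 0), x) == x for x in X)
group_law_ok = all(
    T(m, T(n, x)) == T((m[0] + n[0], m[1] + n[1]), x)
    for m in [(1, -2), (3, 4)] for n in [(-1, 5), (2, 2)] for x in X
)
```

Here `T(p(n), x)` plays the role of $T_{p(n)}x=T_1^{n^2}T_2^{n}x$; any other pair of commuting bijections could be substituted for the two shifts.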

Question 1.1. [Reference Koutsogiannis, Le, Moreira and Richter27, Question 2]

Let $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^d})$ be a ${\mathbb Z}^d$ -system, $k\in {\mathbb N}$ , $p_{1},\ldots ,p_{k}\colon {\mathbb Z}\to {\mathbb Z}^{d}$ a family of polynomials, and $f_0,f_1,\ldots ,f_k \in L^{\infty }(\mu )$ . Under which conditions on the system can the multicorrelation sequence

(3) $$ \begin{align} \int_Xf_0\cdot T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\,d\mu \end{align} $$

be decomposed as the sum of a uniform limit of nilsequences and a nullsequence?

The extension of the aforementioned results from $\mathbb {Z}$ to $\mathbb {Z}^d$ -actions is, to this day, a challenging open problem. The main issue is that the proofs of the splitting theorems crucially depend on the theory of characteristic factors via the structure theory developed by Host and Kra [Reference Host and Kra18], a tool that is unavailable in the more general $\mathbb {Z}^d$ -setting. By this, we mean that while nilfactors for $\mathbb {Z}^d$ -analogs of Host–Kra uniformity norms are available (this can be found, for example, in [Reference Griesmer16]), it is in general not possible to relate averages such as equation (3) to those uniformity norms in the way one does for $d=1$ . As an aside, Frantzikinakis provided a partial answer to Question 1.1 (for $d=1$ ) in [Reference Frantzikinakis and Host10] that avoided the use of characteristic factors. The answer was partial in the sense that the nullsequence part was allowed to have an $\ell ^{2}({\mathbb Z})$ error term. A similar decomposition result for general d was proven by Frantzikinakis and Host in [Reference Frantzikinakis, Host and Kra12]. (The third author showed in [Reference Koutsogiannis25] the analog to this result for integer parts, or any combination of rounding functions, of real polynomial iterates. For a refinement of this result, with the average of the error term taken along primes, see [Reference Koutsogiannis, Le, Moreira and Richter27].) From the point of view of applications, it is useful to have such splitting results for studying weighted averages, in particular for multiple commuting transformations. (It is worth mentioning that the splitting of equation (2), where the average in the null term is taken along primes, was used by Tao and Teräväinen to show the logarithmic Chowla conjecture for products of odd factors [Reference Tao and Teräväinen32].)

It was demonstrated in [Reference Donoso, Koutsogiannis and Sun7] that under finitely many ergodicity assumptions (that is, we only have to assume that some iterates, coming from a finite set, of T are ergodic), the characteristic factors (defined in §2.3) for the corresponding averages

(4) $$ \begin{align} \frac{1}{N}\sum_{n=1}^{N}T_{p_{1}(n)}f_1\cdots T_{p_{k}(n)}f_k \end{align} $$

are, as in the case of ${\mathbb Z}$ -actions, rotations on nilmanifolds. (A similar result was obtained in [Reference Johnson20] under infinitely many ergodicity assumptions. Such multiple ergodic averages always have $L^2$ -limits as $N\to \infty $ [Reference Walsh34].) So, it is reasonable to expect that the decomposition asked for in Question 1.1 holds after postulating finitely many ergodicity assumptions (this is an open problem even in the $k=2$ case; see [Reference Frantzikinakis, Host and Kra12]).

A partial answer toward this direction was obtained in [Reference Ferré Moragues9] by the second author. Namely, [Reference Ferré Moragues9, Theorem 1.5] shows that for any system $(X,\mathcal {B},\mu , T_1,\ldots , T_k)$ with $T_i$ and $T_iT_j^{-1}$ ergodic (for all i and $j \neq i$ ) and $f_0,\ldots ,f_k \in L^{\infty }(\mu )$ , the sequence

(5) $$ \begin{align} \int_X f_0 \cdot T_1^n f_1 \cdots T_k^n f_k \, d\mu\end{align} $$

can be decomposed as a sum of a uniform limit of k-step nilsequences plus a nullsequence.

For more general expressions (as in equation (3)), exploiting results from [Reference Johnson20], it is also shown in [Reference Ferré Moragues9] that if we further assume ergodicity in all directions, that is, $T_1^{a_1}\cdots T_d^{a_d}$ is ergodic for all $(a_1,\ldots ,a_d) \in {\mathbb Z}^d \setminus \{\mathbf {{0}}\}$ , then for any family of pairwise distinct polynomials $p_1,\ldots ,p_k\colon {\mathbb Z} \rightarrow {\mathbb Z}^d$ , the sequence

(6) $$ \begin{align} \int_X f_0 \cdot T_{p_1(n)}f_1 \cdots T_{p_k(n)} f_k \, d\mu \end{align} $$

can be decomposed as a sum of a uniform limit of D-step nilsequences plus a nullsequence. (Here D depends on $k, d$ and the maximum degree of the $p_i$ terms. It also has a connection to the number of van der Corput operations we have to run in the induction (see Remark 5.14 for details).) The proof of this result makes essential use of a seminorm bound estimate obtained in [Reference Johnson20], where the (infinitely many) ergodicity assumptions are reflected (see [Reference Ferré Moragues9, Theorem 1.6]).

In [Reference Donoso, Koutsogiannis and Sun7], the first, third, and fourth authors improved the seminorm bound estimates of [Reference Johnson20] by imposing only finitely many ergodicity assumptions. Although the results in [Reference Donoso, Koutsogiannis and Sun7] are stronger than those in [Reference Johnson20], one cannot apply them directly to [Reference Ferré Moragues9] to improve the aforementioned results, because the methods of the two studies [Reference Donoso, Koutsogiannis and Sun7, Reference Ferré Moragues9] are incompatible (see §2.3 for more details).

In this article, we extend results from [Reference Donoso, Koutsogiannis and Sun7] to obtain splitting theorems for multicorrelation sequences involving multiparameter polynomials, postulating ergodicity assumptions which are even weaker than those in [Reference Donoso, Koutsogiannis and Sun7] on the transformations that define the $\mathbb {Z}^d$ -action in equation (6); for example, we will see that the sequence $\int _X f_0 \cdot T_1^{n^2}T_{2}^{n} f_1 \cdot T_3^{n^2}T_{4}^{n} f_2 \, d\mu $ admits the desired splitting if we assume that $T_1, T_3, T_1T_3^{-1}$ are ergodic.

1.2 The joint ergodicity phenomenon

In his ergodic theoretic proof of Szemerédi’s theorem, Furstenberg [Reference Furstenberg15] studied the averages of the multicorrelation sequence in equation (2). In particular, a stepping stone in the proof is the special case when the transformation T is weakly mixing (that is, $T\times T$ is ergodic for $\mu \times \mu $ ), in which he showed that the averages

(7) $$ \begin{align} \frac{1}{N}\sum_{n=1}^N T^nf_1\cdots T^{kn}f_k \end{align} $$

converge in $L^2(\mu )$ to $\prod _{i=1}^k \int _X f_i \, d\mu $ (which we will refer to as the ‘expected limit’) as $N\to \infty $ . (Throughout this paper, unless otherwise stated, all limits of measurable functions on a measure-preserving system are taken in $L^{2}$ .) It was Berend and Bergelson [Reference Berend and Bergelson1] who characterized when the average of the integrand of equation (5), that is, for multiple commuting transformations, converges to the expected limit (and this happens exactly when $T_1 \times \cdots \times T_k$ and $T_iT_j^{-1}$ for all $i\neq j$ are ergodic).

Generalizing Furstenberg’s result, Bergelson showed (in [Reference Bergelson2]) that, for a weakly mixing transformation T and essentially distinct polynomials $p_{1},\ldots ,p_{k}$ (that is, $p_{i}, p_{i}-p_{j}$ are non-constant for all $1\leq i,j\leq k, i\neq j$ ),

$$ \begin{align*} \lim_{N \to \infty} \frac{1}{N}\sum_{n=1}^N T^{p_1(n)}f_1\cdots T^{p_k(n)}f_k=\prod_{i=1}^k \int_X f_i \, d\mu.\end{align*} $$

(For T totally ergodic (that is, $T^n$ is ergodic for all $n\in \mathbb {N}$ ) and $p_1,\ldots ,p_k$ ‘independent’ integer polynomials, it is proved in [Reference Frantzikinakis and Kra14] that we have the same conclusion. This fact remains true for an ergodic T and ‘strongly independent’ real-valued polynomial iterates, $[p_1(n)],\ldots ,[p_k(n)]$ ( $[\cdot ]$ denotes the floor function), as well (see [Reference Karageorgos and Koutsogiannis21]). These last two results also follow from recent work of Frantzikinakis [Reference Frantzikinakis11], in which, for a single $T,$ there is a plethora of joint ergodicity results for a number of classes of iterates (not just polynomial). Finally, for real variable polynomial iterates, we refer the reader to [Reference Koutsogiannis26].) One can think of this last result as a strong independence property of the sequences $(T^{p_{i}(n)})_{n\in \mathbb {Z}}, 1\leq i\leq k$ in the weakly mixing case. Under additional assumptions on the system and/or the polynomial iterates, it is reasonable to expect the averages appearing in the previous relation to converge to the expected limit, which naturally leads to a general notion of joint ergodicity (a sequence of finite subsets $(I_{N})_{N\in \mathbb {N}}$ of $\mathbb {Z}^L$ with the property $\lim _{N\to \infty }\vert I_{N}\vert ^{-1}\cdot \vert (g+I_{N})\triangle I_{N}\vert =0$ for all $g\in \mathbb {Z}^L$ is called a Følner sequence in $\mathbb {Z}^L$ ).

Definition 1.2. Let $d, k, L\in \mathbb {N}, p_1,\ldots ,p_k\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be polynomials and $(X,\mathcal {B},\mu , (T_{g})_{g\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$ -system. We say that the sequence of tuples $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if for every $f_{1},\ldots ,f_{k} \in L^{\infty }(\mu )$ and every Følner sequence $(I_{N})_{N\in \mathbb {N}}$ of $\mathbb {Z}^{L}$ , we have that

(8) $$ \begin{align} \lim_{N\to\infty}\frac{1}{\vert I_{N}\vert}\sum_{n\in I_{N}}T_{p_{1}(n)}f_{1}\cdots T_{p_{k}(n)}f_{k}=\int_{X}f_{1}\,d\mu\cdots \int_{X}f_{k}\,d\mu. \end{align} $$

When $k=1$ , we also say that $(T_{p_{1}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu .$
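The Følner condition used in Definition 1.2 can be checked by hand in the simplest case. The sketch below assumes $L=1$ and the intervals $I_N=\{1,\ldots ,N\}$ (a standard example, chosen here for illustration); the helper name `folner_ratio` is hypothetical.

```python
from fractions import Fraction

def folner_ratio(N, g):
    """Return |(g + I_N) symmetric-difference I_N| / |I_N| for I_N = {1, ..., N} in Z."""
    I = set(range(1, N + 1))
    shifted = {g + n for n in I}
    return Fraction(len(I ^ shifted), len(I))

# For a fixed shift g the ratio equals 2|g|/N once N >= |g|, so it tends to 0.
ratios = [folner_ratio(N, 3) for N in (10, 100, 1000)]
```

For $g=3$ the symmetric difference always has six elements, so the ratio decays like $6/N$, which is the behavior the definition requires for every $g$.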

The following conjecture was stated in [Reference Donoso, Koutsogiannis and Sun7].

Conjecture 1.3. [Reference Donoso, Koutsogiannis and Sun7, Conjecture 1.5]

Let $d,k,L\in \mathbb {N}, p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be polynomials and $(X,\mathcal {B},\mu , (T_{g})_{g\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$ -system. Then the following are equivalent.

  • (C1) $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ .

  • (C2) The following conditions are satisfied:

    1. (i) $(T_{p_{i}(n)-p_{j}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k, i\neq j$ ; and

    2. (ii) $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for the product measure $\mu ^{\otimes k}$ on $X^{k}$ .

Answering a question of Bergelson, it was shown in [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.4] that for a polynomial $p:\mathbb {Z}^L\to \mathbb {Z},$ the sequence $(T_{1}^{p(n)},\ldots ,T_{k}^{p(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if and only if $((T_{1}\times \cdots \times T_{k})^{p(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ and $T_{i}T^{-1}_{j}$ is ergodic for $\mu $ for all $i\neq j.$ In this paper, the strong decomposition results that we obtain allow us to deduce joint ergodicity results for a larger family of polynomials (see Theorems 2.5 and 2.9), thus addressing some additional cases in the aforementioned conjecture.

2 Main results

In this section, we state the main results of the paper and provide a number of examples to better illustrate them. We also comment on the approaches that we follow.

2.1 Splitting results

Our first main concern is to resolve the incompatibility between [Reference Donoso, Koutsogiannis and Sun7] and [Reference Ferré Moragues9], and improve the method in [Reference Donoso, Koutsogiannis and Sun7], to obtain an extension of the results in [Reference Ferré Moragues9].

Before we state our first result, we need to introduce some notation.

For $d,L\in \mathbb {N}$ , the polynomial $q=(q_1,\ldots ,q_d): \mathbb {Z}^L\to \mathbb {Z}^d$ is non-constant if some $q_i$ is non-constant. Here we mean that each $q_i$ is a member of $\mathbb {Q}[x_1,\ldots ,x_L]$ with ${q_i(\mathbb {Z}^L)\subseteq \mathbb {Z}.}$ The degree of q is defined as the maximum of the degrees of the $q_i$ terms.

The polynomials $p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ are called essentially distinct if they are non-constant and $p_{i}-p_{j}$ is non-constant for all $i\neq j$ . (In general, a polynomial ${q\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}}$ has rational coefficients (that is, vectors with rational coordinates).)

For a subset A of $\mathbb {Q}^{d}$ , we denote

$$ \begin{align*} G(A):=\mathrm{span}_{\mathbb{Q}}\{a\in A\}\cap \mathbb{Z}^{d}. \end{align*} $$

The following subgroups of ${\mathbb Z}^{d}$ play an important role in this paper.

Definition 2.1. Let ${\mathbf {p}}=(p_{1},\ldots ,p_{k}), p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be a family of essentially distinct polynomials with $p_{i}(n)=\sum _{v\in \mathbb {N}_0^{L}}b_{i,v}n^{v}$ for some $b_{i,v}\in \mathbb {Q}^{d}$ with at most finitely many $b_{i,v},v\in \mathbb {N}_0^{L}$ non-zero. (Here, we denote $n^{v}:=n_{1}^{v_{1}}\cdots n_{L}^{v_{L}}$ for ${n=(n_{1},\ldots ,n_{L})\in {\mathbb Z}^{L}}$ and $v=(v_{1},\ldots ,v_{L})\in {\mathbb N}_0^{L}$ , where $0^0:=1.$ ) For convenience, we artificially denote $p_{0}$ as the constant zero polynomial and $b_{0,v}:=\mathbf {0}$ for all $v\in {\mathbb N}_{0}^{L}$ . For $0\leq i,j\leq k$ , set $d_{i,j}:=\deg (p_{i}-p_{j})$ and $G_{i,j}(\mathbf {p}):=G(\{b_{i,v}-b_{j,v}\colon \vert v\vert =d_{i,j}\})$ , where, for $v=(v_1,\ldots ,v_L)\in {\mathbb N}_{0}^{L},$ we write $\vert v\vert =v_1+\cdots +v_L.$

Our main result provides an affirmative answer to Question 1.1 under finitely many ergodicity assumptions on the groups $G_{i,j}(\mathbf {p})$ , which generalizes [Reference Ferré Moragues9, Theorem 1.5]. We say that the group $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ if any function $f \in L^2(\mu )$ that is $T_{a}$ -invariant for all $a \in G_{i,j}(\mathbf {p})$ is constant.

The definition of a D-step nilsequence will be given in §3.1. We say that $a\colon {\mathbb Z}^L \to {\mathbb C}$ is a nullsequence if for any Følner sequence $(I_N)_{N\in {\mathbb N}}, \lim _{N \to \infty } {1}/{|I_N|} \sum _{n \in I_N} |a(n)|^2=0.$

Theorem 2.2. (Decomposition theorem under finitely many ergodicity assumptions)

For $d,k,K,L\in \mathbb {N},$ let $\mathbf {p}=(p_{1},\ldots ,p_{k}),$ where $p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ is a family of essentially distinct polynomials of degree at most $K,$ and let $(X,\mathcal {B},\mu , (T_{n})_{n\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$ -system. If $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq i,j\leq k,i\neq j,$ then for all $f_0,\ldots ,f_k \in L^{\infty }(\mu )$ , the multicorrelation sequence

$$ \begin{align*} \int_X f_0\cdot T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\,d\mu \end{align*} $$

can be decomposed as a sum of a uniform limit of D-step nilsequences and a nullsequence, where $D\in {\mathbb N}$ is a constant depending only on $d,k,K,L.$

We refer the reader to Remark 5.14 for a further discussion on the constant D. Also, note that Theorem 2.2 goes beyond Question 1.1 as it deals with multivariable polynomial iterates (that is, $L>1$ ).

Example 2.3. It was proved in [Reference Ferré Moragues9, Theorem 1.5] that for any probability space $(X,\mathcal {B},\mu )$ and commuting transformations $T_{1},\ldots ,T_{k}$ acting on X, if $T_i$ and $T_iT_j^{-1}$ are ergodic (for all i and all $j \neq i$ , respectively), then for all $f_0,\ldots ,f_k \in L^{\infty }(\mu )$ , the multicorrelation sequence

$$ \begin{align*} \int_X f_0 \cdot T_1^n f_1 \cdots T_k^n f_k \, d\mu \end{align*} $$

can be decomposed as a sum of a uniform limit of k-step nilsequences plus a nullsequence. While Theorem 2.2 does not specify the step D of the nilsequence, a quick argument shows that, in this case, one can indeed take $D=k$ (see Remark 6.1 for details).

The following example shows that Theorem 2.2 is stronger than [Reference Ferré Moragues9, Theorem 1.6], which deals with single variable essentially distinct polynomial iterates.

Example 2.4. Let $(X,\mathcal {B},\mu ,T_{1},\ldots ,T_{6})$ be a system with commuting transformations $T_{1},\ldots ,T_{6}$ and $f_{0},f_{1},\ldots ,f_{4}\in L^{\infty }(\mu )$ . Using [Reference Ferré Moragues9, Theorem 1.6], we have that the multicorrelation sequence

(9) $$ \begin{align} \alpha(n)=\int_X f_0\cdot T_{1}^{n^{2}}T_{2}^{n}f_1\cdot T_{1}^{n^{2}}T_{3}^{n}f_2\cdot T_{4}^{n^{3}}f_3\cdot T_{5}^{n^{3}}T_{6}^{n}f_4\,d\mu \end{align} $$

can be decomposed as the sum of a uniform limit of nilsequences and a nullsequence if $T^{a_{1}}_{1}\cdots T^{a_{6}}_{6}$ is ergodic for all $(a_{1},\ldots ,a_{6})\in {\mathbb Z}^{6}\backslash \{\mathbf {{0}}\}$ . In contrast, via Theorem 2.2, one can get the same conclusion by only assuming that $T_{1},T_{2}T_{3}^{-1}, T_{4},T_{5},T_{4}T^{-1}_{5}$ are ergodic. (Indeed, denoting $G(a):=G(\{a\})$ for a single vector a, and $e_{i}$ the vector whose ith entry is 1 and all other entries are 0, since $\mathbf {{p}}=((n^2,n,0,0,0,0),(n^2,0,n,0,0,0),(0,0,0,n^3,0,0),(0,0,0,0,n^3,n)),$ we have that $G_{1,0}(\mathbf {p})=G_{2,0}(\mathbf {p})=G(e_{1})$ , $G_{1,3}(\mathbf {p})=G_{2,3}(\mathbf {p})=G_{3,0}(\mathbf {p})=G(e_{4})$ , $G_{1,4}(\mathbf {p})=G_{2,4}(\mathbf {p})=G_{4,0}(\mathbf {p})=G(e_{5})$ , $G_{1,2}(\mathbf {p})=G(e_{2}-e_{3})$ , $G_{3,4}(\mathbf {p})=G(e_{4}-e_{5})$ .)
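The bookkeeping in Example 2.4 can be reproduced mechanically. The sketch below rests on an assumption that is consistent with the identities listed in the example: for single-variable polynomials, $G_{i,j}(\mathbf {p})$ is generated by the top-degree coefficient vector of $p_i-p_j$. The dictionary encoding and the helper names are hypothetical.

```python
D = 6
ZERO = (0,) * D

def e(i):
    """Standard basis vector e_i of Z^6 (1-indexed)."""
    return tuple(1 if j == i else 0 for j in range(1, D + 1))

# Each polynomial Z -> Z^6 is stored as {degree: coefficient vector}.
p = {
    0: {},                      # p_0 = 0 by convention
    1: {2: e(1), 1: e(2)},      # (n^2, n, 0, 0, 0, 0)
    2: {2: e(1), 1: e(3)},      # (n^2, 0, n, 0, 0, 0)
    3: {3: e(4)},               # (0, 0, 0, n^3, 0, 0)
    4: {3: e(5), 1: e(6)},      # (0, 0, 0, 0, n^3, n)
}

def leading_of_difference(i, j):
    """Return (deg(p_i - p_j), top-degree coefficient vector of p_i - p_j)."""
    diff = {}
    for deg in set(p[i]) | set(p[j]):
        vec = tuple(a - b for a, b in zip(p[i].get(deg, ZERO), p[j].get(deg, ZERO)))
        if any(vec):  # keep only non-zero coefficients
            diff[deg] = vec
    top = max(diff)
    return top, diff[top]

# One entry per unordered pair {i, j}; swapping i and j only flips the sign.
leading = {(i, j): leading_of_difference(i, j) for i in range(5) for j in range(i)}
```

Up to sign, the ten leading vectors are $e_1$, $e_2-e_3$, $e_4$, $e_5$, and $e_4-e_5$, which correspond to the five transformations $T_{1},T_{2}T_{3}^{-1},T_{4},T_{5},T_{4}T^{-1}_{5}$ named in the example.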

2.2 Convergence to the expected limit

In [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.4], the first, third, and fourth authors proved the following case of Conjecture 1.3. If $T_{1},\ldots ,T_{k}$ are commuting transformations acting on a probability space $(X,\mathcal {B},\mu )$ , then $(T_{1}^{p(n)},\ldots ,T_{k}^{p(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if and only if $((T_{1}\times \cdots \times T_{k})^{p(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ and $T_{i}T^{-1}_{j}$ is ergodic for $\mu $ for all $i\neq j$ . In this paper, we further extend this result.

Theorem 2.5. Let $k,d,L\in {\mathbb N}$ and let $\mathbf {p}=(p_{1}v_{1},\ldots , p_{k}v_{k}),$ where $p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}$ and $v_{1},\ldots , v_{k}\in {\mathbb Z}^{d}$ , be a family of essentially distinct polynomials. Suppose that for all $1\leq i,j\leq k$ , if $\deg (p_{i})=\deg (p_{j})$ , then either $v_{i}$ and $v_{j}$ are linearly dependent over $\mathbb {Z}$ , or $p_{i}$ and $p_{j}$ are linearly dependent over $\mathbb {Z}$ (that is, there is a non-trivial linear combination of them over $\mathbb {Z}$ which equals a constant). Let $(X,\mathcal {B},\mu , (T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$ -system. Then the following are equivalent.

  • (C1) $(T_{p_{1}(n)v_{1}},\ldots ,T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is jointly ergodic for $\mu .$

  • (C2’) The following subconditions hold:

    • (i)’ $(T_{p_{i}(n)v_{i}-p_{j}(n)v_{j}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k, i\neq j$ with $\deg (p_{i})=\deg (p_{j})$ ;

    • (ii) $(T_{p_{1}(n)v_{1}}\times \cdots \times T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .

Moreover, condition (C2’) is equivalent to

  • (C2) The following subconditions hold:

    1. (i) $(T_{p_{i}(n)v_{i}-p_{j}(n)v_{j}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k, i\neq j$ ;

    2. (ii) $(T_{p_{1}(n)v_{1}}\times \cdots \times T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .

Note that the subconditions in condition (C2) are consistent with those in Conjecture 1.3. However, the reason we provide an alternative set of equivalent subconditions in condition (C2’) is that these subconditions are easier to check in practice.

We now give some examples to illustrate Theorem 2.5. The first one is for polynomials of distinct degrees.

Example 2.6. Let $(X,\mathcal {B},\mu ,T_{1},\ldots ,T_{k})$ be a system. Using Theorem 2.5, we conclude that $(T^{n}_{1},T^{n^{2}}_{2},\ldots ,T^{n^{k}}_{k})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n}_{1}\times \cdots \times T^{n^{k}}_{k})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes k}$ , and all the $T_{i}$ terms are ergodic for $\mu $ .

We remark that Example 2.6 can also be proved by using arguments from [Reference Chu, Frantzikinakis and Host6]. We next present two examples in which some polynomials have the same degree and so cannot be recovered by the methods of [Reference Chu, Frantzikinakis and Host6].

Example 2.7. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3,T_{4})$ be a system. Theorem 2.5 implies that $(T^{n}_{1},T^{n}_{2},T^{n^{2}}_{3}, T^{n^{2}}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n}_{1}\times T^{n}_{2}\times T^{n^{2}}_{3}\times T^{n^{2}}_{4})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes 4}$ , and both $T_{1}T^{-1}_{2}$ and $((T_{3}T^{-1}_{4})^{n^2})_{n\in {\mathbb N}}$ are ergodic for $\mu $ .

Example 2.8. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3)$ be a system. Theorem 2.5 implies that $(T^{n^{4}+n^{2}}_{1}$ , $T^{2n^{4}+3n}_{1}, T^{2n^{2}+2n+1}_{2}, T^{3n^{2}+3n}_{3})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n^{4}+n^{2}}_{1}\times T^{2n^{4}+3n}_{1}\times T^{2n^{2}+2n+1}_{2}\times T^{3n^{2}+3n}_{3})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes 4}$ , and both sequences $(T^{-n^{4}+n^{2}-3n}_{1})_{n\in {\mathbb Z}}$ and $((T^{2}_{2}T_{3}^{-3})^{n^{2}+n})_{n\in {\mathbb Z}}$ are ergodic for $\mu $ .
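The two single sequences in Example 2.8 come from subcondition (i)’ applied to the two pairs of iterates of equal degree. A short sketch of the computation, written under the assumption that the first two iterates both act through $T_1$ and the last two through $T_2$ and $T_3$, respectively:

```latex
\begin{align*}
(n^{4}+n^{2})-(2n^{4}+3n) &= -n^{4}+n^{2}-3n,\\
T_{2}^{2n^{2}+2n+1}\,T_{3}^{-(3n^{2}+3n)}
  &= T_{2}\cdot T_{2}^{2(n^{2}+n)}\,T_{3}^{-3(n^{2}+n)}
   = T_{2}\,(T_{2}^{2}T_{3}^{-3})^{n^{2}+n}.
\end{align*}
```

The fixed factor $T_{2}$ in the second line does not affect ergodicity of the sequence: it can be pulled out of the averages, it is an isometry of $L^{2}(\mu )$, and it fixes constants. This explains why the condition can be stated for $((T^{2}_{2}T_{3}^{-3})^{n^{2}+n})_{n\in {\mathbb Z}}$ alone.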

Another direction for the joint ergodicity problem is to find sufficient conditions for condition (C1) in Conjecture 1.3. Namely, assuming that $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ , one seeks a condition, say (P), requiring certain sequences of actions to be ergodic, under which $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ . By combining existing results from [Reference Host and Kra18, Reference Johnson20] (see also [Reference Donoso, Koutsogiannis and Sun7, Proposition 1.2]), (P) can be taken to be ‘ $T_{g}$ is ergodic for $\mu $ for all $g\in {\mathbb Z}^{d}\backslash \{\mathbf {{0}}\}$ ’. Denoting $p_{i}(n)=\sum _{v\in {\mathbb N}_0^{L},0\leq \vert v\vert \leq K}b_{i,v}n^{v}$ for some $b_{i,v}\in \mathbb {Q}^{d}$ and $K\in {\mathbb N}_0$ , this result was extended in [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.3], where the previous property is replaced by ‘ $T_{g}$ is ergodic for $\mu $ for all g belonging to the finite set R’, where

$$ \begin{align*} R=\bigcup_{0< \vert v\vert\leq K}\{b_{i,v}, b_{i,v}-b_{j,v}\colon 1\leq i, j\leq k\}\backslash\{\mathbf{0}\}.\end{align*} $$

In this paper, we replace the latter condition with an even weaker one.

Theorem 2.9. Let $d,k,L\in \mathbb {N}, \mathbf {p}=(p_{1},\ldots , p_{k}), p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be a family of essentially distinct polynomials and $(X,\ \mathcal {B},\ \mu ,\ (T_{g})_{g\in \mathbb {Z}^{d}})$ a $\mathbb {Z}^{d}$ -system. Then, $(T_{p_{1}(n)},\ldots , T_{p_{k}(n)})_{n\in {\mathbb Z}^{L}}$ is jointly ergodic for $\mu $ if both of the following conditions hold:

  1. (i) $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq i,j\leq k,i\neq j$ ;

  2. (ii) $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .

The last example for this section reflects the stronger nature of the previous theorem compared to what was previously known.

Example 2.10. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3,T_{4})$ be a system. Then, [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.3] implies that $(T^{n^2}_{1} T^{n}_{2},T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if $((T^{n^2}_{1} T^n_2)\times (T^{n^2}_3 T^n_{4}))_{n\in \mathbb {Z}}$ is ergodic for $\mu ^{\otimes 2}$ , and all $T_{1},T_{2},T_{3},T_{4},T_{1}T_{3}^{-1},T_{2}T_{4}^{-1}$ are ergodic for $\mu $ . Using Theorem 2.9, we conclude that $(T^{n^2}_{1} T^{n}_{2}, T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if we instead only assume that $((T^{n^2}_{1} T^n_2)\times (T^{n^2}_3 T^n_{4}))_{n\in \mathbb {Z}}$ is ergodic for $\mu ^{\otimes 2}$ , and all $T_{1},T_{3},T_{1}T_{3}^{-1}$ are ergodic for $\mu $ .

Unfortunately, Theorem 2.9 does not imply Conjecture 1.3 for the pair $(T^{n^2}_{1} T^{n}_{2}, T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$ . This is because $T_{1},T_{3},T_{1}T_{3}^{-1}$ being ergodic for $\mu $ is independent of $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ being ergodic for $\mu $ . For example, if $T_{1}=T_{3}=T_{4}=\mathrm {id}$ and $T_{2}$ is any ergodic transformation, then $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ is ergodic for $\mu $ but $T_{1},T_{3},T_{1}T_{3}^{-1}$ are not. However, if $X=\{0,\ldots ,6\}$ with $\mu (\{i\})=1/7$ , $T_{1}x:=x+1\ \mod 7$ , $T_{3}=T_{1}^{2}$ , and $T_{2}=T_{4}=\mathrm {id}$ , then $T_{1},T_{3},T_{1}T_{3}^{-1}$ are ergodic for $\mu $ but $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ is not.
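The second example above, on $X=\{0,\ldots ,6\}$, is small enough to verify directly. The sketch below uses two elementary facts: a bijection of a finite set with the uniform measure is ergodic exactly when it is a single cycle, and, since $n\mapsto n^{2}\bmod 7$ is 7-periodic, averages along intervals converge to the average over one period. The function names are hypothetical.

```python
from fractions import Fraction

M = 7  # X = {0, ..., 6} with mu({i}) = 1/7

def T1(x):
    return (x + 1) % M

def T3(x):
    return (x + 2) % M       # T3 = T1^2

def T1T3inv(x):
    return (x - 1) % M       # T1 * T3^{-1}

def is_ergodic(S):
    """A bijection of a finite uniform space is ergodic iff it is one cycle."""
    orbit, x = {0}, S(0)
    while x != 0:
        orbit.add(x)
        x = S(x)
    return len(orbit) == M

def limit_average(f, x):
    """Average of f((T1 T3^{-1})^{n^2} x) = f(x - n^2 mod 7) over one period n = 0..6."""
    return sum(Fraction(f((x - n * n) % M)) for n in range(M)) / M

def indicator0(y):
    return 1 if y == 0 else 0

avg_at_1 = limit_average(indicator0, 1)   # counts n with n^2 = 1 mod 7
avg_at_3 = limit_average(indicator0, 3)   # 3 is not a square mod 7
```

Ergodicity of the sequence would force both averages to equal $\int f\,d\mu =1/7$. They differ from it (and from each other), so the sequence already fails along the Følner sequence of intervals, while `T1`, `T3`, and `T1T3inv` are each a single 7-cycle.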

2.3 Strategy of the paper

The central ingredient in proving the main results of the paper (Theorems 2.2, 2.5, and 2.9) is to find proper characteristic factors for the limit of the average in equation (4), that is, sub- $\sigma $ -algebras $\mathcal {D}_{1},\ldots ,\mathcal {D}_{k}$ of $\mathcal {B}$ such that the average in equation (4) remains invariant if we replace each $f_{i}$ by its conditional expectation (see below for the definition) with respect to $\mathcal {D}_{i}$ . An important type of characteristic factor, called the Host–Kra characteristic factor, was invented in [Reference Host and Kra18] to study multiple averages for ${\mathbb Z}$ -systems (see below for the definition of these factors). This concept was generalized to systems with commuting transformations in [Reference Host17] (see also [Reference Sun31]).

To introduce the main tool used in our results (Theorem 2.11), special cases of which have been studied extensively in the past (see for example [Reference Chu, Frantzikinakis and Host6, Reference Frantzikinakis and Kra14, Reference Host17, Reference Host and Kra18, Reference Johnson20]), we need to introduce the machinery of Host–Kra seminorms and factors.

Host–Kra seminorms and their associated factors are arguably the main tools used to analyze the behavior of multiple averages and correlation sequences. In what follows, we give general results about these seminorms and factors, following the notation used in [Reference Donoso, Koutsogiannis and Sun7].

We first recall the notions of a factor and of the conditional expectation with respect to a factor. We say that the ${\mathbb Z}^d$ -system $(Y,\mathcal {D},\nu ,(S_{g})_{g\in {\mathbb Z}^d})$ is a factor of $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ if there exists a measurable map $\pi \colon X\to Y$ such that $\mu (\pi ^{-1}(A))=\nu (A)$ for all $A\in \mathcal {D}$ , and $\pi \circ T_{g}=S_{g}\circ \pi $ for all $g\in {\mathbb Z}^{d}$ .

A factor $(Y,\mathcal {D},\nu ,(S_{g})_{g\in {\mathbb Z}^d})$ of $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ can be identified with an invariant sub- $\sigma $ -algebra $\mathcal {B}'$ of $\mathcal {B}$ by setting $\mathcal {B}':=\pi ^{-1}(\mathcal {D})$ . Given two $\sigma $ -algebras, $\mathcal {B}_1$ and $\mathcal {B}_2$ , their joining $\mathcal {B}_1\vee \mathcal {B}_2$ is the $\sigma $ -algebra generated by $B_1\cap B_2$ for all $B_1\in \mathcal {B}_1$ and $B_2\in \mathcal {B}_2$ , that is, the smallest $\sigma $ -algebra containing both $\mathcal {B}_1$ and $\mathcal {B}_2$ .

Given a factor $\pi \colon (X,\mathcal {B},\mu ) \to (Y,\mathcal {D},\nu )$ and a function $f\in L^{2}(\mu )$ , the conditional expectation of f with respect to Y is the function $g\in L^{2}(\nu ),$ which we denote by $\mathbb {E}(f \mid Y),$ with the property

$$ \begin{align*} \int_A g\circ \pi \, d\mu=\int_A f \ d\mu\quad \textrm{for all } A \in \pi^{-1}(\mathcal{D}).\end{align*} $$
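
As a finite illustration of the notions of factor and conditional expectation, consider the following Python sketch. Everything in it is an assumption made for illustration and does not come from the text: the rotation on ${\mathbb Z}/6$, its factor ${\mathbb Z}/3$, the test function, and all helper names. On a finite space with uniform measures, the conditional expectation is simply the average of f over each fibre of $\pi $.

```python
from fractions import Fraction

X = range(6)   # X = Z/6 with the uniform measure mu
Y = range(3)   # Y = Z/3 with the uniform measure nu

def T(x): return (x + 1) % 6   # transformation on X
def S(y): return (y + 1) % 3   # transformation on Y
def pi(x): return x % 3        # candidate factor map

def conditional_expectation(f):
    """E(f | Y) as a function on Y: the mu-average of f over each fibre of pi."""
    g = []
    for y in Y:
        fibre = [x for x in X if pi(x) == y]
        g.append(sum(Fraction(f[x]) for x in fibre) / len(fibre))
    return g

def integral_over(A, h):
    """Integral over the subset A of X of the function h, for the uniform mu."""
    return sum(Fraction(h(x)) for x in A) / 6

# The two requirements in the definition of a factor
# (measure preservation is checked on singletons, which generate D).
intertwines = all(pi(T(x)) == S(pi(x)) for x in X)
measure_preserved = all(
    Fraction(sum(1 for x in X if pi(x) == y), 6) == Fraction(1, 3) for y in Y
)

f = [5, 1, 0, 3, 7, 2]
g = conditional_expectation(f)   # g[y] = (f[y] + f[y + 3]) / 2

# Defining property of E(f | Y), checked on the generators A = pi^{-1}({y}).
defining_property = all(
    integral_over([x for x in X if pi(x) == y], lambda x: g[pi(x)])
    == integral_over([x for x in X if pi(x) == y], lambda x: f[x])
    for y in Y
)
```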

Let $(X,\mathcal {B},\mu )$ be a probability space and let $\mathcal {B}_1$ be a sub- $\sigma $ -algebra of $\mathcal {B}$ . The relatively independent joining of $(X,\mathcal {B},\mu )$ with itself with respect to $\mathcal {B}_1$ is the probability space $(X\times X, \mathcal {B} \otimes \mathcal {B}, \mu \times _{\mathcal {B}_1} \mu )$ , where the measure $\mu \times _{\mathcal {B}_1} \mu $ is given by the formula:

$$ \begin{align*} \int_{X \times X} f_1 \otimes f_2 \, d(\mu \times_{\mathcal{B}_1} \mu)=\int_X \mathbb{E}(f_1|\mathcal{B}_1) \mathbb{E}(f_2| \mathcal{B}_1) \, d\mu, \end{align*} $$

for all $f_1, f_{2} \in L^{\infty }(\mu )$ .
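
On a finite probability space where $\mathcal {B}_1$ is generated by a partition, the relatively independent joining gives the pair $(x,y)$ the mass $\mu (\{x\})\mu (\{y\})/\mu (\text {atom})$ when x and y lie in the same atom, and $0$ otherwise. The Python sketch below checks the defining formula in one such case; the space, the partition, the two functions, and the helper names are all assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

POINTS = range(6)            # X = {0, ..., 5} with the uniform measure mu

def atom(x):
    """B_1 is generated by the residue classes modulo 3."""
    return x % 3

def cond_exp(f):
    """E(f | B_1) as a function on X: the average of f over the atom of x."""
    out = []
    for x in POINTS:
        cell = [z for z in POINTS if atom(z) == atom(x)]
        out.append(sum(Fraction(f[z]) for z in cell) / len(cell))
    return out

def joining_mass(x, y):
    """Mass of {(x, y)} under the relatively independent joining over B_1."""
    if atom(x) != atom(y):
        return Fraction(0)
    # mu({x}) * mu({y}) / mu(atom), with mu(atom) = 1/3
    return Fraction(1, 6) * Fraction(1, 6) / Fraction(1, 3)

f1 = [1, 2, 3, 4, 5, 6]
f2 = [2, 0, 1, 0, 3, 1]

# Left-hand side: integral of f1 (x) f2 against the joining.
lhs = sum(f1[x] * f2[y] * joining_mass(x, y) for x, y in product(POINTS, POINTS))

# Right-hand side: integral of E(f1 | B_1) * E(f2 | B_1) against mu.
e1, e2 = cond_exp(f1), cond_exp(f2)
rhs = sum(e1[x] * e2[x] for x in POINTS) / 6

total_mass = sum(joining_mass(x, y) for x, y in product(POINTS, POINTS))
```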

For a G-system $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{g})_{g\in G})$ , if H is a subgroup of $G,$ we denote by $\mathcal {I}(H)(\mathbf {X})$ the set of $A\in \mathcal {B}$ such that $T_{g}A=A$ for all $g\in H$ . When there is no confusion, we write $\mathcal {I}(H)$ .

For a ${\mathbb Z}^d$ -system $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ and $H_1,\ldots ,H_k$ subgroups of ${\mathbb Z}^d,$ define

$$ \begin{align*}\mu_{H_{1}}=\mu\times_{\mathcal{I}(H_{1})}\mu\end{align*} $$

and for $k>1$ , let

$$ \begin{align*}\mu_{H_{1},\ldots,H_{k}}=\mu_{H_{1},\ldots,H_{k-1}}\times_{\mathcal{I}(H_{k}^{[k-1]})}\mu_{H_{1},\ldots,H_{k-1}},\end{align*} $$

where $H^{[k-1]}_{k}$ denotes the subgroup of $({\mathbb Z}^{d})^{2^{k-1}}$ consisting of all the elements of the form $(h_{k},\ldots , h_{k})$ ( $2^{k-1}$ copies of $h_{k}$ ) for some $h_{k}\in H_{k}$ . The characteristic factor $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})$ is defined to be the sub- $\sigma $ -algebra of $\mathcal {B}$ characterized by

$$ \begin{align*} \mathbb{E}(f\mid Z_{H_{1},\ldots,H_{k}}(\mathbf{X}))=0 \quad\text{if and only if}\quad \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{1},\ldots,H_{k}}^{2^{k}}:=\int_{X^{[k]}}\bigotimes_{\epsilon\in\{0,1\}^{k}}\mathcal{C}^{|\epsilon|}f\,d\mu_{H_{1},\ldots,H_{k}}=0 \end{align*} $$

for all $f \in L^{\infty }(\mu )$ , where $X^{[k]}=X\times \cdots \times X$ ( $2^k$ copies of X), $\vert \epsilon \vert =\epsilon _{1}+\cdots +\epsilon _{k}$ for $\epsilon =(\epsilon _{1},\ldots ,\epsilon _{k})\in \{0,1\}^{k}$ , and $\mathcal {C}^{2r+1}f=\overline {f},$ the complex conjugate of f, $\mathcal {C}^{2r}f=f$ for all $r\in {\mathbb Z}$ . The quantity $\lvert \hspace{-1pt}\lvert \hspace{-1pt}\lvert f\rvert \hspace{-1pt}\rvert \hspace{-1pt}\rvert _{H_1,\ldots ,H_k}$ denotes the Host–Kra seminorm of f with respect to the subgroups $H_1,\ldots ,H_k$ . Similar to the proof of [Reference Host17, Lemma 4] or [Reference Host and Kra18, Lemma 4.3], one can show that $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})$ is well defined.

Theorem 2.11. Let $d,k,K,L\in \mathbb {N}$ and let $\mathbf {p}=(p_{1},\ldots , p_{k})$ , $p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ , be a family of essentially distinct polynomials of degrees at most $K.$ There exists $D\in \mathbb {N}_0$ depending only on $d,k,K,L$ such that for every $\mathbb {Z}^{d}$ -system $\mathbf {X}=(X,\mathcal {B},\mu , (T_{n})_{n\in \mathbb {Z}^{d}})$ , every $f_{1},\ldots , f_{k}\in L^{\infty }(\mu )$ , and every Følner sequence $(I_{N})_{N\in {\mathbb N}}$ of ${\mathbb Z}^{L}$ , if $f_{i}$ is orthogonal to the Host–Kra characteristic factor ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ for some $1\leq i\leq k$ (that is, the conditional expectation of $f_{i}$ under ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ is $0$ ), then we have that

(10) $$ \begin{align} \lim_{N\to\infty}\frac{1}{|I_N|}\sum_{n\in I_{N}}\prod_{i=1}^k T_{p_{i}(n)}f_i=0. \end{align} $$

In particular, if for some $1\leq i\leq k$ , $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq j\leq k, j\neq i$ and $f_{i}$ is orthogonal to the Host–Kra characteristic factor ${Z}_{({\mathbb Z}^{d})^{\times kD}}(\mathbf {X})$ , then equation (10) holds.

It is worth noting that the factor ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ we obtain in Theorem 2.11 is not optimal, but it is good enough for our purposes.

A special case of Theorem 2.11 was proved in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1]. More precisely, Theorem 2.11 generalizes [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] in the following two ways.

  1. (I) The characteristic factor obtained in Theorem 2.11 is of finite step, whereas that in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] is of infinite step.

  2. (II) The groups $G_{i,j}(\mathbf {p})$ involved in Theorem 2.11 are larger than those in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1], which makes the characteristic factors in Theorem 2.11 smaller.

We remark that the aforementioned technical distinctions have significant influences on the applications of Theorem 2.11. First, the essential reason why one cannot directly use [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] to improve [Reference Ferré Moragues9, Theorem 1.5] is that the method used in [Reference Ferré Moragues9] requires a characteristic factor of finite step. This problem is resolved by generalization (I), enabling us to extend [Reference Ferré Moragues9, Theorem 1.5] in this paper. Second, [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] does not provide a strong enough characteristic factor in certain circumstances. For example, in the case of Example 2.6, [Reference Chu, Frantzikinakis and Host6, Theorem 6.5] suggests that the Host–Kra seminorms controlling equation (10) depend only on the transformations $T_{1},\ldots ,T_{k}$ , whereas the upper bound provided by [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] depends not only on the transformations $T_{1},\ldots ,T_{k}$ but also on many compositions of them. With the help of generalizations (I) and (II), we are able to obtain (and generalize) the aforementioned upper bound of [Reference Chu, Frantzikinakis and Host6, Theorem 6.5].

Roughly speaking, the achievement of generalization (I) relies on a sophisticated development of a Bessel-type inequality first obtained by Tao and Ziegler in [Reference Tao and Ziegler33, Proposition 3.6]. The most technical part of this paper is the approach we use to get generalization (II). In [Reference Donoso, Koutsogiannis and Sun7], a method was introduced to keep track of the coefficients of the polynomials while running a variation of the polynomial exhaustion technique (PET) induction. However, the tracking provided there is not strong enough to imply Theorem 2.11. To overcome this difficulty, we introduce more sophisticated machinery to have a better control of the coefficients.

The paper is organized as follows. We provide some background material in §3. In §4, we present the variation of PET induction that we use. In §5, we address how generalizations (I) and (II) above can be achieved with Propositions 5.2 and 5.4, which improve Propositions 5.6 and 5.5 of [Reference Donoso, Koutsogiannis and Sun7], respectively. We conclude the section by proving Theorem 2.11. This is the bulk of the paper. In §6, we use Theorem 2.11 to deduce Theorems 2.2, 2.5, and 2.9, which are the main results of the paper. We conclude with some discussions on future directions in §7.

2.4 Notation

We denote by $\mathbb {N}, \mathbb {N}_0, \mathbb {Z}, \mathbb {Q}, \mathbb {R}$ , and $\mathbb {C}$ the sets of positive integers, non-negative integers, integers, rational numbers, real numbers, and complex numbers, respectively. If X is a set and $d\in \mathbb {N}$ , $X^d$ denotes the Cartesian product $X\times \cdots \times X$ of d copies of X.

We denote by $e_i$ the vector that has $1$ as its ith coordinate and $0$ elsewhere. In general, we use lower-case letters for both numbers and vectors, and bold letters for vectors of vectors, to emphasize the distinction. The only exception to this convention is the vector $\mathbf {0}$ (that is, the vector all of whose coordinates are $0$ ), which we always write in bold.

Throughout this article, we use the following notation for averages. Let $(a(n))_{n\in {\mathbb Z}^L}$ be a sequence of complex numbers, or a sequence of measurable functions on a probability space $(X,\mathcal {B},\mu )$ . We let:

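$\mathbb {E}_{n\in A}a(n):=\frac {1}{|A|}\sum _{n\in A}a(n)$ , where A is a finite subset of ${\mathbb Z}^{L}$ ;

${\overline {\mathbb {E}}}^{\square }_{n\in {\mathbb Z}^{L}}a(n):=\varlimsup _{N\to \infty }\mathbb {E}_{n\in [-N,N]^{L}}a(n)$ (we use the symbol $\square $ to highlight the fact that the averages are taken along the boxes $[-N,N]^{L}$ );

$\mathbb {E}^{\square }_{n\in {\mathbb Z}^{L}}a(n):=\lim _{N\to \infty }\mathbb {E}_{n\in [-N,N]^{L}}a(n)$ , provided that the limit exists;

${\overline {\mathbb {E}}}_{n\in {\mathbb Z}^{L}}a(n):=\sup _{(I_{N})_{N\in {\mathbb N}}}\varlimsup _{N\to \infty }\mathbb {E}_{n\in I_{N}}a(n)$ , where the supremum is taken over all Følner sequences $(I_{N})_{N\in {\mathbb N}}$ of ${\mathbb Z}^{L}$ ;

$\mathbb {E}_{n\in {\mathbb Z}^{L}}a(n):=\lim _{N\to \infty }\mathbb {E}_{n\in I_{N}}a(n)$ , provided that the limit exists for all Følner sequences $(I_{N})_{N\in {\mathbb N}}$ of ${\mathbb Z}^{L}$ .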

It is worth noticing that if the limit $\lim _{N\to \infty } \mathbb {E}_{n\in I_N} a(n)$ exists for all Følner sequences (in ${\mathbb Z}^L$ ), then this limit does not depend on the chosen Følner sequence.

We also consider iterated averages. Let $(a(h_{1},\ldots ,h_{s}))_{h_{1},\ldots ,h_{s}\in {\mathbb Z}^L}$ be a multiparameter sequence. We let

$$ \begin{align*} \overline{\mathbb{E}}_{h_{1},\ldots,h_{s}\in\mathbb{Z}^{L}}a(h_{1},\ldots,h_{s}):=\overline{\mathbb{E}}_{h_{1}\in\mathbb{Z}^{L}}\cdots\overline{\mathbb{E}}_{h_{s}\in\mathbb{Z}^{L}}a(h_{1},\ldots,h_{s}) \end{align*} $$

and adopt similar conventions for $\mathbb {E}_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ , ${\overline {\mathbb {E}}}^{\square }_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ , and $\mathbb {E}^{\square }_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ .

We end this section by recalling the notion of a system indexed by a countable abelian group $(G,+)$ . We say that a tuple $(X,\mathcal {B},\mu ,(T_{g})_{g\in G})$ is a G-measure-preserving system (or a G-system) if $(X,\mathcal {B},\mu )$ is a probability space and $T_{g}\colon X\to X$ are measurable, measure-preserving transformations on X such that $T_{e_{G}}=\mathrm {id}$ ( $e_G$ is the identity element of G) and $T_{g}\circ T_{h}=T_{g+h}$ for all $g,h\in G$ . A G-system is called ergodic if for any $A\in \mathcal {B}$ such that $T_{g}A=A$ for all $g\in G$ , we have that $\mu (A)\in \{0,1\}$ . In this paper, we are mostly concerned with ${\mathbb Z}^{d}$ -systems and $L^2(\mu )$ -norm limits of (multiple) ergodic averages. For the corresponding norm, when it is clear from the context, we will write $\Vert {\cdot} \Vert _2$ instead of $\Vert {\cdot} \Vert _{L^2(\mu )}$ .

3 Background material

In this section, we recall some background material and prove some intermediate results that will be used later throughout the paper.

We summarize some basic properties of the Host–Kra seminorms and their associated factors.

Proposition 3.1. [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.4]

Let $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ be a ${\mathbb Z}^{d}$ -system, $H_{1},\ldots ,H_{k},H'$ be subgroups of ${\mathbb Z}^{d}$ and $f\in L^{\infty }(\mu )$ .

  1. (i) For every permutation $\sigma \colon \{1,\ldots ,k\}\to \{1,\ldots ,k\}$ , we have that

    $$ \begin{align*} Z_{H_{1},\ldots,H_{k}}(\mathbf{X})=Z_{H_{\sigma(1)},\ldots,H_{\sigma(k)}}(\mathbf{X}), \end{align*} $$

    and hence the corresponding seminorm does not depend on the particular order taken for the subgroups $H_1,\ldots ,H_k.$

  2. (ii) If $\mathcal {I}(H_{j})=\mathcal {I}(H')$ , then $Z_{H_{1},\ldots ,H_{j},\ldots ,H_{k}}(\mathbf {X})=Z_{H_{1},\ldots ,H_{j-1},H',H_{j+1},\ldots ,H_{k}}(\mathbf {X})$ .

  3. (iii) For $k\geq 2$ , we have that

    $$ \begin{align*}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k}}_{H_{1},\ldots,H_{k}} =\mathbb{E}_{g\in H_{k}}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\cdot T_{g}\overline{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k-1}}_{H_{1},\ldots,H_{k-1}},\end{align*} $$

    while for $k=1,$

    $$ \begin{align*}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2}_{H_{1}} =\mathbb{E}_{g\in H_{1}}\int_{X} f\cdot T_{g}\overline{f}\,d\mu.\end{align*} $$
  4. (iv) Let $k\geq 2$ . If $H'\leq H_{j}$ is of finite index, then

    $$ \begin{align*}Z_{H_{1},\ldots,H_{j},\ldots,H_{k}}(\mathbf{X})=Z_{H_{1},\ldots,H_{j-1},H',H_{j+1},\ldots,H_{k}}(\mathbf{X}).\end{align*} $$
  5. (v) If $H'\leq H_{j}$ , then $Z_{H_{1},\ldots ,H_{j},\ldots ,H_{k}}(\mathbf {X})\subseteq Z_{H_{1},\ldots ,H_{j-1},H',H_{j+1},\ldots ,H_{k}}(\mathbf {X})$ .

  6. (vi) For $k\geq 2$ , $\lvert \hspace{-1pt}\lvert \hspace{-1pt}\lvert f\rvert \hspace{-1pt}\rvert \hspace{-1pt}\rvert _{H_1,\ldots ,H_{k-1}}\leq \lvert \hspace{-1pt}\lvert \hspace{-1pt}\lvert f\rvert \hspace{-1pt}\rvert \hspace{-1pt}\rvert _{H_1,\ldots ,H_{k-1},H_k}$ and thus

    $$ \begin{align*}Z_{H_1,\ldots,H_{k-1}}(\mathbf{X})\subseteq Z_{H_1,\ldots,H_{k-1},H_k}(\mathbf{X}).\end{align*} $$
  7. (vii) For $k\geq 1$ , if $H_1',\ldots , H_k'$ are subgroups of $ {\mathbb Z}^d$ , then

    $$ \begin{align*}Z_{H_1,\ldots,H_k}(\mathbf{X}) \vee Z_{H_1',\ldots,H_k'}(\mathbf{X}) \subseteq Z_{H_1',\ldots,H_k',H_1,\ldots,H_k}(\mathbf{X}).\end{align*} $$
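
To get a feeling for these properties, one can evaluate the seminorms on a finite rotation. The Python sketch below is an illustration under assumptions that are not in the text: the system is the rotation $T_{g}x=x+g$ on ${\mathbb Z}/m$ with $d=1$, each subgroup is of the form $c{\mathbb Z}$, the averages over a subgroup are replaced by finite averages over its image modulo m, and the identities of part (iii) are used as a computational recipe. The function names are hypothetical.

```python
def translate(f, g):
    """T_g f = f o T_g for the rotation T_g x = x + g on Z/m."""
    m = len(f)
    return [f[(x + g) % m] for x in range(m)]

def seminorm_power(f, generators):
    """|||f|||^(2^k) for H_i = generators[i] * Z, where k = len(generators),
    computed recursively from the identities of Proposition 3.1(iii)."""
    m = len(f)
    *rest, c = generators
    total = 0.0
    for j in range(m):   # c*j (mod m) runs uniformly over the image of c*Z
        shifted = translate(f, c * j)
        prod = [a * complex(b).conjugate() for a, b in zip(f, shifted)]
        if rest:
            total += seminorm_power(prod, rest)
        else:
            total += (sum(prod) / m).real   # integral of f * conj(T_g f)
    return total / m

def seminorm(f, generators):
    """|||f||| itself (clamped at 0 to absorb rounding errors)."""
    return max(seminorm_power(f, generators), 0.0) ** (1.0 / 2 ** len(generators))

delta0 = [1, 0, 0, 0, 0, 0]              # indicator of {0} on Z/6
u1 = seminorm_power(delta0, [1])         # H_1 = Z
u2 = seminorm_power(delta0, [1, 1])      # H_1 = H_2 = Z
even = seminorm_power(delta0, [2])       # H_1 = 2Z
mixed_a = seminorm_power(delta0, [1, 2])
mixed_b = seminorm_power(delta0, [2, 1])
```

In this example, `mixed_a` and `mixed_b` agree, as part (i) predicts, the seminorm for $2{\mathbb Z}$ is at least the one for ${\mathbb Z}$, in line with part (v), and adding a subgroup does not decrease the seminorm, in line with part (vi).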

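As an immediate consequence of Proposition 3.1(ii), we have the following corollary (if the $H_{i}$ -action is ergodic, then $\mathcal {I}(H_{i})$ is trivial and hence coincides with $\mathcal {I}({\mathbb Z}^{d})$ ).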

Corollary 3.2. [Reference Donoso, Koutsogiannis and Sun7, Corollary 2.5]

Let $H_{1},\ldots ,H_{k}$ be subgroups of ${\mathbb Z}^{d}$ . If the $H_{i}$ -action $(T_{g})_{g\in H_{i}}$ is ergodic on $\mathbf {X}$ for all $1\leq i\leq k$ , then $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})=Z_{{\mathbb Z}^{d},\ldots ,{\mathbb Z}^{d}}(\mathbf {X})$ .

Convention 3.3. Thanks to Proposition 3.1, we may adopt a flexible and convenient notation while writing the Host–Kra characteristic factors. For example, if $A=\{H_{1},H_{2}\}^{\times 3}$ , then the notation $Z_{A,H_{3},H^{\times 2}_{4},(H_{i})_{i=5,6}}(\mathbf {X})$ refers to $Z_{H_{1},H_{1},H_{1},H_{2},H_{2},H_{2},H_{3},H_{4},H_{4},H_{5},H_{6}}(\mathbf {X})$ (note that thanks to Proposition 3.1(i), $Z_{A,H_{3},H^{\times 2}_{4},(H_{i})_{i=5,6}}(\mathbf {X})$ is well defined regardless of the ordering of A).

Recall that for a subgroup $H\subseteq {\mathbb Z}^d$ , $H^{[1]}$ denotes the subgroup $\{ (h,h)\colon h\in H\}\subseteq {\mathbb Z}^d\times ~{\mathbb Z}^d$ .

Lemma 3.4. Let $d \in {\mathbb N}$ . Let $(X,\mathcal {B},\mu ,(T_n)_{n\in {\mathbb Z}^d})$ be a ${\mathbb Z}^{d}$ -system and $H_1,\ldots ,H_k,H$ be subgroups of ${\mathbb Z}^d$ . Let $f \in L^{\infty }(\mu )$ . Then,

$$ \begin{align*} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f \otimes \bar{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_1^{[1]},\ldots,H_{k}^{[1]}} \leq \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_1,\ldots,H_{k},H} ^2,\end{align*} $$

where in the left-hand side, we consider the product space $(X\times X,\mathcal {B}\otimes \mathcal {B},\mu \times \mu , (T_{m}\times T_{n})_{(m,n)\in \mathbb {Z}^{2d}})$ .

Proof. We proceed by induction on k. For $k=1,$ using the Cauchy–Schwarz inequality, we have

$$ \begin{align*} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\otimes \overline{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_1^{[1]}}^2&= \mathbb{E}_{g\in H_1} \int f\otimes \overline{f} \cdot (T_g\times T_g )\overline{f}\otimes f \, d(\mu\times \mu) \\ &= \mathbb{E}_{g\in H_1} \bigg| \int T_g f \cdot \overline{f} d\mu \bigg|^2 = \mathbb{E}_{g\in H_1} \bigg| \int \mathbb{E}(T_g f \cdot \overline{f} | \mathcal{I}(H))\, d\mu \bigg|^2 \\ & \leq \mathbb{E}_{g\in H_1} \int |\mathbb{E}(T_g f \cdot \overline{f} | \mathcal{I}(H))|^2\, d\mu\\ &= \mathbb{E}_{g\in H_1} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert T_gf\cdot \overline{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H}^2\! =\! \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H,H_1}^4=\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_1,H}^4, \end{align*} $$

where we used in the last two equalities Proposition 3.1(iii) and (i), respectively, from where we conclude the required relation by taking square roots.

Suppose that the result holds for $k-1$ . By Proposition 3.1(i) and (iii) and the induction hypothesis,

$$ \begin{align*} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f \otimes \overline{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_1^{[1]},\ldots,H_{k}^{[1]}}^{2^k} & =\mathbb{E}_{g\in H_k}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert (T_g\times T_{g}) f \otimes \overline{f} \cdot \overline{f} \otimes {f} \rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k-1}}_{H_1^{[1]},\ldots,H_{k-1}^{[1]}} \\ & = \mathbb{E}_{g\in H_k}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert T_g f \cdot \overline{f} \otimes T_g \overline{f} \cdot f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k-1}}_{H_1^{[1]},\ldots,H_{k-1}^{[1]}} \\ & \leq \mathbb{E}_{g\in H_k}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert T_g f \cdot \overline{f}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k}}_{H_1,\ldots,H_{k-1},H} \\ & = \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k+1}}_{H_1,\ldots,H_{k-1},H,H_{k} } = \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{2^{k+1}}_{H_1,\ldots,H_{k-1},H_k,H} \end{align*} $$

and the claim follows.

3.1 Nilsystems, nilsequences, and structure theorem

Let $X=N/\Gamma $ , where N is a (k-step) nilpotent Lie group and $\Gamma $ is a discrete cocompact subgroup of N. Let $\mathcal {B}$ be the Borel $\sigma $ -algebra of $X, \mu $ the normalized Haar measure on $X,$ and for $n\in {\mathbb Z}^{d},$ let ${T_{n}\colon X\to X}$ with $T_{n}x=b_{n}\cdot x$ for some group homomorphism $n\mapsto b_{n}$ from ${\mathbb Z}^{d}$ to N. We say that $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^{d}})$ is a (k-step) ${\mathbb Z}^{d}$ -nilsystem. For $k\geq 1$ , we say that $(a_{n})_{n\in {\mathbb Z}^{d}} \subseteq \mathbb {C}$ is a (k-step) ${\mathbb Z}^{d}$ -nilsequence if there exist a (k-step) ${\mathbb Z}^{d}$ -nilsystem $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^{d}})$ , a function $F \in C(X)$ and $x\in X$ such that $a_{n}=F(T_{n}x)$ for all $n\in {\mathbb Z}^{d}$ . For $k=0$ , a $0$ -step nilsequence is a constant sequence. An important reason which makes the Host–Kra characteristic factors powerful is their connection with nilsystems. The following is a slight generalization of [Reference Ziegler36, Theorem 3.7] (see [Reference Griesmer16, Lemma 4.4.3 and Theorem 4.10.1], or Proposition 3.1(ii) and [Reference Sun31, Theorem 3.7]), which is a higher dimensional version of the Host–Kra structure theorem [Reference Host and Kra18].

Theorem 3.5. Let $\mathbf {X}$ be an ergodic $\mathbb {Z}^{d}$ -system. Then $Z_{(\mathbb {Z}^{d})^{\times k}}(\mathbf {X})$ is an inverse limit of $(k-1)$ -step $\mathbb {Z}^{d}$ -nilsystems.

3.2 Bessel’s inequality

An essential difference in the study of multiple ergodic averages between ${\mathbb Z}$ -systems and ${\mathbb Z}^{d}$ -systems is that in the former case, one can usually bound the average by some Host–Kra seminorm of a function f appearing in the average, whereas in the latter, one can only bound the averages by an average of a family of Host–Kra seminorms of f. To overcome this difficulty, inspired by the work of Tao and Ziegler [Reference Tao and Ziegler33], in this subsection, we derive an upper bound for expressions of the form ${\overline {\mathbb {E}}}_{i\in I}\lvert \hspace{-1pt}\lvert \hspace{-1pt}\lvert f\rvert \hspace{-1pt}\rvert \hspace{-1pt}\rvert _{H_{i,1},\ldots ,H_{i,s}}$ , where I is a finite set and $H_{i,j}$ are subgroups of ${\mathbb Z}^{d}$ .

The proof of the following statement is similar to [Reference Tao and Ziegler33, Corollary 1.22].

Proposition 3.6. (Bessel’s inequality)

Let $t\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$ -system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t$ be subgroups of ${\mathbb Z}^{d}$ . Then for all $f\in L^{\infty }(\mu )$ ,

$$ \begin{align*}\mathbb{E}_{i\in I}\| \mathbb{E}(f\vert Z_{H_{i,1},\ldots,H_{i,t}})\|^{2}_{2} \leq \Vert f\Vert_{2}\cdot(\mathbb{E}_{i,j\in I}\Vert \mathbb{E}(f\vert Z_{\{H_{i,i'}+H_{j,j'}\}_{1\leq i',j'\leq t}})\Vert^{2}_{2})^{1/2}.\end{align*} $$

Proof. For convenience, let $f_{i}:=\mathbb {E}(f\vert Z_{H_{i,1},\ldots ,H_{i,t}})$ for $i\in I$ . Then,

$$ \begin{align*}\mathbb{E}_{i\in I}\Vert \mathbb{E}(f\vert Z_{H_{i,1},\ldots,H_{i,t}})\Vert^{2}_{2}=\langle f,\mathbb{E}_{i\in I}f_{i}\rangle\end{align*} $$

which, by the Cauchy–Schwarz inequality, is bounded by

$$ \begin{align*}\Vert f\Vert_{2}\cdot\vert\mathbb{E}_{i,j\in I}\langle f_{i},f_{j}\rangle\vert^{1/2}. \end{align*} $$

By [Reference Tao and Ziegler33, Corollary 1.21], $L^{\infty }(Z_{H_{i,1},\ldots ,H_{i,t}})$ and $L^{\infty }(Z_{H_{j,1},\ldots ,H_{j,t}})$ are orthogonal on the orthogonal complement of $L^{\infty }(Z_{\{H_{i,i'}+H_{j,j'}\}_{1\leq i',j'\leq t}})$ , and hence

$$ \begin{align*}\langle f_{i},f_{j}\rangle=\| \mathbb{E}(f\vert Z_{\{H_{i,i'}+H_{j,j'}\}_{1\leq i',j'\leq t}})\|_{2}^{2},\end{align*} $$

and we have the conclusion.

By repeatedly using Proposition 3.6, we have the following inequality.

Corollary 3.7. Let $t,s\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$ -system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t,$ be subgroups of ${\mathbb Z}^{d}$ . Then for all $f\in L^{\infty }(\mu )$ , we have

$$ \begin{align*}(\mathbb{E}_{i\in I}\Vert \mathbb{E}(f\vert Z_{H_{i,1},\ldots,H_{i,t}})\Vert^{2}_{2})^{ 2^s} \!\leq \Vert f\Vert^{2\cdot 2^s-2}_{2}\cdot \mathbb{E}_{i_{1},\ldots,i_{2^s}\in I}\big\Vert \mathbb{E}\big(f\big| Z_{\{\sum_{j=1}^{2^s}H_{i_{j},i^{\prime}_{j}}\}_{1\leq i^{\prime}_{1},\ldots,i^{\prime}_{2^s}\leq t}}\big)\big\Vert^{2}_{2}.\end{align*} $$

The next proposition provides an upper bound for $\mathbb {E}_{i\in I}\lvert \hspace{-1pt}\lvert \hspace{-1pt}\lvert f\rvert \hspace{-1pt}\rvert \hspace{-1pt}\rvert _{H_{i,1},\ldots ,H_{i,t}}$ which can be combined with the previous two statements.

Proposition 3.8. Let $t\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$ -system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t$ be subgroups of ${\mathbb Z}^{d}$ . Then, for all $f\in L^{\infty }(\mu )$ , with $\Vert f\Vert _{L^{\infty }(\mu )}\leq 1$ ,

$$ \begin{align*}\mathbb{E}_{i\in I}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}}\leq (\mathbb{E}_{i\in I}\Vert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\Vert^{2}_{2})^{1/2^{t}}.\end{align*} $$

Proof. Note that

(11) $$ \begin{align} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}}\leq \Vert f\Vert_{L^{2^{t}}(\mu)}\leq \Vert f\Vert_{2}^{1/2^{t-1}}. \end{align} $$

Also, for all i, we have

$$ \begin{align*} \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}}&\leq \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f-\mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}} + \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}}\\ &= \lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}}, \end{align*} $$

and so, combining this with equation (11) applied to $\mathbb {E}(f|Z_{H_{i,1},\ldots ,H_{i,t}})$ (whose $L^{\infty }(\mu )$ -norm is also at most $1$ ) and with Jensen's inequality for the concave function $x\mapsto x^{1/2^{t}}$ on $[0,\infty )$ , we obtain


(12) $$ \begin{align} &\mathbb{E}_{i\in I}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}} \leq \mathbb{E}_{i\in I}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}}) \rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{H_{i,1},\ldots,H_{i,t}} \nonumber\\ &\quad\leq \mathbb{E}_{i\in I}\Vert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\Vert_{2}^{1/2^{t-1}} \leq (\mathbb{E}_{i\in I}\Vert \mathbb{E}(f|Z_{H_{i,1},\ldots,H_{i,t}})\Vert^{2}_{2})^{1/2^{t}}, \end{align} $$

as was to be shown.

3.3 General properties of subgroups of ${\mathbb Z}^d$ and properties of polynomials

Recall that for a subset A of $\mathbb {Q}^{d}$ , we denote $G(A):= \text {span}_{{\mathbb Q}} \{a\in A\}\cap {\mathbb Z}^{d}.$ Next, we summarize some properties of these sets.

Lemma 3.9. The following properties hold.

  1. (i) For any set $A\subseteq {\mathbb Z}^d$ , $G(A)$ is a subgroup of ${\mathbb Z}^d$ .

  2. (ii) Let $A\subseteq \mathbb {Q}^d$ be a finite set and $M(A)$ the matrix whose columns are the elements of A. Then $G(A)=(M(A)\cdot {\mathbb Q}^{|A|})\cap {\mathbb Z}^d$ .

  3. (iii) If $H\subseteq {\mathbb Z}^d$ is the subgroup generated by $h_1,\ldots ,h_k\in {\mathbb Z}^d$ , then $G(H)=G(\{h_1,\ldots ,h_k\})$ . In particular, letting $M(h_1,\ldots ,h_k)$ be the matrix whose columns are $h_1,\ldots ,h_k$ , we have that $G(\langle h_1,\ldots ,h_k\rangle )=(M(h_1,\ldots ,h_k)\cdot {\mathbb Q}^{k}) \cap {\mathbb Z}^d$ .

  4. (iv) For any subgroup $H\subseteq {\mathbb Z}^d$ , H has finite index in $G(H)$ . Moreover, $G(H)$ is the largest subgroup of ${\mathbb Z}^d$ which is a finite index extension of H.

  5. (v) If $a_1,\ldots ,a_k$ do not all lie in a common proper subspace of ${\mathbb Q}^d$ (equivalently, if they span ${\mathbb Q}^d$ over ${\mathbb Q}$ ), then $G(\{a_1,\ldots ,a_k\}) ={\mathbb Z}^d$ .

Proof. Properties (i), (ii), and (iii) follow directly from the definitions.

To prove property (iv), let $\{g_1,\ldots ,g_k\}$ be a finite set such that $\langle g_1,\ldots ,g_k\rangle =G(H)$ . Since each $g_i$ belongs to the ${\mathbb Q}$ -span of H, for each $i=1,\ldots ,k$ , there exist $m_i\in {\mathbb N}$ and $h_i\in H$ such that $g_i={h_i}/{m_i}$ . The group $\langle m_1g_1,\ldots ,m_kg_k\rangle $ is of finite index in $\langle g_1,\ldots ,g_k\rangle =G(H)$ and is contained in H. Therefore, H is of finite index in $G(H)$ .

To see that $G(H)$ is the largest finite index extension of H, take $H'$ to be any finite index extension of H and take $h'\in H'$ . Since $H'$ is a finite index extension of H, we have that there exists $n\in {\mathbb N}$ such that $nh'\in H$ . This implies that $h'\in G(H)$ .

To show property (v), reordering $a_1,\ldots ,a_k$ if needed, we may assume that $a_1,\ldots ,a_d$ are linearly independent vectors over ${\mathbb Q}$ . It follows that $\text {span}_{{\mathbb Q}}(\{a_1,\ldots ,a_d \})={\mathbb Q}^{d}$ and then $G(\{a_1,\ldots ,a_k\})\supseteq G(\{a_1,\ldots ,a_d\})={\mathbb Z}^d$ .

Remark 3.10. If $H_1$ and $H_2$ are subgroups of ${\mathbb Z}^d$ , then $G(H_1)+G(H_2)\subseteq G(H_1+H_2)$ , with the inclusion possibly being strict. For instance, for $H_1=\langle (1,2)\rangle $ , $H_2=\langle (2,1) \rangle $ , we have that $G(H_1)=H_1$ , $G(H_2)=H_2$ , and $H_1+H_2 \subsetneq G(H_1+H_2)={\mathbb Z}^2$ . Nevertheless, Lemma 3.9(iv) implies that $G(H_1)+G(H_2)$ has finite index in $G(H_1+H_2)$ .
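
The example in Remark 3.10 can be verified with exact rational arithmetic. The following Python sketch is an illustration only; the function names are hypothetical, membership in $G(A)$ is tested through ranks over ${\mathbb Q}$ , and membership in a rank-two subgroup of ${\mathbb Z}^2$ is tested with Cramer's rule.

```python
from fractions import Fraction

def rank_over_Q(vectors):
    """Rank over Q of a list of vectors (Gaussian elimination, exact fractions)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    rank = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((r for r in range(rank, len(rows)) if rows[r][col] != 0), None)
        if pivot is None:
            continue
        rows[rank], rows[pivot] = rows[pivot], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col] != 0:
                factor = rows[r][col] / rows[rank][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

def in_G(v, A):
    """An integer vector v lies in G(A) = span_Q(A) ∩ Z^d
    exactly when appending v to A does not raise the rank."""
    return rank_over_Q(list(A) + [v]) == rank_over_Q(list(A))

def in_lattice_2d(v, u, w):
    """v lies in <u, w> (u, w linearly independent in Z^2)
    exactly when Cramer's rule yields integer coefficients."""
    det = u[0] * w[1] - u[1] * w[0]
    a = Fraction(v[0] * w[1] - v[1] * w[0], det)
    b = Fraction(u[0] * v[1] - u[1] * v[0], det)
    return a.denominator == 1 and b.denominator == 1

h1, h2 = (1, 2), (2, 1)
# Index of H_1 + H_2 in Z^2 is the absolute value of the determinant.
index = abs(h1[0] * h2[1] - h1[1] * h2[0])
```

Here `index` equals $3$ , so $H_1+H_2$ is a proper finite index subgroup of ${\mathbb Z}^2=G(H_1+H_2)$ ; for instance, $(1,0)$ lies in $G(H_1+H_2)$ but not in $H_1+H_2$ .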

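In the remainder of the section, we provide some algebraic lemmas that will be used later in the paper. For a set $E\subseteq {\mathbb Z}^d,$ we define its upper Banach density (or just upper density when there is no confusion) by $d^{*}(E):=\sup _{(I_{N})_{N\in {\mathbb N}}}\varlimsup _{N\to \infty }{|E\cap I_{N}|}/{|I_{N}|}$ , where the supremum is taken over all Følner sequences $(I_{N})_{N\in {\mathbb N}}$ of ${\mathbb Z}^d$ . If the limit $\lim _{N\to \infty }{|E\cap I_{N}|}/{|I_{N}|}$ exists for all Følner sequences, we say that its value is the Banach density (or just density) of E. The proof of the following lemma is routine (see also [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11] for a more general version).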

Lemma 3.11. [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11]

Let $\mathbf {c}\colon (\mathbb {Z}^{L})^{s}\to \mathbb {R}$ be a polynomial. Then either $\mathbf {c}\equiv 0$ or the set of $\mathbf {h}\in (\mathbb {Z}^{L})^{s}$ such that ${\mathbf {c}}(\mathbf {h})=0$ is of (upper) Banach density $0$ .

Lemma 3.12. Let $v_{i}\in {\mathbb Z}^{L}, 1\leq i\leq k$ and U be a subset of ${\mathbb Z}^{k}$ of positive density. Then,

(13) $$ \begin{align} G\bigg(\bigg\{\sum_{1\leq i\leq k}h_{i} v_{i}\colon \mathbf{h}=(h_1,\ldots,h_k)\in U\bigg\}\bigg)=G(\{v_{i}\colon 1\leq i\leq k\}).\end{align} $$

Proof. Note that in equation (13), the right-hand side clearly includes the left-hand side. To prove the converse inclusion, it suffices to show that

(14) $$ \begin{align} \text{span}_{\mathbb{Q}}\{\mathbf{h}\colon \mathbf{h}\in U\}=\mathbb{Q}^{k}. \end{align} $$

Since U has positive density, it cannot be contained in any hyperplane of ${\mathbb Q}^k$ , so it must have at least k elements that are linearly independent over ${\mathbb Q}$ . Thus, equation (14) follows immediately.

Definition 3.13. Let $P\colon ({\mathbb Z}^{L})^D\to {\mathbb R}$ be a polynomial. Denote by $\Delta P\colon ({\mathbb Z}^{L})^{D+1}\to {\mathbb R}$ the polynomial given by $\Delta P(n,h_{1},\ldots ,h_{D}):=P(n+h_{D},h_{1},\ldots ,h_{D-1})-P(n,h_{1},\ldots ,h_{D-1})$ for all $n,h_{1},\ldots ,h_{D}\in {\mathbb Z}^{L}$ . For a polynomial $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ , let $\Delta ^0 P=P,$ and for $K\geq 1$ , let $\Delta ^{K}P\colon ({\mathbb Z}^{L})^{K+1}\to \mathbb {R}$ be $\Delta ^{K}P:=\Delta \cdots \Delta P$ (where $\Delta $ acts K times).
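
The effect of the operator $\Delta $ is easiest to see in the case $L=1$ . The Python sketch below is an illustration under the assumption that $\Delta P(n,h_{1},\ldots ,h_{D})=P(n+h_{D},h_{1},\ldots ,h_{D-1})-P(n,h_{1},\ldots ,h_{D-1})$ , that is, each application of $\Delta $ takes a difference in the variable n along a new parameter; the function names are hypothetical.

```python
def delta(P):
    """Return ΔP, where
    ΔP(n, h_1, ..., h_D) = P(n + h_D, h_1, ..., h_{D-1}) - P(n, h_1, ..., h_{D-1})."""
    def delta_P(n, *hs):
        *earlier, last = hs
        return P(n + last, *earlier) - P(n, *earlier)
    return delta_P

def delta_power(P, K):
    """Return Δ^K P, that is, Δ applied K times (Δ^0 P = P)."""
    for _ in range(K):
        P = delta(P)
    return P

def square(n):
    return n * n

def cube(n):
    return n ** 3

first = delta_power(square, 1)    # (n, h1) -> 2*n*h1 + h1**2
second = delta_power(square, 2)   # (n, h1, h2) -> 2*h1*h2
third = delta_power(square, 3)    # identically 0, since the degree 2 is less than 3
cube_twice = delta_power(cube, 2) # (n, h1, h2) -> 6*n*h1*h2 + 3*h1*h2**2 + 3*h1**2*h2
```

Each application of $\Delta $ lowers the degree in n by one, which is why a condition such as $\deg (Q)>K$ guarantees that $\Delta ^{K}Q$ does not vanish identically.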

Lemma 3.14. Let $K\in {\mathbb N}$ and $Q\colon {\mathbb Z}^{L}\to {\mathbb R}$ be a homogeneous polynomial with $\deg (Q)>K$ . If $Q(n)\notin \mathbb {Q}[n]$ , then the set of $(h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K}$ such that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\notin \mathbb {Q}[n]$ is of density $1$ in $(\mathbb {Z}^{L})^{K}$ .

Proof. We may write $Q(n)=\sum _{i=1}^{M}a_{i}Q_{i}(n)$ for some $M\in {\mathbb N}$ , homogeneous polynomials $Q_{1},\ldots ,Q_{M}$ in ${\mathbb Q}[n]$ of degrees $\deg (Q)$ , and real numbers $a_{1},\ldots ,a_{M}\in {\mathbb R}$ which are linearly independent over ${\mathbb Q}$ (this can be done by taking $a_1,\ldots ,a_M$ to be a basis of the ${\mathbb Q}$ -span of the coefficients of Q). Since $Q(n)\notin \mathbb {Q}[n]$ , there exists some ${1\leq i\leq M}$ such that $a_{i}\notin {\mathbb Q}$ and $Q_{i}\not \equiv 0$ . Without loss of generality, assume that $i=1.$ Since ${\deg (Q_{1})>K}$ , we have that $\Delta ^{K}Q_{1}\not \equiv 0$ .

Suppose that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\kern1.3pt{\in}\kern1.3pt \mathbb {Q}[n]$ for some $(h_{1},\ldots ,h_{K})\kern1.3pt{\in}\kern1.3pt ({\mathbb Z}^{L})^{K}$ . Note that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})=\sum _{i=1}^{M}a_{i}\Delta ^{K}Q_{i}(n,h_{1},\ldots , h_{K})$ . Since each $\Delta ^{K}Q_{i}(n, h_{1},\ldots , h_{K})$ is a rational polynomial in terms of n of degree $\deg (Q)-K$ and $a_{1},\ldots ,a_{M}\in {\mathbb R}$ are linearly independent over ${\mathbb Q}$ , we must have that $\Delta ^{K}Q_{1}(\cdot ,h_{1},\ldots ,h_{K})\equiv 0$ . So if the set of $(h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K}$ such that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\in \mathbb {Q}[n]$ has positive density, then the set of $(n,h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K+1}$ such that $\Delta ^{K}Q_{1}(n,h_{1},\ldots ,h_{K})=0$ has positive density too. By [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11], $\Delta ^{K}Q_{1}\equiv 0$ , which is a contradiction. This finishes the proof.

4 PET induction

In this section, we present the method we use to reduce the complexity of the polynomial iterates, that is, PET induction (PET is an abbreviation for ‘Polynomial Exhaustion Technique’), which was first introduced in [Reference Bergelson2]. To this end, we start by recalling a variation of van der Corput’s lemma from [Reference Donoso, Koutsogiannis and Sun7] that is convenient for our study. We then continue by presenting the inductive scheme via the use of van der Corput operations.

4.1 The van der Corput lemma

The standard tool used in reducing the complexity of polynomial families of iterates is van der Corput’s lemma (also known as ‘van der Corput’s trick’). We will use the following variation of it, the proof of which can be found in [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.2].

Lemma 4.1. (van der Corput lemma)

Let $(a(n;h_1,\ldots ,h_s))_{(n;h_1,\ldots ,h_s)\in ({\mathbb Z}^{L})^{s+1}}, s\in \mathbb {N}_0,$ be a sequence bounded by $1$ in a Hilbert space $\mathcal {H}$ . Then, for all $\tau \in \mathbb {N}_0$ ,

$$ \begin{align*} &\overline{\mathbb{E}}^{\square}_{h_1,\ldots,h_s\in {\mathbb Z}^{L}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \| \mathbb{E}_{n\in I_{N}} a(n;h_1,\ldots,h_s) \|^{2\tau}\\ &\quad\leq 4^{\tau}\overline{\mathbb{E}}^{\square}_{h_1,\ldots,h_s,h_{s+1}\in {\mathbb Z}^{L}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \vert \mathbb{E}_{n\in I_{N}} \langle a(n+h_{s+1};h_1,\ldots,h_s) , a(n;h_1,\ldots,h_s) \rangle \vert^{\tau}. \end{align*} $$

Remark 4.2. We use this unorthodox notation to separate the variable n from the $h_i$ terms. The variable n plays a different role in our study than the $h_i$ terms.

We also provide two applications of Lemma 4.1 for later use. The first one is to get an upper bound for single averages with polynomial iterates and a polynomial exponential weight. Let and recall Definition 3.13 for the polynomial $\Delta ^{K}P$ .

Lemma 4.3. Let $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ and $p\colon {\mathbb Z}^{L}\to {\mathbb Z}^{d}$ be polynomials. For all $K\in {\mathbb N}_0$ and $\tau \in {\mathbb N}$ , there exists $C_{K,\tau }>0$ such that for every $\mathbb {Z}^{d}$ -system, $(X,\mathcal {B},\mu , (T_{g})_{g\in \mathbb {Z}^{d}}),$ and ${f\in L^{\infty }(\mu )}$ bounded by 1, we have

$$ \begin{align*} &{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\Vert\mathbb{E}_{n\in I_{N}}\exp(P(n))T_{p(n)}f\Vert_{2}^{2\tau} \\&\quad\leq C_{K,\tau}\overline{\mathbb{E}}^{\square}_{\mathbf{h}=(h_{1},\ldots,h_{K})\in({\mathbb Z}^{L})^{K}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\Vert\mathbb{E}_{n\in I_{N}}\exp(\Delta^{K}P(n,\mathbf{h}))T_{\Delta^{K}p(n,\mathbf{h})}f\Vert^{\tau}_{2}. \end{align*} $$

Proof. When $K=0$ , there is nothing to prove. We now assume that the relation holds for some $K\in {\mathbb N}_0$ and we show it for $K+1$ . Using Lemma 4.1 and the T-invariance of $\mu $ , we get

$$ \begin{align*} &\overline{\mathbb{E}}^{\square}_{\mathbf{h}=(h_1,\ldots,h_{K})\in ({\mathbb Z}^{L})^{K}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \| \mathbb{E}_{n\in I_{N}} \exp(\Delta^{K}P(n,\mathbf{h}))T_{\Delta^{K}p(n,\mathbf{h})}f \|^{2\tau}_{2}\\ &\quad\leq 4^{\tau}\overline{\mathbb{E}}^{\square}_{\mathbf{h}=(h_1,\ldots,h_{K+1})\in ({\mathbb Z}^{L})^{K+1}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \bigg\vert \mathbb{E}_{n\in I_{N}} \int_{X}\exp(\Delta^{K+1}P(n,\mathbf{h})) T_{\Delta^{K+1}p(n,\mathbf{h})}f\cdot \overline{f}\,d\mu \bigg\vert^{\tau} \\&\quad\leq 4^{\tau}\overline{\mathbb{E}}^{\square}_{\mathbf{h}=(h_1,\ldots,h_{K+1})\in ({\mathbb Z}^{L})^{K+1}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\Vert \mathbb{E}_{n\in I_{N}} \exp(\Delta^{K+1}P(n,\mathbf{h}))T_{\Delta^{K+1}p(n,\mathbf{h})}f\Vert^{\tau}_{2}, \end{align*} $$

and hence the result (the constant that appears depends only on $\tau $ and K).
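As an illustration of the statement (the specific choices below are ours and serve only as an example), take $L=d=1$ , $K=1$ , $P(n)=\alpha n^{2}$ for some $\alpha \in {\mathbb R}$ , and $p(n)=n^{2}$ . Assuming, as the differencing step in the proof above suggests, that Definition 3.13 gives $\Delta P(n,h)=P(n+h)-P(n)$ , we get

$$ \begin{align*} \Delta P(n,h)&=\alpha(n+h)^{2}-\alpha n^{2}=2\alpha h n+\alpha h^{2},\\ \Delta p(n,h)&=(n+h)^{2}-n^{2}=2hn+h^{2}. \end{align*} $$

So, for each fixed h, both the weight and the iterate on the right-hand side of Lemma 4.3 are linear in n, while the left-hand side involves quadratic ones.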

The second application of Lemma 4.1 provides an upper bound for single averages, with linear iterates and an exponential weight evaluated at a linear polynomial, on a product system. The proof is inspired by [Reference Donoso, Koutsogiannis and Sun7, Lemma 5.2] and [Reference Host and Kra19, Proposition 2.9].

Lemma 4.4. Let $(X,\mathcal {B},\mu )$ be a probability space, $k,L\in {\mathbb N}$ and $T_{i,j}, 1\leq i\leq k, 1\leq j\leq L$ be commuting measure-preserving transformations on X. Denote $S_{j}=T_{1,j}\times \cdots \times T_{k,j}$ for $1\leq j\leq L$ . Let $G_{i}$ be the group generated by $T_{i,1},\ldots ,T_{i,L}$ . Then, for any polynomial $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ of degree 1 and $f_{1},\ldots ,f_{k}\in L^{\infty }(\mu )$ bounded by 1, we have that

(15) $$ \begin{align} {\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\Vert\mathbb{E}_{n\in I_{N}}\exp(P(n))R_n f\Vert_{L^{2}(\mu^{\otimes k})}\leq 2\min_{1\leq i\leq k}\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f_{i}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert_{G_{i}^{\times 2}}, \end{align} $$

where $f=f_{1}\otimes \cdots \otimes f_{k}$ and $R_{n}=S_{1}^{n_{1}}\cdots S_{L}^{n_{L}}$ for $n=(n_{1},\ldots ,n_{L})\in {\mathbb Z}^{L}$ .

Proof. Fix $1\leq i\leq k$ and let $P(n)=a\cdot n+b$ for some $a\in {\mathbb R}^{L},b\in {\mathbb R}$ . Then, by Lemma 4.1 for $\tau =2$ and $s=0$ , the fourth power of the left-hand side of equation (15) is bounded by

$$ \begin{align*} & 16\cdot\mathbb{E}^{\square}_{h\in{\mathbb Z}^{L}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\bigg\vert\int_{X^{k}}\mathbb{E}_{n\in I_{N}}\exp(P(n+h)-P(n))R_{n+h}f\cdot R_{n}\overline{f}\,d\mu^{\otimes k}\bigg\vert^{2} \\&\quad = 16\cdot\mathbb{E}^{\square}_{h\in{\mathbb Z}^{L}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\bigg\vert\int_{X^{k}}\mathbb{E}_{n\in I_{N}}\exp(a\cdot h)R_{h}f\cdot \overline{f}\,d\mu^{\otimes k}\bigg\vert^{2} \\&\quad = 16\cdot\mathbb{E}^{\square}_{h\in{\mathbb Z}^{L}}\bigg\vert\int_{X^{k}}R_{h}f\cdot \overline{f}\,d\mu^{\otimes k}\bigg\vert^{2} \\&\quad = 16\cdot\mathbb{E}^{\square}_{h=(h_{1},\ldots,h_{L})\in{\mathbb Z}^{L}}\bigg\vert\int_{X^{k}}\bigotimes_{i=1}^{k}\bigg((\prod_{j=1}^{L}T_{i,j}^{h_{j}})f_{i}\cdot \overline{f}_{i}\bigg)\,d\mu^{\otimes k}\bigg\vert^{2} \\&\quad \leq 16\cdot\mathbb{E}^{\square}_{h=(h_{1},\ldots,h_{L})\in{\mathbb Z}^{L}}\bigg\vert\int_{X}\bigg(\prod_{j=1}^{L}T_{i,j}^{h_{j}}\bigg) f_{i}\cdot \overline{f}_{i}\,d\mu\bigg\vert^{2} \\&\quad \leq 16\cdot\mathbb{E}^{\square}_{h=(h_{1},\ldots,h_{L})\in{\mathbb Z}^{L}}\bigg\vert\int_{X}\mathbb{E} \bigg((\prod_{j=1}^{L}T_{i,j}^{h_{j}})f_{i}\cdot \overline{f}_{i}\vert \mathcal{I}(G_{i})\bigg)\,d\mu\bigg\vert^{2} \\ &\quad =16\lvert\hspace{-1pt}\lvert\hspace{-1pt}\lvert f_{i}\rvert\hspace{-1pt}\rvert\hspace{-1pt}\rvert^{4}_{G_{i}^{\times 2}}, \end{align*} $$

from where the result follows.
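The first two equalities in the previous display can be unpacked as follows; this is a sketch, under the assumption that $\exp $ takes values of modulus one on real arguments (its definition is not restated in this section). Since $P(n)=a\cdot n+b$ , the maps $R_{n}$ preserve $\mu ^{\otimes k}$ , and $R_{n+h}=R_{n}R_{h}$ (the transformations $T_{i,j}$ commute), we have

$$ \begin{align*} P(n+h)-P(n)&=a\cdot(n+h)+b-(a\cdot n+b)=a\cdot h,\\ \int_{X^{k}}R_{n+h}f\cdot R_{n}\overline{f}\,d\mu^{\otimes k}&=\int_{X^{k}}R_{n}(R_{h}f\cdot \overline{f})\,d\mu^{\otimes k}=\int_{X^{k}}R_{h}f\cdot \overline{f}\,d\mu^{\otimes k}. \end{align*} $$

Hence, the integrand does not depend on n, the average over $I_{N}$ is trivial, and the weight $\exp (a\cdot h)$ disappears after taking absolute values.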

4.2 The van der Corput operation

To review the PET induction scheme, we will follow, and slightly modify, the approach from [Reference Donoso, Koutsogiannis and Sun7]. To this end, we extend the definitions that we have already given on the polynomial families of interest (see the beginning of §2.1), taking into account that we treat the first L-tuple of variables of the polynomials differently.

Before we list the steps of the van der Corput operation, we will present the manipulations of the inner product in Lemma 4.1 in a simple example where we have three essentially distinct polynomial iterates $(p_1(n),p_2(n),p_3(n))=(n^2,2n,n),$ to show how, by repeatedly running the van der Corput trick, we get an expression with linear iterates. This will be extended to general expressions in Theorem 4.9. Here, we want to study, for functions $f_1, f_2, f_3$ bounded by $1$ , the average of the sequence $a(n)=T_1^{n^2}f_1\cdot T_2^{2n}f_2\cdot T_2^n f_3.$ Notice that we can write this sequence in terms of a $\mathbb {Z}^2$ -action, $a(n)=T_{(n^2,0)}f_1\cdot T_{(0,2n)}f_2\cdot T_{(0,n)}f_3$ for the triple of polynomials $((n^2,0),(0,2n),(0,n)).$ Using Lemma 4.1, we have

$$ \begin{align*} &{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \| \mathbb{E}_{n\in I_{N}} a(n) \|^{2} \\ &\quad \leq 4\overline{\mathbb{E}}^{\square}_{h_1\in {\mathbb Z}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \vert \mathbb{E}_{n\in I_{N}} \langle a(n+h_{1}) , a(n) \rangle \vert\\ &\quad = 4\overline{\mathbb{E}}^{\square}_{h_1\in {\mathbb Z}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \vert \mathbb{E}_{n\in I_{N}}\int_X T_1^{(n+h_1)^2}f_1\cdot T_2^{2(n+h_1)}f_2\cdot T_2^{n+h_1}f_3\cdot T_1^{n^2}\bar{f}_1\\ &\qquad\times T_2^{2n}\bar{f}_2 \cdot T_2^{n}\bar{f}_3 \,d\mu\vert. \end{align*} $$

Using the fact that $T_2$ is measure-preserving, we compose by $T_2^{-n}$ (notice that n is the polynomial of the minimum degree in the expression) to get

$$ \begin{align*} & 4\overline{\mathbb{E}}^{\square}_{h_1\in {\mathbb Z}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \Bigg \vert \mathbb{E}_{n\in I_{N}}\int_X \bar{f}_3\cdot T_2^{h_1}f_3\cdot T_1^{n^2+2h_1 n}T_2^{-n}(T_1^{h_1^2}f_1)\\ &\quad \times T_1^{n^2}T_2^{-n}\bar{f}_1\cdot T_2^{n}(T_2^{2h_1}f_2\cdot \bar{f}_2)\;d\mu\Bigg\vert, \end{align*} $$

where we grouped the functions with the same linear terms.
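The exponents in the last display can be checked directly. Writing each iterate as a vector in $\mathbb {Z}^{2}$ (first coordinate for $T_{1}$ , second for $T_{2}$ ), composing with $T_2^{-n}$ amounts to subtracting $(0,n)$ from each of the six iterates:

$$ \begin{align*} ((n+h_1)^2,0)-(0,n)&=(n^2+2h_1 n+h_1^2,-n), & (n^2,0)-(0,n)&=(n^2,-n),\\ (0,2(n+h_1))-(0,n)&=(0,n+2h_1), & (0,2n)-(0,n)&=(0,n),\\ (0,n+h_1)-(0,n)&=(0,h_1), & (0,n)-(0,n)&=(0,0). \end{align*} $$

The two vectors in the last row do not depend on n and give the factor $\bar {f}_3\cdot T_2^{h_1}f_3$ , while the constants $h_1^2$ and $2h_1$ in the first two rows are absorbed into the functions $T_1^{h_1^2}f_1$ and $T_2^{2h_1}f_2$ .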

Using the Cauchy–Schwarz inequality (to discard the terms that have iterates independent of n), the previous relation is bounded by

$$ \begin{align*}\overline{\mathbb{E}}^{\square}_{h_1\in {\mathbb Z}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty} \| \mathbb{E}_{n\in I_{N}}T_1^{n^2+2h_1 n}T_2^{-n}(T_1^{h_1^2}f_1)\cdot T_1^{n^2}T_2^{-n}\bar{f}_1\cdot T_2^{n}(T_2^{2h_1}f_2\cdot \bar{f}_2)\|.\end{align*} $$

Exactly because of the grouping of the terms of the same linear iterates, the resulting polynomial iterates, $(n^2+2h_1 n,-n), (n^2,-n), (0,n),$ have the property that they are non-constant and that their pairwise differences are non-constant (this will lead us below to the notion of the ‘essentially distinct’ vector-valued polynomials).

Similarly, skipping the details, using Lemma 4.1, composing with $T_2^{-n}$ (the polynomial $(0,n)$ is of minimum ‘degree’—see below for the definition of the degree of a vector-valued polynomial), the square of the previous quantity can be bounded by

$$ \begin{align*} & \overline{\mathbb{E}}^{\square}_{(h_{1},h_2)\in {\mathbb Z}^{2}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\| \mathbb{E}_{n\in I_{N}}T_1^{n^2+2(h_1+h_2)n}T_2^{-2n}(T_1^{(h_1+h_2)^2}T_2^{-h_2}f_1) \cdot T_1^{n^2+2h_1n} T_2^{-2n}(T_1^{h_1^2}\bar{f}_1) \\ & \quad\times T_1^{n^2+2h_2n}T_2^{-2n}(T_1^{h_2^2}T_2^{-h_2}\bar{f}_1) \cdot T_1^{n^2}T_2^{-2n}f_1 \|. \end{align*} $$

Note that the iterates in the previous relation are ‘essentially distinct’ for ‘almost all’ tuples $(h_{1},h_{2})\in \mathbb {Z}^{2}$ .
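To make the last two claims concrete, here is a short check of the first factor and of the exceptional set (the explicit description of this set is our reading of ‘almost all’). Shifting n by $h_2$ in the $T_1$ -exponent $n^2+2h_1 n$ gives

$$ \begin{align*} (n+h_2)^2+2h_1(n+h_2)&=n^2+2(h_1+h_2)n+(h_2^2+2h_1h_2),\\ h_1^2+(h_2^2+2h_1h_2)&=(h_1+h_2)^2, \end{align*} $$

which explains the term $T_1^{(h_1+h_2)^2}$ acting on $f_1$ . Moreover, all four iterates have second coordinate $-2n$ and first coordinates $n^2+2cn$ with $c\in \{h_1+h_2,h_1,h_2,0\}$ , so for a fixed pair $(h_1,h_2)$ , their pairwise differences are non-constant in n exactly when these four values of c are pairwise distinct, that is, when

$$ \begin{align*} h_1\neq 0,\quad h_2\neq 0,\quad h_1+h_2\neq 0,\quad h_1\neq h_2. \end{align*} $$

The excluded pairs lie on a union of four lines in $\mathbb {Z}^{2}$ .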

Analogously, using Lemma 4.1 once more, noticing that all the resulting terms in the expression will have the factor $T_1^{n^2}T_2^{-2n},$ where $(n^2,-2n)$ is the polynomial of minimum ‘degree’, we can bound, composing with the term $T_1^{-n^2}T_2^{2n}$ , the square of the previous relation by

$$ \begin{align*} & \overline{\mathbb{E}}^{\square}_{(h_{1},h_2,h_{3})\in {\mathbb Z}^{3}}{\sup_{\substack{ (I_{N})_{N\in\mathbb{N}} \\ \mathrm{ F\unicode{xf8} lner\, seq.} }}}\varlimsup_{N\to \infty}\| \mathbb{E}_{n\in I_{N}} T_1^{2(h_1+h_2+h_3)n}(T_1^{(h_1+h_2+h_3)^2}T_2^{-h_2-2h_3}f_1)\\ &\quad \times T_1^{2(h_2+h_3)n}(T_1^{(h_2+h_3)^2}T_2^{-h_2-2h_3}\bar{f}_1)\cdot T_1^{2(h_1+h_3)n}(T_1^{(h_1+h_3)^2}T_2^{-2h_3}\bar{f}_1) \cdot T_1^{2h_3 n}(T_1^{h_3^2}T_2^{-2h_3}f_1) \\ &\quad \times T_1^{2(h_1+h_2)n}(T_1^{(h_1+h_2)^2}T_2^{-h_2}\bar{f}_1) \cdot T_1^{2h_2 n}(T_1^{h_2^2}T_2^{-h_2}f_1) \cdot T_1^{2h_1 n}(T_1^{h_1^2}f_1)\|. \end{align*} $$

The iterates in this last relation are linear with distinct coefficients for ‘almost all’ tuples $(h_{1},h_{2},h_{3})\in \mathbb {Z}^{3}$ . So, the eighth power of the initial expression is bounded by the previous relation.

The previous example leads naturally to the following notions.

Definition 4.5. For a polynomial $p(n;h_{1},\ldots ,h_{s})\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}$ , we denote by $\deg (p)$ the degree of p with respect to n (for example, for $s=1, L=2$ , the degree of $p(n_{1},n_{2};h_{1,1},h_{1,2})=h_{1,1}h_{1,2}n_{1}^{2}+h_{1,1}^{5}n_{2}$ is 2).

For a polynomial $p(n;h_{1},\ldots ,h_{s})=(p_{1}(n;h_{1},\ldots ,h_{s}),\ldots ,p_{d}(n;h_{1},\ldots ,h_{s}))\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d},$ we let $\deg (p)=\max _{1\leq i\leq d}\deg (p_{i})$ and we say that p is non-constant if $\deg (p)>0$ (that is, some $p_i$ is a non-constant function of n), otherwise, we say that p is constant. The polynomials $q_{1},\ldots ,q_{k}\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ are called essentially distinct if they are non-constant and $q_i-q_j$ is non-constant for all $i\neq j$ . Finally, for a tuple $\mathbf {q}=(q_{1},\ldots ,q_{k}),$ we let $\deg (\mathbf {q})=\max _{1\leq i\leq k}\deg (q_{i}).$ (For clarity, we use non-bold letters for vectors (of polynomials) and bold letters for vectors of vectors (of polynomials).)
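As a quick sanity check of these notions on the iterates obtained in the example above (the labels $q_1,q_2,q_3$ are ours), take $L=s=1$ , $d=2$ , and $q_1(n;h_1)=(n^2+2h_1 n,-n)$ , $q_2(n;h_1)=(n^2,-n)$ , $q_3(n;h_1)=(0,n)$ . Then

$$ \begin{align*} \deg(q_1)=\deg(q_2)&=2, & \deg(q_3)&=1,\\ q_1-q_2&=(2h_1 n,0), & \deg(q_1-q_2)&=1,\\ q_1-q_3&=(n^2+2h_1 n,-2n), & \deg(q_1-q_3)&=2,\\ q_2-q_3&=(n^2,-2n), & \deg(q_2-q_3)&=2, \end{align*} $$

so $q_1,q_2,q_3$ are essentially distinct and $\deg (\mathbf {q})=2$ for $\mathbf {q}=(q_1,q_2,q_3)$ . Note that $q_1-q_2$ counts as non-constant because the degree is computed with respect to n for the polynomial in the variables $(n;h_1)$ , even though it vanishes identically at the single value $h_1=0$ .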

Let $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$ -system, $q_{1},\ldots ,q_{k}\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ be polynomials, and $g_{1},\ldots , g_{k}\colon X\times (\mathbb {Z}^{L})^{s}\to \mathbb {R}$ be functions such that $g_{m}(\cdot ;h_{1},\ldots ,h_{s})$ is an $L^{\infty }(\mu )$ function bounded by $1$ for all $h_{1},\ldots ,h_{s}\in \mathbb {Z}^L, 1\leq m\leq k$ . If $\mathbf {q}=(q_{1},\ldots ,q_{k})$ and $\mathbf {g}=(g_{1},\ldots ,g_{k}),$ we say that $A=(L,s,k,\mathbf {g},\mathbf {q})$ is a PET-tuple, and for $\tau \in {\mathbb N}_0$ , we set


We define $\deg (A)=\deg (\mathbf {q})$ , and say that A is non-degenerate if $\mathbf {q}$ is a family of essentially distinct polynomials (for convenience, $\mathbf {q}$ will be called non-degenerate as well). For $1\leq m\leq k$ , the tuple A is m-standard for $f\in L^\infty (\mu )$ if $\deg (A)=\deg (q_{m})$ and $g_{m}(x;h_{1},\ldots ,h_{s})=f(x)$ for every $x,h_1,\ldots ,h_s$ . That is, f is the mth function in $\mathbf {g}$ , only depending on the first variable, and the polynomial $q_m$ that acts on f is of the highest degree. (Here, we say m-standard for f to highlight the function of interest as, after running the vdC-operation, the position of the functions in the expression we deal with changes.) The tuple A will be called semi-standard for f if there exists $1\leq m\leq k$ such that $g_{m}(x;h_{1},\ldots ,h_{s})=f(x)$ for every $x,h_1,\ldots ,h_s$ . In this case, we do not require the function f to have a specific position in $\mathbf {g}$ nor that the polynomial acting on f be of the highest degree.

As an example, for a $\mathbb {Z}$ -system $(X,\mathcal {B},\mu ,T),$ take $L=s=1, k=3, q_1(n,h)=n^3, q_2(n,h)=3n^2h, q_3(n,h)=3nh^2,$ and, for $f,g\in L^\infty (\mu ),$ let $g_1(x,h)=f(x), g_2(x,h)=g(x),$ and $g_3(x,h)=T^{h^3}f(x).$

Then, we have that A is 1-standard for $f,$ semi-standard for f and $g,$ and, for $\kappa \in \mathbb {N}_0$ ,

For each non-degenerate PET-tuple $A=(L,s,k,\mathbf {g},\mathbf {q})$ and polynomial $q\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ , we define the vdC-operation, $\partial _{q}A$ , according to the following three steps. (Actually, the vdC-operation can be defined for any PET-tuple, not just for non-degenerate ones. Similarly, being a procedure that reduces complexity, PET induction can be applied to any family of polynomials. As the expressions of interest in this paper correspond to non-degenerate tuples, we consider only this case.)

Step 1: For all $1\leq m\leq k$ , let $g^{\ast }_{m}=g^{\ast }_{m+k}=g_{m},$ and $q^{\ast }_1,\ldots ,q^{\ast }_{2k} \colon (\mathbb {Z}^{L})^{s+2}\to \mathbb {Z}^{d}$ be the polynomials defined as

$$ \begin{align*}\displaystyle q^{\ast}_m(n;h_1,\ldots,h_{s+1})=\left\{\begin{array}{ll} q_m(n+h_{s+1};h_1,\ldots,h_{s})-q(n;h_1,\ldots,h_{s}), & 1\leq m\leq k,\\ q_{m-k}(n;h_1,\ldots,h_{s})-q(n;h_1,\ldots,h_{s}), & k+1\leq m\leq 2k,\end{array} \right. \end{align*} $$

that is, we subtract the polynomial q from the first k polynomials after we have shifted by $h_{s+1}$ the first L variables, and for the second k ones, we subtract q. (In practice, this q will be one of the $q_i$ terms of minimum degree.) Denote $\mathbf {q}^{\ast }=(q^{\ast }_{1},\ldots ,q^{\ast }_{2k})$ .

Step 2: We remove from $q^{\ast }_{1},\ldots ,q^{\ast }_{2k}$ the polynomials which are constant and the associated functions $g_i^\ast $ in the expression (we group all these terms together and we see the resulting term as a single constant one, in terms of n). As we already saw in the example at the beginning of this subsection, this is justified via the use of the Cauchy–Schwarz inequality and the fact that the functions $g_m$ are bounded. Then we put the non-constant ones in groups $J_{i}=\{\tilde {q}_{i,1},\ldots ,\tilde {q}_{i,t_{i}}\}, 1\leq i\leq k'$ for some $k', t_{i}\in \mathbb {N}$ such that any two polynomials are essentially distinct if and only if they belong to different groups. Next, we write $\tilde {q}_{i,j}(n;h_{1},\ldots ,h_{s+1})=\tilde {q}_{i,1}(n;h_{1},\ldots ,h_{s+1})+\tilde {p}_{i,j}(h_{1},\ldots ,h_{s+1})$ for some polynomial $\tilde {p}_{i,j}$ for all $1\leq j\leq t_{i}, 1\leq i\leq k'$ . For convenience, we also relabel what remains, as some of the initial terms may have been removed because of the grouping of the polynomials, of the $g^{\ast }_{1},\ldots , g^{\ast }_{2k}$ accordingly as