1 Introduction
1.1 Decomposition of multicorrelation sequences
The structure and limiting behavior of (averages of) multicorrelation sequences, that is, sequences of the form
$$\int_X f_0\cdot T_1^{n_1}f_1\cdots T_k^{n_k}f_k\,d\mu ,$$
where $k\in \mathbb {N}$, $T_1,\ldots ,T_k\colon X\to X$ are invertible and commuting (that is, $T_iT_j=T_jT_i$ for all $i,j$) measure-preserving transformations on a probability space $(X,\mathcal {B},\mu )$, $f_0,\ldots ,f_k\in L^\infty (\mu )$ and $n_1,\ldots ,n_k\in \mathbb {Z}$, is a central topic in ergodic theory. (We say that $T$ preserves $\mu $ if $\mu (T^{-1}A)=\mu (A)$ for all $A\in \mathcal {B}$. The tuple $(X,\mathcal {B},\mu ,T_1,\ldots ,T_k)$ is a (measure-preserving) system.) For $k=1$, the Herglotz–Bochner theorem implies that the sequence $\int _X f_0 \cdot T_1^nf_1 \, d\mu $ is given by the Fourier coefficients of some finite complex measure $\sigma $ on ${\mathbb T}$ (see [Reference Khintchine22, Reference Koopman and von Neumann23]). More specifically, decomposing $\sigma $ into the sum of its atomic part, $\sigma _a$, and continuous part, $\sigma _c$, we get
$$\int_X f_0\cdot T_1^{n}f_1\,d\mu =\psi (n)+\nu (n),$$
where $(\psi (n))$ is an almost periodic sequence (that is, there exists a compact abelian group $G$, a continuous function $\phi \colon G \rightarrow {\mathbb C}$, and $a \in G$ such that $\psi (n)=\phi (a^n)$, $n\in \mathbb {N}$) and $(\nu (n))$ is a nullsequence, that is,
$$\lim_{N-M\to \infty }\frac{1}{N-M}\sum_{n=M}^{N-1}\vert \nu (n)\vert ^2=0. \tag{1}$$
More generally, after Furstenberg’s celebrated ergodic theoretic proof of Szemerédi’s theorem [Reference Furstenberg15], for a single transformation $T$ and iterates of the form $in$, $1\leq i\leq k$, there has been a particular interest in the study of the corresponding multicorrelation sequences
$$\alpha (n)=\int_X f_0\cdot T^{n}f_1\cdots T^{kn}f_k\,d\mu . \tag{2}$$
For $T$ ergodic (that is, every $T$-invariant set in $\mathcal {B}$ has measure in $\{0,1\}$), Bergelson, Host, and Kra [Reference Bergelson, Host and Kra3] showed that the sequence $(\alpha (n))$ in equation (2) admits a decomposition of the form $\alpha (n)=\phi (n)+\nu (n)$, where $(\phi (n))$ is a uniform limit of $k$-step nilsequences (see §3.1 for the definition) and $(\nu (n))$ satisfies equation (1). (Note that $k$ is the number of linear iterates that appear in equation (2).) Leibman, in [Reference Leibman28] for ergodic systems and [Reference Leibman29] for general ones, extended the result of Bergelson, Host, and Kra to polynomial iterates, meaning that in equation (2), instead of $n,\ldots ,kn$, we have $p_1(n),\ldots ,p_k(n)$ for some $p_1,\ldots ,p_k\in \mathbb {Z}[x]$.
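To make the $k=1$ case concrete, here is a standard worked example (an illustration with a system of our own choosing, not taken from the source). Let $X=\mathbb {T}=\mathbb {R}/\mathbb {Z}$ with Lebesgue measure, $Tx=x+\theta $ for a fixed $\theta \in \mathbb {T}$, $f_1(x)=e^{2\pi i x}$ and $f_0=\overline {f_1}$. Assuming the usual convention $T^nf(x)=f(T^nx)$,
$$\int_X f_0\cdot T^{n}f_1\,d\mu =\int_0^1 e^{-2\pi i x}\,e^{2\pi i (x+n\theta )}\,dx=e^{2\pi i n\theta }.$$
This is an almost periodic sequence (take $G=\mathbb {T}$ written additively, $a=\theta $ and $\phi (t)=e^{2\pi i t}$), so in this example the nullsequence part is identically zero, consistent with $\sigma $ being purely atomic.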
For $d\in {\mathbb N}$, we say that a tuple $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^d})$ is a ${\mathbb Z}^d$-measure-preserving system (or a ${\mathbb Z}^d$-system) if $(X,\mathcal {B},\mu )$ is a probability space and $T_{n}\colon X\to X$, $n \in {\mathbb Z}^d$, are measure-preserving transformations on $X$ such that $T_{(0,\ldots ,0)}=\mathrm {id}$ and $T_{m}\circ T_{n}=T_{m+n}$ for all $m,n\in {\mathbb Z}^{d}$. Notice here that we use the notation $T_n$ to stress the fact that $T$ is a $\mathbb {Z}^d$-action. If $T$ is generated by the $\mathbb {Z}$-actions $T_1,\ldots ,T_d$ and $p_i=(p_{i,1},\ldots ,p_{i,d})$, we have $T_{p_i(n)}=\prod _{j=1}^d T_j^{p_{i,j}(n)}$. It is natural to ask whether splitting results still hold for systems with commuting transformations.
Question 1.1. [Reference Koutsogiannis, Le, Moreira and Richter27, Question 2]
Let $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^d})$ be a ${\mathbb Z}^d$-system, $k\in {\mathbb N}$, $p_{1},\ldots ,p_{k}\colon {\mathbb Z}\to {\mathbb Z}^{d}$ a family of polynomials, and $f_0,f_1,\ldots ,f_k \in L^{\infty }(\mu )$. Under which conditions on the system can the multicorrelation sequence
$$\int_X f_0\cdot T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\,d\mu \tag{3}$$
be decomposed as the sum of a uniform limit of nilsequences and a nullsequence?
The extension of the aforementioned results from $\mathbb {Z}$-actions to $\mathbb {Z}^d$-actions is, to this day, a challenging open problem. The main issue is that the proofs of the splitting theorems crucially depend on the theory of characteristic factors via the structure theory developed by Host and Kra [Reference Host and Kra18], a tool that is unavailable in the more general $\mathbb {Z}^d$-setting. By this, we mean that while nilfactors for $\mathbb {Z}^d$-analogs of Host–Kra uniformity norms are available (this can be found, for example, in [Reference Griesmer16]), it is in general not possible to relate averages such as equation (3) to those uniformity norms in the way one does for $d=1$. As an aside, Frantzikinakis provided a partial answer to Question 1.1 (for $d=1$) in [Reference Frantzikinakis and Host10] that avoided the use of characteristic factors. The answer was partial in the sense that the nullsequence part was allowed to have an $\ell ^{2}({\mathbb Z})$ error term. A similar decomposition result for general $d$ was proven by Frantzikinakis and Host in [Reference Frantzikinakis, Host and Kra12]. (The third author showed in [Reference Koutsogiannis25] the analog to this result for integer parts, or any combination of rounding functions, of real polynomial iterates. For a refinement of this result, with the average of the error term taken along primes, see [Reference Koutsogiannis, Le, Moreira and Richter27].) From the point of view of applications, it is useful to have such splitting results for studying weighted averages, in particular for multiple commuting transformations. (It is worth mentioning that the splitting of equation (2), where the average in the null term is taken along primes, was used by Tao and Teräväinen to show the logarithmic Chowla conjecture for products of odd factors [Reference Tao and Teräväinen32].)
It was demonstrated in [Reference Donoso, Koutsogiannis and Sun7] that under finitely many ergodicity assumptions (that is, we only have to assume that some iterates of $T$, coming from a finite set, are ergodic), the characteristic factors (defined in §2.3) for the corresponding averages
$$\frac{1}{N}\sum_{n=0}^{N-1}T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k \tag{4}$$
are, as in the case of ${\mathbb Z}$-actions, rotations on nilmanifolds. (A similar result was obtained in [Reference Johnson20] under infinitely many ergodicity assumptions. Such multiple ergodic averages always have $L^2$ limits as $N\to \infty $ [Reference Walsh34].) So, it is reasonable to expect that Question 1.1 has an affirmative answer after postulating finitely many ergodicity assumptions (this is an open problem even in the $k=2$ case; see [Reference Frantzikinakis, Host and Kra12]).
A partial answer in this direction was obtained in [Reference Ferré Moragues9] by the second author. Namely, [Reference Ferré Moragues9, Theorem 1.5] shows that for any system $(X,\mathcal {B},\mu , T_1,\ldots , T_k)$ with $T_i$ and $T_iT_j^{-1}$ ergodic (for all $i$ and $j \neq i$) and $f_0,\ldots ,f_k \in L^{\infty }(\mu )$, the sequence
$$\int_X f_0\cdot T_1^{n}f_1\cdots T_k^{n}f_k\,d\mu \tag{5}$$
can be decomposed as a sum of a uniform limit of $k$-step nilsequences plus a nullsequence.
For more general expressions (as in equation (3)), exploiting results from [Reference Johnson20], it is also shown in [Reference Ferré Moragues9] that if we further assume ergodicity in all directions, that is, $T_1^{a_1}\cdots T_d^{a_d}$ is ergodic for all $(a_1,\ldots ,a_d) \in {\mathbb Z}^d \setminus \{\mathbf {{0}}\}$, then for any family of pairwise distinct polynomials $p_1,\ldots ,p_k\colon {\mathbb Z} \rightarrow {\mathbb Z}^d$, the sequence
$$\int_X f_0\cdot T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\,d\mu \tag{6}$$
can be decomposed as a sum of a uniform limit of $D$-step nilsequences plus a nullsequence. (Here $D$ depends on $k$, $d$ and the maximum degree of the $p_i$ terms. It also has a connection to the number of van der Corput operations we have to run in the induction (see Remark 5.14 for details).) The proof of this result makes essential use of a seminorm bound estimate obtained in [Reference Johnson20], where the (infinitely many) ergodicity assumptions are reflected (see [Reference Ferré Moragues9, Theorem 1.6]).
In [Reference Donoso, Koutsogiannis and Sun7], the first, third, and fourth authors improved the seminorm bound estimates of [Reference Johnson20] by imposing only finitely many ergodic assumptions. Although the results in [Reference Donoso, Koutsogiannis and Sun7] are stronger than those in [Reference Johnson20], one cannot apply them directly to [Reference Ferré Moragues9] to improve the aforementioned results, due to the incompatibility of the methods between the two studies [Reference Donoso, Koutsogiannis and Sun7, Reference Ferré Moragues9] (see §2.3 for more details).
In this article, we extend results from [Reference Donoso, Koutsogiannis and Sun7] to obtain splitting theorems for multicorrelation sequences involving multiparameter polynomials, postulating ergodicity assumptions which are even weaker than those in [Reference Donoso, Koutsogiannis and Sun7] on the transformations that define the $\mathbb {Z}^d$-action in equation (6); for example, we will see that the sequence $\int _X f_0 \cdot T_1^{n^2}T_{2}^{n} f_1 \cdot T_3^{n^2}T_{4}^{n} f_2 \, d\mu $ admits the desired splitting if we assume that $T_1, T_3, T_1T_3^{-1}$ are ergodic.
1.2 The joint ergodicity phenomenon
In his ergodic theoretic proof of Szemerédi’s theorem, Furstenberg [Reference Furstenberg15] studied the averages of the multicorrelation sequence in equation (2). In particular, a stepping stone in the proof is the special case when the transformation $T$ is weakly mixing (that is, $T\times T$ is ergodic for $\mu \times \mu $), in which he showed that the averages
$$\frac{1}{N}\sum_{n=0}^{N-1}T^{n}f_1\cdots T^{kn}f_k$$
converge in $L^2(\mu )$ to $\prod _{i=1}^k \int _X f_i \, d\mu $ (which we will refer to as the ‘expected limit’) as $N\to \infty $. (Throughout this paper, unless otherwise stated, all limits of measurable functions on a measure-preserving system are taken in $L^{2}$.) It was Berend and Bergelson [Reference Berend and Bergelson1] who characterized when the average of the integrand of equation (5), that is, for multiple commuting transformations, converges to the expected limit (and this happens exactly when $T_1 \times \cdots \times T_k$ and $T_iT_j^{-1}$ for all $i\neq j$ are ergodic).
Generalizing Furstenberg’s result, Bergelson showed (in [Reference Bergelson2]) that, for a weakly mixing transformation $T$ and essentially distinct polynomials $p_{1},\ldots ,p_{k}$ (that is, $p_{i}$ and $p_{i}-p_{j}$ are nonconstant for all $1\leq i,j\leq k$, $i\neq j$),
$$\lim_{N\to \infty }\frac{1}{N}\sum_{n=0}^{N-1}T^{p_1(n)}f_1\cdots T^{p_k(n)}f_k=\prod_{i=1}^{k}\int_X f_i\,d\mu .$$
(For $T$ totally ergodic (that is, $T^n$ is ergodic for all $n\in \mathbb {N}$) and $p_1,\ldots ,p_k$ ‘independent’ integer polynomials, it is proved in [Reference Frantzikinakis and Kra14] that we have the same conclusion. This fact remains true for an ergodic $T$ and ‘strongly independent’ real-valued polynomial iterates, $[p_1(n)],\ldots ,[p_k(n)]$ ($[\cdot ]$ denotes the floor function), as well (see [Reference Karageorgos and Koutsogiannis21]). These last two results also follow from recent work of Frantzikinakis [Reference Frantzikinakis11], which, for a single $T$, establishes a plethora of joint ergodicity results for a number of classes of iterates (not just polynomial). Finally, for real variable polynomial iterates, we refer the reader to [Reference Koutsogiannis26].) One can think of this last result as a strong independence property of the sequences $(T^{p_{i}(n)})_{n\in \mathbb {Z}}$, $1\leq i\leq k$, in the weakly mixing case. It is reasonable to expect, under additional assumptions on the system and/or the polynomial iterates, convergence of the averages appearing in the previous relation to the expected limit, which naturally leads to a general notion of joint ergodicity (a sequence of finite subsets $(I_{N})_{N\in \mathbb {N}}$ of $\mathbb {Z}^L$ with the property $\lim _{N\to \infty }\vert I_{N}\vert ^{-1}\cdot \vert (g+I_{N})\triangle I_{N}\vert =0$ for all $g\in \mathbb {Z}^L$ is called a Følner sequence in $\mathbb {Z}^L$).
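As a quick sanity check of the Følner condition in the simplest case $L=1$ (a worked example of our own, not taken from the source), consider the intervals $I_N=\{0,1,\ldots ,N-1\}$. For $g\in \mathbb {Z}$ with $0<g\leq N$, the sets $g+I_N$ and $I_N$ differ exactly in $\{0,\ldots ,g-1\}\cup \{N,\ldots ,N+g-1\}$, and the case $g<0$ is symmetric, so
$$\frac{\vert (g+I_N)\triangle I_N\vert }{\vert I_N\vert }=\frac{2\vert g\vert }{N}\longrightarrow 0\quad \text{as } N\to \infty .$$
Hence $(I_N)_{N\in \mathbb {N}}$ is a Følner sequence in $\mathbb {Z}$, and the Cesàro averages $\frac {1}{N}\sum _{n=0}^{N-1}$ used earlier are a special case of averages along Følner sequences.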
Definition 1.2. Let $d, k, L\in \mathbb {N}$, $p_1,\ldots ,p_k\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be polynomials and $(X,\mathcal {B},\mu , (T_{g})_{g\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$-system. We say that the sequence of tuples $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if for every $f_{1},\ldots ,f_{k} \in L^{\infty }(\mu )$ and every Følner sequence $(I_{N})_{N\in \mathbb {N}}$ of $\mathbb {Z}^{L}$, we have that
$$\lim_{N\to \infty }\frac{1}{\vert I_N\vert }\sum_{n\in I_N}T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k=\prod_{i=1}^{k}\int_X f_i\,d\mu .$$
When $k=1$ , we also say that $(T_{p_{1}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu .$
The following conjecture was stated in [Reference Donoso, Koutsogiannis and Sun7].
Conjecture 1.3. [Reference Donoso, Koutsogiannis and Sun7, Conjecture 1.5]
Let $d,k,L\in \mathbb {N}, p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be polynomials and $(X,\mathcal {B},\mu , (T_{g})_{g\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$ system. Then the following are equivalent.

(C1) $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ .

(C2) The following conditions are satisfied:

(i) $(T_{p_{i}(n)-p_{j}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k$, $i\neq j$; and

(ii) $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for the product measure $\mu ^{\otimes k}$ on $X^{k}$ .

Answering a question of Bergelson, it was shown in [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.4] that for a polynomial $p\colon \mathbb {Z}^L\to \mathbb {Z}$, the sequence $(T_{1}^{p(n)},\ldots ,T_{k}^{p(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if and only if $((T_{1}\times \cdots \times T_{k})^{p(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ and $T_{i}T^{-1}_{j}$ is ergodic for $\mu $ for all $i\neq j$. In this paper, the strong decomposition results that we obtain allow us to deduce joint ergodicity results for a larger family of polynomials (see Theorems 2.5 and 2.9), thus addressing some additional cases in the aforementioned conjecture.
2 Main results
In this section, we state the main results of the paper and provide a number of examples to better illustrate them. We also comment on the approaches that we follow.
2.1 Splitting results
Our first main concern is to resolve the incompatibility between [Reference Donoso, Koutsogiannis and Sun7] and [Reference Ferré Moragues9], and improve the method in [Reference Donoso, Koutsogiannis and Sun7], to obtain an extension of the results in [Reference Ferré Moragues9].
Before we state our first result, we need to introduce some notation.
For $d,L\in \mathbb {N}$ , the polynomial $q=(q_1,\ldots ,q_d): \mathbb {Z}^L\to \mathbb {Z}^d$ is nonconstant if some $q_i$ is nonconstant. Here we mean that each $q_i$ is a member of $\mathbb {Q}[x_1,\ldots ,x_L]$ with ${q_i(\mathbb {Z}^L)\subseteq \mathbb {Z}.}$ The degree of q is defined as the maximum of the degrees of the $q_i$ terms.
The polynomials $p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ are called essentially distinct if they are nonconstant and $p_{i}-p_{j}$ is nonconstant for all $i\neq j$. (In general, a polynomial ${q\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}}$ has rational coefficients (that is, vectors with rational coordinates).)
For a subset $A$ of $\mathbb {Q}^{d}$, we denote $G(A):=\operatorname {span}_{\mathbb {Q}}\{a\in A\}\cap \mathbb {Z}^{d}$. The following subgroups of ${\mathbb Z}^{d}$ play an important role in this paper.
Definition 2.1. Let ${\mathbf {p}}=(p_{1},\ldots ,p_{k})$, $p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$, be a family of essentially distinct polynomials with $p_{i}(n)=\sum _{v\in \mathbb {N}_0^{L}}b_{i,v}n^{v}$ for some $b_{i,v}\in \mathbb {Q}^{d}$ with at most finitely many $b_{i,v}$, $v\in \mathbb {N}_0^{L}$, nonzero. (Here, we denote $n^{v}:=n_{1}^{v_{1}}\cdots n_{L}^{v_{L}}$ for ${n=(n_{1},\ldots ,n_{L})\in {\mathbb Z}^{L}}$ and $v=(v_{1},\ldots ,v_{L})\in {\mathbb N}_0^{L}$, where $0^0:=1$.) For convenience, we artificially denote $p_{0}$ as the constant zero polynomial and $b_{0,v}:=\mathbf {0}$ for all $v\in {\mathbb N}_{0}^{L}$. For $0\leq i,j\leq k$, set $d_{i,j}:=\deg (p_{i}-p_{j})$ and $G_{i,j}(\mathbf {p}):=G(\{b_{i,v}-b_{j,v}\colon \vert v\vert =d_{i,j}\})$, where, for $v=(v_1,\ldots ,v_L)\in {\mathbb N}_{0}^{L}$, we write $\vert v\vert =v_1+\cdots +v_L$.
Our main result provides an affirmative answer to Question 1.1 under finitely many ergodicity assumptions on the groups $G_{i,j}(\mathbf {p})$, which generalizes [Reference Ferré Moragues9, Theorem 1.5]. We say that the group $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ if any function $f \in L^2(\mu )$ that is $T_{a}$-invariant for all $a \in G_{i,j}(\mathbf {p})$ is constant.
The definition of a $D$-step nilsequence will be given in §3.1. We say that $a\colon {\mathbb Z}^L \to {\mathbb C}$ is a nullsequence if for any Følner sequence $(I_N)_{N\in {\mathbb N}}$ of ${\mathbb Z}^L$, $\lim _{N \to \infty } \frac {1}{\vert I_N\vert } \sum _{n \in I_N} \vert a(n)\vert ^2=0$.
Theorem 2.2. (Decomposition theorem under finitely many ergodicity assumptions)
For $d,k,K,L\in \mathbb {N}$, let $\mathbf {p}=(p_{1},\ldots ,p_{k})$, where $p_{1},\ldots ,p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ is a family of essentially distinct polynomials of degree at most $K$, and let $(X,\mathcal {B},\mu , (T_{n})_{n\in {\mathbb Z}^{d}})$ be a ${\mathbb Z}^{d}$-system. If $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq i,j\leq k$, $i\neq j$, then for all $f_0,\ldots ,f_k \in L^{\infty }(\mu )$, the multicorrelation sequence
$$\left (\int_X f_0\cdot T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\,d\mu \right )_{n\in \mathbb {Z}^{L}}$$
can be decomposed as a sum of a uniform limit of $D$-step nilsequences and a nullsequence, where $D\in {\mathbb N}$ is a constant depending only on $d,k,K,L$.
We refer the reader to Remark 5.14 for a further discussion on the constant D. Also, note that Theorem 2.2 goes beyond Question 1.1 as it deals with multivariable polynomial iterates (that is, $L>1$ ).
Example 2.3. It was proved in [Reference Ferré Moragues9, Theorem 1.5] that for any probability space $(X,\mathcal {B},\mu )$ and commuting transformations $T_{1},\ldots ,T_{k}$ acting on $X$, if $T_i$ and $T_iT_j^{-1}$ are ergodic (for all $i$ and all $j \neq i$, respectively), then for all $f_0,\ldots ,f_k \in L^{\infty }(\mu )$, the multicorrelation sequence
$$\int_X f_0\cdot T_1^{n}f_1\cdots T_k^{n}f_k\,d\mu$$
can be decomposed as a sum of a uniform limit of $k$-step nilsequences plus a nullsequence. While Theorem 2.2 does not specify the step $D$ of the nilsequence, a quick argument shows that, in this case, one can indeed take $D=k$ (see Remark 6.1 for details).
The following example shows that Theorem 2.2 is stronger than [Reference Ferré Moragues9, Theorem 1.6], which deals with single variable essentially distinct polynomial iterates.
Example 2.4. Let $(X,\mathcal {B},\mu ,T_{1},\ldots ,T_{6})$ be a system with commuting transformations $T_{1},\ldots ,T_{6}$ and $f_{0},f_{1},\ldots ,f_{4}\in L^{\infty }(\mu )$. Using [Reference Ferré Moragues9, Theorem 1.6], we have that the multicorrelation sequence
$$\int_X f_0\cdot T_1^{n^2}T_2^{n}f_1\cdot T_1^{n^2}T_3^{n}f_2\cdot T_4^{n^3}f_3\cdot T_5^{n^3}T_6^{n}f_4\,d\mu$$
can be decomposed as the sum of a uniform limit of nilsequences and a nullsequence if $T^{a_{1}}_{1}\cdots T^{a_{6}}_{6}$ is ergodic for all $(a_{1},\ldots ,a_{6})\in {\mathbb Z}^{6}\backslash \{\mathbf {{0}}\}$. In contrast, via Theorem 2.2, one can get the same conclusion by only assuming that $T_{1},T_{2}T_{3}^{-1}, T_{4},T_{5},T_{4}T^{-1}_{5}$ are ergodic. (Indeed, denoting $G(a):=G(\{a\})$, and $e_{i}$ the vector whose $i$th entry is 1 and all other entries are 0, since $\mathbf {{p}}=((n^2,n,0,0,0,0),(n^2,0,n,0,0,0)$, $(0,0,0,n^3,0,0)$, $(0,0,0,0,n^3,n))$, we have that $G_{1,0}(\mathbf {p})=G_{2,0}(\mathbf {p})=G(e_{1})$, $G_{1,3}(\mathbf {p})= G_{2,3}(\mathbf {p})=G_{3,0}(\mathbf {p})=G(e_{4})$, $G_{1,4}(\mathbf {p})=G_{2,4}(\mathbf {p})=G_{4,0}(\mathbf {p})=G(e_{5})$, ${G_{1,2}(\mathbf {p})=G(e_{2}-e_{3})}$, $G_{3,4}(\mathbf {p})=G(e_{4}-e_{5})$.)
2.2 Convergence to the expected limit
In [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.4], the first, third, and fourth authors proved the following case of Conjecture 1.3. If $T_{1},\ldots ,T_{k}$ are commuting transformations acting on a probability space $(X,\mathcal {B},\mu )$, then $(T_{1}^{p(n)},\ldots ,T_{k}^{p(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ if and only if $((T_{1}\times \cdots \times T_{k})^{p(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ and $T_{i}T^{-1}_{j}$ is ergodic for $\mu $ for all $i\neq j$. In this paper, we further extend this result.
Theorem 2.5. Let $k,d,L\in {\mathbb N}$ and let $\mathbf {p}=(p_{1}v_{1},\ldots , p_{k}v_{k})$, where $p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}$ and $v_{1},\ldots , v_{k}\in {\mathbb Z}^{d}$, be a family of essentially distinct polynomials. Suppose that for all $1\leq i,j\leq k$, if $\deg (p_{i})=\deg (p_{j})$, then either $v_{i}$ and $v_{j}$ are linearly dependent over $\mathbb {Z}$, or $p_{i}$ and $p_{j}$ are linearly dependent over $\mathbb {Z}$ (that is, there is a nontrivial linear combination of them over $\mathbb {Z}$ which equals a constant). Let $(X,\mathcal {B},\mu , (T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$-system. Then the following are equivalent.

(C1) $(T_{p_{1}(n)v_{1}},\ldots ,T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is jointly ergodic for $\mu .$

(C2’) The following subconditions hold:

(i)’ $(T_{p_{i}(n)v_{i}-p_{j}(n)v_{j}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k$, $i\neq j$, with $\deg (p_{i})=\deg (p_{j})$;

(ii) $(T_{p_{1}(n)v_{1}}\times \cdots \times T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .

Moreover, condition (C2’) is equivalent to

(C2) The following subconditions hold:

(i) $(T_{p_{i}(n)v_{i}-p_{j}(n)v_{j}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu $ for all $1\leq i,j\leq k$, $i\neq j$;

(ii) $(T_{p_{1}(n)v_{1}}\times \cdots \times T_{p_{k}(n)v_{k}})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .

Note that the subconditions in condition (C2) are consistent with those in Conjecture 1.3. However, the reason we provide an alternative set of equivalent subconditions in condition (C2’) is that these subconditions are easier to check in practice.
We now give some examples to illustrate Theorem 2.5. The first one is for polynomials of distinct degrees.
Example 2.6. Let $(X,\mathcal {B},\mu ,T_{1},\ldots ,T_{k})$ be a system. Using Theorem 2.5, we conclude that $(T^{n}_{1},T^{n^{2}}_{2},\ldots ,T^{n^{k}}_{k})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n}_{1}\times \cdots \times T^{n^{k}}_{k})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes k}$ , and all the $T_{i}$ terms are ergodic for $\mu $ .
We remark that Example 2.6 can also be proved by using arguments from [Reference Chu, Frantzikinakis and Host6]. We next present two examples in which some polynomials have the same degree and so cannot be recovered by the methods of [Reference Chu, Frantzikinakis and Host6].
Example 2.7. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3,T_{4})$ be a system. Theorem 2.5 implies that $(T^{n}_{1},T^{n}_{2},T^{n^{2}}_{3}, T^{n^{2}}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n}_{1}\times T^{n}_{2}\times T^{n^{2}}_{3}\times T^{n^{2}}_{4})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes 4}$, and both $T_{1}T^{-1}_{2}$ and $((T_{3}T^{-1}_{4})^{n^2})_{n\in {\mathbb N}}$ are ergodic for $\mu $.
Example 2.8. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3)$ be a system. Theorem 2.5 implies that $(T^{n^{4}+n^{2}}_{1}, T^{2n^{4}+3n}_{1}, T^{2n^{2}+2n+1}_{2}, T^{3n^{2}+3n}_{3})_{n\in {\mathbb Z}}$ is jointly ergodic if and only if $(T^{n^{4}+n^{2}}_{1}\times T^{2n^{4}+3n}_{1}\times T^{2n^{2}+2n+1}_{2}\times T^{3n^{2}+3n}_{3})_{n\in {\mathbb Z}}$ is ergodic for $\mu ^{\otimes 4}$, and both sequences $(T^{-n^{4}+n^{2}-3n}_{1})_{n\in {\mathbb Z}}$ and $((T^{2}_{2}T_{3}^{-3})^{n^{2}+n})_{n\in {\mathbb Z}}$ are ergodic for $\mu $.
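The two single-sequence conditions in Example 2.8 can be read off by a direct computation. This is a sketch in which we label the four iterates as $p_iv_i$ with $v_1=v_2=e_1$, $v_3=e_2$, $v_4=e_3$ (a labeling we assume here). Only pairs of equal degree enter condition (C2’), namely $\{1,2\}$ (degree 4) and $\{3,4\}$ (degree 2):
$$p_1(n)v_1-p_2(n)v_2=(-n^{4}+n^{2}-3n)\,e_1,\qquad p_3(n)v_3-p_4(n)v_4=e_2+(n^{2}+n)(2e_2-3e_3).$$
The second difference corresponds to $T_2\,(T_2^{2}T_3^{-3})^{n^{2}+n}$. The constant factor $T_2$ acts as an isometry of $L^2(\mu )$ that fixes constants, so it does not affect whether the averages converge to $\int _X f\,d\mu $. The hypothesis of Theorem 2.5 is met because $v_1=v_2$ and $3p_3-2p_4=3$ is constant.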
Another direction for the joint ergodicity problem is verifying whether condition (C1) implies condition (C2) in Conjecture 1.3. Namely, assume that $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ to find a condition, say (P), of certain sequences of actions to be ergodic, under which we have that $(T_{p_{1}(n)},\ldots ,T_{p_{k}(n)})_{n\in \mathbb {Z}^{L}}$ is jointly ergodic for $\mu $ . By combining existing results from [Reference Host and Kra18, Reference Johnson20] (see also [Reference Donoso, Koutsogiannis and Sun7, Proposition 1.2]), (P) can be taken to be ‘ $T_{g}$ is ergodic for $\mu $ for all $g\in {\mathbb Z}^{d}\backslash \{\mathbf {{0}}\}$ ’. Denoting $p_{i}(n)=\sum _{v\in {\mathbb N}_0^{L},0\leq \vert v\vert \leq K}b_{i,v}n^{v}$ for some $b_{i,v}\in \mathbb {Q}^{d}$ and $K\in {\mathbb N}_0$ , this result was extended in [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.3], where the previous property is replaced by ‘ $T_{g}$ is ergodic for $\mu $ for all g that belongs to the finite set R’, where
In this paper, we replace the latter condition with an even weaker one.
Theorem 2.9. Let $d,k,L\in \mathbb {N}, \mathbf {p}=(p_{1},\ldots , p_{k}), p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be a family of essentially distinct polynomials and $(X,\ \mathcal {B},\ \mu ,\ (T_{g})_{g\in \mathbb {Z}^{d}})$ a $\mathbb {Z}^{d}$ system. Then, $(T_{p_{1}(n)},\ldots , T_{p_{k}(n)})_{n\in {\mathbb Z}^{L}}$ is jointly ergodic for $\mu $ if both of the following conditions hold:

(i) $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq i,j\leq k,i\neq j$ ;

(ii) $(T_{p_{1}(n)}\times \cdots \times T_{p_{k}(n)})_{n\in {\mathbb Z}^{L}}$ is ergodic for $\mu ^{\otimes k}$ .
The last example for this section reflects the stronger nature of the previous theorem compared to what was previously known.
Example 2.10. Let $(X,\mathcal {B},\mu ,T_{1},T_2,T_3,T_{4})$ be a system. Then, [Reference Donoso, Koutsogiannis and Sun7, Theorem 1.3] implies that $(T^{n^2}_{1} T^{n}_{2},T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if $((T^{n^2}_{1} T^n_2)\times (T^{n^2}_3 T^n_{4}))_{n\in \mathbb {Z}}$ is ergodic for $\mu ^{\otimes 2}$, and all $T_{1},T_{2},T_{3},T_{4},T_{1}T_{3}^{-1},T_{2}T_{4}^{-1}$ are ergodic for $\mu $. Using Theorem 2.9, we conclude that $(T^{n^2}_{1} T^{n}_{2}, T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$ is jointly ergodic if we instead only assume that $((T^{n^2}_{1} T^n_2)\times (T^{n^2}_3 T^n_{4}))_{n\in \mathbb {Z}}$ is ergodic for $\mu ^{\otimes 2}$, and all $T_{1},T_{3},T_{1}T_{3}^{-1}$ are ergodic for $\mu $.
Unfortunately, Theorem 2.9 does not imply Conjecture 1.3 for the pair $(T^{n^2}_{1} T^{n}_{2}, T^{n^2}_{3} T^{n}_{4})_{n\in {\mathbb Z}}$. This is because $T_{1},T_{3},T_{1}T_{3}^{-1}$ being ergodic for $\mu $ is independent of $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ being ergodic for $\mu $. For example, if $T_{1}=T_{3}=T_{4}=\mathrm {id}$ and $T_{2}$ is any ergodic transformation, then $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ is ergodic for $\mu $ but $T_{1},T_{3},T_{1}T_{3}^{-1}$ are not. However, if $X=\{0,\ldots ,6\}$ with $\mu (\{i\})=1/7$, $T_{1}x:=x+1 \bmod 7$, $T_{3}=T_{1}^{2}$, and $T_{2}=T_{4}=\mathrm {id}$, then $T_{1},T_{3},T_{1}T_{3}^{-1}$ are ergodic for $\mu $ but $((T_{1}T_{3}^{-1})^{n^2} (T_{2}T_{4}^{-1})^{n})_{n\in {\mathbb Z}}$ is not.
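The second counterexample is small enough to verify by direct computation. The sketch below is our own illustration, and the function names are hypothetical. It checks that $x\mapsto x+1$, $x\mapsto x+2$ and $x\mapsto x-1$ are each ergodic on $\mathbb {Z}/7\mathbb {Z}$, and then computes the limit of the averages of $f(x-n^2)$ along the intervals $\{0,\ldots ,N-1\}$ for $f$ the indicator of $\{0\}$. Failure along this one Følner sequence is already enough to rule out ergodicity of the sequence.

```python
from fractions import Fraction

M = 7  # X = {0, ..., 6} with the uniform measure, as in the text


def is_ergodic_rotation(step):
    """x -> x + step (mod M) is ergodic for the uniform measure
    exactly when the orbit of 0 is all of X."""
    orbit, x = set(), 0
    while x not in orbit:
        orbit.add(x)
        x = (x + step) % M
    return len(orbit) == M


def limiting_average_of_point_mass():
    """Limit over N of (1/N) * sum_{n<N} f(x - n^2) for f = indicator of {0},
    returned as a list indexed by x. Since n^2 mod M is M-periodic in n,
    the limit equals the average over one period."""
    limit = [Fraction(0)] * M
    for n in range(M):
        x = (n * n) % M  # f(x - n^2) = 1 exactly when x = n^2 mod M
        limit[x] += Fraction(1, M)
    return limit


t1, t3 = 1, 2              # T_1 x = x + 1 and T_3 = T_1^2
t1_t3_inv = (t1 - t3) % M  # T_1 T_3^{-1} x = x - 1

print([is_ergodic_rotation(s) for s in (t1, t3, t1_t3_inv)])
print(limiting_average_of_point_mass())
```

The limit function is supported on the quadratic residues modulo 7 and is therefore not the constant $\int _X f\,d\mu =1/7$, which is exactly the failure of ergodicity claimed for $((T_{1}T_{3}^{-1})^{n^2})_{n\in \mathbb {Z}}$.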
2.3 Strategy of the paper
The central ingredient in proving the main results of the paper (Theorems 2.2, 2.5, and 2.9) is to find proper characteristic factors for the limit of the average in equation (4), that is, sub-$\sigma $-algebras $\mathcal {D}_{1},\ldots ,\mathcal {D}_{k}$ of $\mathcal {B}$ such that the average in equation (4) remains invariant if we replace each $f_{i}$ by its conditional expectation (see below for the definition) with respect to $\mathcal {D}_{i}$. An important type of characteristic factor, called the Host–Kra characteristic factor, was invented in [Reference Host and Kra18] to study multiple averages for ${\mathbb Z}$-systems (see below for the definition of these factors). This concept was generalized to systems with commuting transformations in [Reference Host17] (see also [Reference Sun31]).
To introduce the main tool used in our results (Theorem 2.11), special cases of which have been studied extensively in the past (see for example [Reference Chu, Frantzikinakis and Host6, Reference Frantzikinakis and Kra14, Reference Host17, Reference Host and Kra18, Reference Johnson20]), we need to introduce the machinery of Host–Kra seminorms and factors.
Host–Kra seminorms and their associated factors are arguably the main tools used to analyze the behavior of multiple averages and correlation sequences. In what follows, we give general results about these seminorms and factors, following the notation used in [Reference Donoso, Koutsogiannis and Sun7].
We first recall the notions of a factor and of the conditional expectation with respect to a factor. We say that the ${\mathbb Z}^d$-system $(Y,\mathcal {D},\nu ,(S_{g})_{g\in {\mathbb Z}^d})$ is a factor of $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ if there exists a measurable map $\pi \colon X\to Y$ such that $\mu (\pi ^{-1}(A))=\nu (A)$ for all $A\in \mathcal {D}$, and $\pi \circ T_{g}=S_{g}\circ \pi $ for all $g\in {\mathbb Z}^{d}$.
A factor $(Y,\mathcal {D},\nu ,(S_{g})_{g\in {\mathbb Z}^d})$ of $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ can be identified with an invariant sub-$\sigma $-algebra $\mathcal {B}'$ of $\mathcal {B}$ by setting $\mathcal {B}':=\pi ^{-1}(\mathcal {D})$. Given two $\sigma $-algebras, $\mathcal {B}_1$ and $\mathcal {B}_2$, their joining $\mathcal {B}_1\vee \mathcal {B}_2$ is the $\sigma $-algebra generated by $B_1\cap B_2$ for all $B_1\in \mathcal {B}_1$ and $B_2\in \mathcal {B}_2$, that is, the smallest $\sigma $-algebra containing both $\mathcal {B}_1$ and $\mathcal {B}_2$.
Given a factor $\pi \colon (X,\mathcal {B},\mu ) \to (Y,\mathcal {D},\nu )$ and a function $f\in L^{2}(\mu )$, the conditional expectation of $f$ with respect to $Y$ is the function $g\in L^{2}(\nu )$, which we denote by $\mathbb {E}(f \mid Y)$, with the property
$$\int_A g\,d\nu =\int_{\pi ^{-1}(A)}f\,d\mu \quad \text{for all } A\in \mathcal {D}.$$
Let $(X,\mathcal {B},\mu )$ be a probability space and let $\mathcal {B}_1$ be a sub-$\sigma $-algebra of $\mathcal {B}$. The relatively independent joining of $(X,\mathcal {B},\mu )$ with itself with respect to $\mathcal {B}_1$ is the probability space $(X\times X, \mathcal {B} \otimes \mathcal {B}, \mu \times _{\mathcal {B}_1} \mu )$, where the measure $\mu \times _{\mathcal {B}_1} \mu $ is given by the formula
$$\int_{X\times X}f_1\otimes f_2\,d(\mu \times _{\mathcal {B}_1}\mu )=\int_X \mathbb {E}(f_1\mid \mathcal {B}_1)\,\mathbb {E}(f_2\mid \mathcal {B}_1)\,d\mu$$
for all $f_1, f_{2} \in L^{\infty }(\mu )$ .
For a $G$-system $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{g})_{g\in G})$, if $H$ is a subgroup of $G$, we denote by $\mathcal {I}(H)(\mathbf {X})$ the set of $A\in \mathcal {B}$ such that $T_{g}A=A$ for all $g\in H$. When there is no confusion, we write $\mathcal {I}(H)$.
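For a ${\mathbb Z}^d$-system $(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ and $H_1,\ldots ,H_k$ subgroups of ${\mathbb Z}^d$, define
$$\mu _{H_1}=\mu \times _{\mathcal {I}(H_1)}\mu ,$$
and for $k>1$, let
$$\mu _{H_1,\ldots ,H_k}=\mu _{H_1,\ldots ,H_{k-1}}\times _{\mathcal {I}(H_k^{[k-1]})}\mu _{H_1,\ldots ,H_{k-1}},$$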
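where $H^{[k-1]}_{k}$ denotes the subgroup of $({\mathbb Z}^{d})^{2^{k-1}}$ consisting of all the elements of the form $(h_{k},\ldots , h_{k})$ ($2^{k-1}$ copies of $h_{k}$) for some $h_{k}\in H_{k}$. The characteristic factor $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})$ is defined to be the sub-$\sigma $-algebra of $\mathcal {B}$ characterized by
$$\mathbb {E}(f\mid Z_{H_1,\ldots ,H_k}(\mathbf {X}))=0\quad \Longleftrightarrow \quad \lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert ^{2^k}_{H_1,\ldots ,H_k}:=\int_{X^{[k]}}\bigotimes_{\epsilon \in \{0,1\}^{k}}\mathcal {C}^{\vert \epsilon \vert }f\,d\mu _{H_1,\ldots ,H_k}=0$$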
for all $f \in L^{\infty }(\mu )$ , where $X^{[k]}=X\times \cdots \times X$ ( $2^k$ copies of X), $\vert \epsilon \vert =\epsilon _{1}+\cdots +\epsilon _{k}$ for $\epsilon =(\epsilon _{1},\ldots ,\epsilon _{k})\in \{0,1\}^{k}$ , and $\mathcal {C}^{2r+1}f=\overline {f},$ the complex conjugate of f, $\mathcal {C}^{2r}f=f$ for all $r\in {\mathbb Z}$ . The quantity $\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_1,\ldots ,H_k}$ denotes the Host–Kra seminorm of f with respect to the subgroups $H_1,\ldots ,H_k$ . Similar to the proof of [Reference Host17, Lemma 4] or [Reference Host and Kra18, Lemma 4.3], one can show that $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})$ is well defined.
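For orientation, the case $k=1$ can be unwound by hand. This is a sketch assuming the standard conventions $\mu _{H_1}=\mu \times _{\mathcal {I}(H_1)}\mu $ and $\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert ^{2}_{H_1}=\int _{X\times X}f\otimes \overline {f}\,d\mu _{H_1}$, which is our reading of the definitions rather than a quotation. Using the formula for the relatively independent joining,
$$\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert ^{2}_{H_1}=\int_{X\times X}f\otimes \overline {f}\,d(\mu \times _{\mathcal {I}(H_1)}\mu )=\int_X \big \vert \mathbb {E}(f\mid \mathcal {I}(H_1))\big \vert ^{2}\,d\mu .$$
In particular, if $H_1$ is ergodic for $\mu $, then $\mathcal {I}(H_1)$ is trivial and $\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_1}=\vert \int _X f\,d\mu \vert $; in general, $f$ is orthogonal to $Z_{H_1}(\mathbf {X})$ exactly when $\mathbb {E}(f\mid \mathcal {I}(H_1))=0$.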
Theorem 2.11. Let $d,k,K,L\in \mathbb {N}$, $\mathbf {p}=(p_{1},\ldots , p_{k})$, $p_{1},\ldots , p_{k}\colon \mathbb {Z}^{L}\to \mathbb {Z}^{d}$ be a family of essentially distinct polynomials of degrees at most $K$. There exists $D\in \mathbb {N}_0$ depending only on $d,k,K,L$ such that for every $\mathbb {Z}^{d}$-system $\mathbf {X}=(X,\mathcal {B},\mu , (T_{n})_{n\in \mathbb {Z}^{d}})$, every $f_{1},\ldots , f_{k}\in L^{\infty }(\mu )$, and every Følner sequence $(I_{N})_{N\in {\mathbb N}}$ of ${\mathbb Z}^{L}$, if $f_{i}$ is orthogonal to the Host–Kra characteristic factor ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ for some $1\leq i\leq k$ (that is, the conditional expectation of $f_{i}$ under ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ is $0$), then we have that
$$\lim_{N\to \infty }\Big \Vert \frac{1}{\vert I_N\vert }\sum_{n\in I_N}T_{p_1(n)}f_1\cdots T_{p_k(n)}f_k\Big \Vert _{L^{2}(\mu )}=0. \tag{10}$$
In particular, if for some $1\leq i\leq k$ , $G_{i,j}(\mathbf {p})$ is ergodic for $\mu $ for all $0\leq j\leq k, j\neq i$ and $f_{i}$ is orthogonal to the Host–Kra characteristic factor ${Z}_{({\mathbb Z}^{d})^{\times kD}}(\mathbf {X})$ , then equation (10) holds.
It is worth noting that the factor ${Z}_{\{G_{i,j}(\mathbf {p})\}^{\times D}_{0\leq j\leq k, j\neq i}}(\mathbf {X})$ we obtain in Theorem 2.11 is not optimal, but it is good enough for our purposes.
A special case of Theorem 2.11 was proved in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1]. In particular, Theorem 2.11 generalizes [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] in the following ways.

(I) The characteristic factor obtained in Theorem 2.11 is of finite step, whereas that in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] is of infinite step.

(II) The groups $G_{i,j}(\mathbf {p})$ involved in Theorem 2.11 are larger than those in [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1], which makes the characteristic factors in Theorem 2.11 smaller.
We remark that the aforementioned technical distinctions significantly affect the applications of Theorem 2.11. First, the essential reason why one cannot directly use [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] to improve [Reference Ferré Moragues9, Theorem 1.5] is that the method used in [Reference Ferré Moragues9] requires a characteristic factor of finite step. This problem is resolved by generalization (I), enabling us to extend [Reference Ferré Moragues9, Theorem 1.5] in this paper. Second, [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] does not provide a strong enough characteristic factor in certain circumstances. For example, in the case of Example 2.6, [Reference Chu, Frantzikinakis and Host6, Theorem 6.5] suggests that the Host–Kra seminorms controlling equation (10) depend only on the transformations $T_{1},\ldots ,T_{k}$ , whereas the upper bound provided by [Reference Donoso, Koutsogiannis and Sun7, Theorem 5.1] depends not only on the transformations $T_{1},\ldots ,T_{k}$ but also on many compositions of them. With the help of generalizations (I) and (II), we are able to obtain (and generalize) the aforementioned upper bound of [Reference Chu, Frantzikinakis and Host6, Theorem 6.5].
Roughly speaking, the achievement of generalization (I) relies on a sophisticated development of a Bessel-type inequality first obtained by Tao and Ziegler in [Reference Tao and Ziegler33, Proposition 3.6]. The most technical part of this paper is the approach we use to get generalization (II). In [Reference Donoso, Koutsogiannis and Sun7], a method was introduced to keep track of the coefficients of the polynomials while running a variation of the polynomial exhaustion technique (PET) induction. However, the tracking provided there is not strong enough to imply Theorem 2.11. To overcome this difficulty, we introduce more sophisticated machinery to gain better control of the coefficients.
The paper is organized as follows. We provide some background material in §3. In §4, we present the variation of PET induction that we use. In §5, we address how generalizations (I) and (II) above can be achieved with Propositions 5.2 and 5.4, which improve Propositions 5.6 and 5.5 of [Reference Donoso, Koutsogiannis and Sun7], respectively. We conclude the section by proving Theorem 2.11. This is the bulk of the paper. In §6, we use Theorem 2.11 to deduce Theorems 2.2, 2.5, and 2.9, which are the main results of the paper. We conclude with some discussions on future directions in §7.
2.4 Notation
We denote by $\mathbb {N}, \mathbb {N}_0, \mathbb {Z}, \mathbb {Q}, \mathbb {R}$ , and $\mathbb {C}$ the sets of positive integers, nonnegative integers, integers, rational numbers, real numbers, and complex numbers, respectively. If X is a set and $d\in \mathbb {N}$ , $X^d$ denotes the Cartesian product $X\times \cdots \times X$ of d copies of X.
We will denote by $e_i$ the vector that has $1$ as its ith coordinate and $0$ elsewhere. In general, we use lowercase letters for both numbers and vectors, and bold letters for vectors of vectors, to emphasize this distinction. The only exception to this convention is the vector $\mathbf {0}$ (that is, the vector with all coordinates equal to $0$ ), which we always write in bold.
Throughout this article, we use the following notation for averages. Let $(a(n))_{n\in {\mathbb Z}^L}$ be a sequence of complex numbers, or a sequence of measurable functions on a probability space $(X,\mathcal {B},\mu )$ . We let:
(we use the symbol $\square $ to highlight the fact that the averages are taken along the boxes $[-N,N]^{L}$ );
and
It is worth noticing that if the limit $\lim _{N\to \infty } \mathbb {E}_{n\in I_N} a(n)$ exists for all Følner sequences (in ${\mathbb Z}^L$ ), then this limit does not depend on the chosen Følner sequence.
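The independence of the limit from the Følner sequence can be checked numerically in a toy case. The sketch below assumes that $\mathbb {E}_{n\in A}a(n)$ denotes the plain average $|A|^{-1}\sum _{n\in A}a(n)$ over a finite set A; the sequence `a` and the second Følner sequence `shifted_box` are illustrative choices, not taken from the paper.

```python
from fractions import Fraction
from itertools import product

def average(a, points):
    """Plain average (1/|A|) * sum_{n in A} a(n) over a finite set A (assumed convention)."""
    points = list(points)
    return sum(Fraction(a(n)) for n in points) / len(points)

def box(N, L):
    """The box [-N, N]^L."""
    return product(range(-N, N + 1), repeat=L)

def shifted_box(N, L):
    """A different Folner sequence in Z^L: the cubes [N^2, N^2 + N]^L."""
    return product(range(N * N, N * N + N + 1), repeat=L)

def a(n):
    """Illustrative sequence on Z^2: indicator of the points with even coordinate sum."""
    return 1 if sum(n) % 2 == 0 else 0

for N in (5, 20, 50):
    # Both columns approach the same value, 1/2, as N grows.
    print(N, float(average(a, box(N, 2))), float(average(a, shifted_box(N, 2))))
```

Both families of averages approach $1/2$, as expected for a sequence whose averages converge along every Følner sequence.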
We also consider iterated averages. Let $(a(h_{1},\ldots ,h_{s}))_{h_{1},\ldots ,h_{s}\in {\mathbb Z}^L}$ be a multiparameter sequence. We let
and adopt similar conventions for $\mathbb {E}_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ , ${\overline {\mathbb {E}}}^{\square }_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ , and $\mathbb {E}^{\square }_{h_{1},\ldots ,h_{s}\in \mathbb {Z}^{L}}$ .
We end this section by recalling the notion of a system indexed by a countable abelian group $(G,+)$ . We say that a tuple $(X,\mathcal {B},\mu ,(T_{g})_{g\in G})$ is a G-measure-preserving system (or a G-system) if $(X,\mathcal {B},\mu )$ is a probability space and $T_{g}\colon X\to X$ are measurable, measure-preserving transformations on X such that $T_{e_{G}}=\mathrm {id}$ ( $e_G$ is the identity element of G) and $T_{g}\circ T_{h}=T_{g+h}$ for all $g,h\in G$ . A G-system will be called ergodic if for any $A\in \mathcal {B}$ such that $T_{g}A=A$ for all $g\in G$ , we have that $\mu (A)\in \{0,1\}$ . In this paper, we are mostly concerned with ${\mathbb Z}^{d}$-systems and $L^2(\mu )$ norm limits of (multiple) ergodic averages. For the corresponding norm, when it is clear from the context, we will write $\Vert {\cdot} \Vert _2$ instead of $\Vert {\cdot} \Vert _{L^2(\mu )}$ .
3 Background material
In this section, we recall some background material and prove some intermediate results that will be used later throughout the paper.
We summarize some basic properties of the Host–Kra seminorms and their associated factors.
Proposition 3.1. [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.4]
Let $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{g})_{g\in {\mathbb Z}^d})$ be a ${\mathbb Z}^{d}$-system, $H_{1},\ldots ,H_{k},H'$ be subgroups of ${\mathbb Z}^{d}$ and $f\in L^{\infty }(\mu )$ .

(i) For every permutation $\sigma \colon \{1,\ldots ,k\}\to \{1,\ldots ,k\}$ , we have that
$$ \begin{align*} Z_{H_{1},\ldots,H_{k}}(\mathbf{X})=Z_{H_{\sigma(1)},\ldots,H_{\sigma(k)}}(\mathbf{X}), \end{align*} $$and hence the corresponding seminorm does not depend on the particular order taken for the subgroups $H_1,\ldots ,H_k.$

(ii) If $\mathcal {I}(H_{j})=\mathcal {I}(H')$ , then $Z_{H_{1},\ldots ,H_{j},\ldots ,H_{k}}(\mathbf {X})=Z_{H_{1},\ldots ,H_{j-1},H',H_{j+1},\ldots ,H_{k}}(\mathbf {X})$ .

(iii) For $k\geq 2$ , we have that
$$ \begin{align*}\lvert\hspace{1pt}\lvert\hspace{1pt}\lvert f\rvert\hspace{1pt}\rvert\hspace{1pt}\rvert^{2^{k}}_{H_{1},\ldots,H_{k}} =\mathbb{E}_{g\in H_{k}}\lvert\hspace{1pt}\lvert\hspace{1pt}\lvert f\cdot T_{g}\overline{f}\rvert\hspace{1pt}\rvert\hspace{1pt}\rvert^{2^{k-1}}_{H_{1},\ldots,H_{k-1}},\end{align*} $$while for $k=1,$
$$ \begin{align*}\lvert\hspace{1pt}\lvert\hspace{1pt}\lvert f\rvert\hspace{1pt}\rvert\hspace{1pt}\rvert^{2}_{H_{1}} =\mathbb{E}_{g\in H_{1}}\int_{X} f\cdot T_{g}\overline{f}\,d\mu.\end{align*} $$ 
(iv) Let $k\geq 2$ . If $H'\leq H_{j}$ is of finite index, then
$$ \begin{align*}Z_{H_{1},\ldots,H_{j},\ldots,H_{k}}(\mathbf{X})=Z_{H_{1},\ldots,H_{j-1},H',H_{j+1},\ldots,H_{k}}(\mathbf{X}).\end{align*} $$ 
(v) If $H'\leq H_{j}$ , then $Z_{H_{1},\ldots ,H_{j},\ldots ,H_{k}}(\mathbf {X})\subseteq Z_{H_{1},\ldots ,H_{j-1},H',H_{j+1},\ldots ,H_{k}}(\mathbf {X})$ .

(vi) For $k\geq 2$ , $\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_1,\ldots ,H_{k-1}}\leq \lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_1,\ldots ,H_{k-1},H_k}$ and thus
$$ \begin{align*}Z_{H_1,\ldots,H_{k-1}}(\mathbf{X})\subseteq Z_{H_1,\ldots,H_{k-1},H_k}(\mathbf{X}).\end{align*} $$ 
(vii) For $k\geq 1$ , if $H_1',\ldots , H_k'$ are subgroups of $ {\mathbb Z}^d$ , then
$$ \begin{align*}Z_{H_1,\ldots,H_k}(\mathbf{X}) \vee Z_{H_1',\ldots,H_k'}(\mathbf{X}) \subseteq Z_{H_1',\ldots,H_k',H_1,\ldots,H_k}(\mathbf{X}).\end{align*} $$
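The $k=1$ formula in Proposition 3.1(iii) can be evaluated exactly on a finite model. The following sketch is only an illustration under stated assumptions: X is the cyclic group $\mathbb {Z}/m$ with the uniform measure, $T_{g}x=x+g$ , $T_{g}f(x)=f(T_{g}x)$ , and $H=\text {step}\cdot \mathbb {Z}$ ; the function names and the test function `f` are hypothetical. For $H=\mathbb {Z}$ , the action is ergodic and the computed value agrees with $\vert \int _X f\,d\mu \vert ^{2}$ , a standard fact that is not stated in this section; passing to the subgroup $2\mathbb {Z}$ can only increase the seminorm, in line with part (v).

```python
def correlation(f, g):
    """Integral of f * T_g conj(f) over Z/m with the uniform measure, T_g x = x + g."""
    m = len(f)
    return sum(f[x] * f[(x + g) % m].conjugate() for x in range(m)) / m

def seminorm_sq(f, step=1):
    """|||f|||_H^2 for H = step * Z, as the average of correlation(f, g) over g in H.

    The correlation is m-periodic in g, so averaging over g = step * j for
    j = 0, ..., m - 1 already equals the limit along any Folner sequence of H.
    """
    m = len(f)
    return (sum(correlation(f, step * j) for j in range(m)) / m).real

f = [complex(v) for v in (1, 2j, -1, 3, 0, 1 + 1j)]   # a hypothetical function on Z/6
mean = sum(f) / len(f)
print(seminorm_sq(f, 1), abs(mean) ** 2)   # H = Z: both equal |integral of f|^2
print(seminorm_sq(f, 2))                   # H = 2Z: smaller group, larger seminorm
```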
As an immediate consequence of Proposition 3.1(iv), we have the following corollary.
Corollary 3.2. [Reference Donoso, Koutsogiannis and Sun7, Corollary 2.5]
Let $H_{1},\ldots ,H_{k}$ be subgroups of ${\mathbb Z}^{d}$ . If the $H_{i}$-action $(T_{g})_{g\in H_{i}}$ is ergodic on $\mathbf {X}$ for all $1\leq i\leq k$ , then $Z_{H_{1},\ldots ,H_{k}}(\mathbf {X})=Z_{{\mathbb Z}^{d},\ldots ,{\mathbb Z}^{d}}(\mathbf {X})$ .
Convention 3.3. Thanks to Proposition 3.1, we may adopt a flexible and convenient notation while writing the Host–Kra characteristic factors. For example, if $A=\{H_{1},H_{2}\}^{\times 3}$ , then the notation $Z_{A,H_{3},H^{\times 2}_{4},(H_{i})_{i=5,6}}(\mathbf {X})$ refers to $Z_{H_{1},H_{1},H_{1},H_{2},H_{2},H_{2},H_{3},H_{4},H_{4},H_{5},H_{6}}(\mathbf {X})$ (note that thanks to Proposition 3.1(i), $Z_{A,H_{3},H^{\times 2}_{4},(H_{i})_{i=5,6}}(\mathbf {X})$ is well defined regardless of the ordering of A).
Recall that for a subgroup $H\subseteq {\mathbb Z}^d$ , $H^{[1]}$ denotes the subgroup $\{ (h,h)\colon h\in H\}\subseteq {\mathbb Z}^d\times ~{\mathbb Z}^d$ .
Lemma 3.4. Let $d \in {\mathbb N}$ . Let $(X,\mathcal {B},\mu ,(T_n)_{n\in {\mathbb Z}^d})$ be a ${\mathbb Z}^{d}$-system and $H_1,\ldots ,H_k,H$ be subgroups of ${\mathbb Z}^d$ . Let $f \in L^{\infty }(\mu )$ . Then,
where on the left-hand side, we consider the product space $(X\times X,\mathcal {B}\otimes \mathcal {B},\mu \times \mu , (T_{m}\times T_{n})_{(m,n)\in \mathbb {Z}^{2d}})$ .
Proof. We proceed by induction on k. For $k=1,$ using the Cauchy–Schwarz inequality, we have
where we used in the last two equalities Proposition 3.1(iii) and (i), respectively, from where we conclude the required relation by taking square roots.
Suppose that the result holds for $k-1$ . By Proposition 3.1(i) and the induction hypothesis,
and the claim follows.
3.1 Nilsystems, nilsequences, and structure theorem
Let $X=N/\Gamma $ , where N is a (k-step) nilpotent Lie group and $\Gamma $ is a discrete cocompact subgroup of N. Let $\mathcal {B}$ be the Borel $\sigma $-algebra of $X, \mu $ the normalized Haar measure on $X,$ and for $n\in {\mathbb Z}^{d},$ let ${T_{n}\colon X\to X}$ with $T_{n}x=b_{n}\cdot x$ for some group homomorphism $n\mapsto b_{n}$ from ${\mathbb Z}^{d}$ to N. We say that $\mathbf {X}=(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^{d}})$ is a (k-step) ${\mathbb Z}^{d}$-nilsystem. For $k\geq 1$ , we say that $(a_{n})_{n\in {\mathbb Z}^{d}} \subseteq \mathbb {C}$ is a (k-step) ${\mathbb Z}^{d}$-nilsequence if there exist a (k-step) ${\mathbb Z}^{d}$-nilsystem $(X,\mathcal {B},\mu ,(T_{n})_{n\in {\mathbb Z}^{d}})$ , a function $F \in C(X)$ and $x\in X$ such that $a_{n}=F(T_{n}x)$ for all $n\in {\mathbb Z}^{d}$ . For $k=0$ , a $0$-step nilsequence is a constant sequence. An important reason why the Host–Kra characteristic factors are powerful is their connection with nilsystems. The following is a slight generalization of [Reference Ziegler36, Theorem 3.7] (see [Reference Griesmer16, Lemma 4.4.3 and Theorem 4.10.1], or Proposition 3.1(ii) and [Reference Sun31, Theorem 3.7]), which is a higher-dimensional version of the Host–Kra structure theorem [Reference Host and Kra18].
Theorem 3.5. Let $\mathbf {X}$ be an ergodic $\mathbb {Z}^{d}$-system. Then $Z_{(\mathbb {Z}^{d})^{\times k}}(\mathbf {X})$ is an inverse limit of $(k-1)$-step $\mathbb {Z}^{d}$-nilsystems.
3.2 Bessel’s inequality
An essential difference in the study of multiple ergodic averages between ${\mathbb Z}$-systems and ${\mathbb Z}^{d}$-systems is that in the former case, one can usually bound the average by some Host–Kra seminorm of a function f appearing in the average, whereas in the latter, one can only bound the averages by an average of a family of Host–Kra seminorms of f. To overcome this difficulty, inspired by the work of Tao and Ziegler [Reference Tao and Ziegler33], in this subsection, we derive an upper bound for expressions of the form ${\overline {\mathbb {E}}}_{i\in I}\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_{i,1},\ldots ,H_{i,s}}$ , where I is a finite set and $H_{i,j}$ are subgroups of ${\mathbb Z}^{d}$ .
The proof of the following statement is similar to [Reference Tao and Ziegler33, Corollary 1.22].
Proposition 3.6. (Bessel’s inequality)
Let $t\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$-system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t$ be subgroups of ${\mathbb Z}^{d}$ . Then for all $f\in L^{\infty }(\mu )$ ,
Proof. For convenience, let . Then,
which, by the Cauchy–Schwarz inequality, is bounded by
By [Reference Tao and Ziegler33, Corollary 1.21], $L^{\infty }(Z_{H_{i,1},\ldots ,H_{i,t}})$ and $L^{\infty }(Z_{H_{j,1},\ldots ,H_{j,t}})$ are orthogonal on the orthogonal complement of $L^{\infty }(Z_{\{H_{i,i'}+H_{j,j'}\}_{1\leq i',j'\leq t}})$ , and hence
and we have the conclusion.
By repeatedly using Proposition 3.6, we have the following inequality.
Corollary 3.7. Let $t,s\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$-system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t,$ be subgroups of ${\mathbb Z}^{d}$ . Then for all $f\in L^{\infty }(\mu )$ , we have
The next proposition provides an upper bound for $\mathbb {E}_{i\in I}\lvert \hspace{1pt}\lvert \hspace{1pt}\lvert f\rvert \hspace{1pt}\rvert \hspace{1pt}\rvert _{H_{i,1},\ldots ,H_{i,t}}$ which can be combined with the previous two statements.
Proposition 3.8. Let $t\in {\mathbb N}$ , $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$-system, I be a finite set of indices, and $H_{i,j}, i\in I, 1\leq j\leq t$ be subgroups of ${\mathbb Z}^{d}$ . Then, for all $f\in L^{\infty }(\mu )$ , with $\Vert f\Vert _{L^{\infty }(\mu )}\leq 1$ ,
Proof. Note that
Also, for all i, we have
so
as was to be shown.
3.3 General properties of subgroups of ${\mathbb Z}^d$ and properties of polynomials
Recall that for a subset A of $\mathbb {Q}^{d}$ , we denote $G(A):= \text {span}_{{\mathbb Q}} \{a\in A\}\cap {\mathbb Z}^{d}.$ Next, we summarize some properties of these sets.
Lemma 3.9. The following properties hold.

(i) For any set $A\subseteq {\mathbb Z}^d$ , $G(A)$ is a subgroup of ${\mathbb Z}^d$ .

(ii) Let $A\subseteq \mathbb {Q}^d$ be a finite set and $M(A)$ the matrix whose columns are the elements of A. Then $G(A)=(M(A)\cdot {\mathbb Q}^{A})\cap {\mathbb Z}^d$ .

(iii) If $H\subseteq {\mathbb Z}^d$ is the subgroup generated by $h_1,\ldots ,h_k\in {\mathbb Z}^d$ , then $G(H)=G(\{h_1,\ldots ,h_k\})$ . In particular, letting $M(h_1,\ldots ,h_k)$ be the matrix whose columns are $h_1,\ldots ,h_k$ , we have that $G(\langle h_1,\ldots ,h_k\rangle )=(M(h_1,\ldots ,h_k)\cdot {\mathbb Q}^{k}) \cap {\mathbb Z}^d$ .

(iv) For any subgroup $H\subseteq {\mathbb Z}^d$ , H has finite index in $G(H)$ . Moreover, $G(H)$ is the largest subgroup of ${\mathbb Z}^d$ which is a finite index extension of H.

(v) If not all of $a_1,\ldots ,a_k$ belong to a common proper subspace of ${\mathbb Q}^d$ , then $G(\{a_1,\ldots ,a_k\}) ={\mathbb Z}^d$ .
Proof. Properties (i), (ii), and (iii) follow directly from the definitions.
To prove property (iv), let $\{g_1,\ldots ,g_k\}$ be a set such that $\langle g_1,\ldots ,g_k\rangle =G(H)$ . For each $i=1,\ldots ,k$ , there exist $m_i$ and $h_i\in H$ such that $g_i={h_i}/{m_i}$ . The group $\langle m_1g_1,\ldots ,m_kg_k\rangle $ is of finite index in $\langle g_1,\ldots ,g_k\rangle =G(H)$ and is contained in H. Therefore, H is of finite index in $G(H)$ .
To see that $G(H)$ is the largest finite index extension of H, take $H'$ to be any finite index extension of H and take $h'\in H'$ . Since $H'$ is a finite index extension of H, we have that there exists $n\in {\mathbb N}$ such that $nh'\in H$ . This implies that $h'\in G(H)$ .
To show property (v), reordering $a_1,\ldots ,a_k$ if needed, we may assume that $a_1,\ldots ,a_d$ are linearly independent vectors over ${\mathbb Q}$ . It follows that $\text {span}_{{\mathbb Q}}(\{a_1,\ldots ,a_d \})={\mathbb Q}^{d}$ and then $G(\{a_1,\ldots ,a_k\})\supseteq G(\{a_1,\ldots ,a_d\})={\mathbb Z}^d$ .
Remark 3.10. If $H_1$ and $H_2$ are subgroups of ${\mathbb Z}^d$ , then $G(H_1)+G(H_2)\subseteq G(H_1+H_2)$ , with the inclusion possibly being strict. For instance, for $H_1=\langle (1,2)\rangle $ , $H_2=\langle (2,1) \rangle $ , we have that $G(H_1)=H_1$ , $G(H_2)=H_2$ , and $H_1+H_2 \subsetneq G(H_1+H_2)={\mathbb Z}^2$ . Nevertheless, Lemma 3.9 implies that $G(H_1)+G(H_2)$ has finite index in $G(H_1+H_2)$ .
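The example in Remark 3.10 can be verified by exact rational arithmetic. The sketch below is a minimal illustration, assuming linearly independent generators; the helper names (`solve_rational`, `in_G`, `in_lattice`) are hypothetical. It tests membership in $G(\cdot )$ via the rational span, as in Lemma 3.9(ii), and membership in the generated subgroup via integrality of the coefficients.

```python
from fractions import Fraction
from itertools import product

def solve_rational(gens, v):
    """Return rational c with sum(c[i] * gens[i]) == v, or None if v is outside the Q-span.

    Assumes the generators are linearly independent over Q.
    """
    d, k = len(v), len(gens)
    # Augmented matrix [M | v], where the columns of M are the generators.
    rows = [[Fraction(gens[j][i]) for j in range(k)] + [Fraction(v[i])] for i in range(d)]
    pivot_row = 0
    for col in range(k):
        pivot = next((r for r in range(pivot_row, d) if rows[r][col] != 0), None)
        if pivot is None:
            raise ValueError("generators are not linearly independent")
        rows[pivot_row], rows[pivot] = rows[pivot], rows[pivot_row]
        rows[pivot_row] = [x / rows[pivot_row][col] for x in rows[pivot_row]]
        for r in range(d):
            if r != pivot_row and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[pivot_row])]
        pivot_row += 1
    # The remaining rows read 0 = rows[r][k]; a nonzero entry means no solution.
    if any(rows[r][k] != 0 for r in range(pivot_row, d)):
        return None
    return [rows[i][k] for i in range(k)]

def in_G(gens, v):
    """v (an integer vector) lies in G(gens), i.e. in the Q-span of the generators."""
    return solve_rational(gens, v) is not None

def in_lattice(gens, v):
    """v lies in the subgroup of Z^d generated by gens (integer coefficients)."""
    c = solve_rational(gens, v)
    return c is not None and all(x.denominator == 1 for x in c)

h1, h2 = (1, 2), (2, 1)
window = list(product(range(-6, 6), repeat=2))   # 12 x 12 window of integer points
same_1 = all(in_lattice([h1], v) == in_G([h1], v) for v in window)   # G(H1) = H1
same_2 = all(in_lattice([h2], v) == in_G([h2], v) for v in window)   # G(H2) = H2
full = all(in_G([h1, h2], v) for v in window)                        # G(H1 + H2) = Z^2
hits = sum(in_lattice([h1, h2], v) for v in window)
print(same_1, same_2, full, Fraction(hits, len(window)))   # → True True True 1/3
```

The proportion $1/3$ reflects that $H_1+H_2$ has index $\vert \det \vert =3$ in ${\mathbb Z}^2$ , so the inclusion is strict but of finite index.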
In the remainder of the section, we provide some algebraic lemmas that will be used later in the paper. For a set $E\subseteq {\mathbb Z}^d,$ we define its upper Banach density (or just upper density when there is no confusion) by If the limit exists, we say that its value is the Banach density (or just density) of E. The proof of the following lemma is routine (see also [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11] for a more general version).
Lemma 3.11. [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11]
Let $\mathbf {c}\colon (\mathbb {Z}^{L})^{s}\to \mathbb {R}$ be a polynomial. Then either $\mathbf {c}\equiv 0$ or the set of $\mathbf {h}\in (\mathbb {Z}^{L})^{s}$ such that ${\mathbf {c}}(\mathbf {h})=0$ is of (upper) Banach density $0$ .
Lemma 3.12. Let $v_{i}\in {\mathbb Z}^{L}, 1\leq i\leq k$ and U be a subset of ${\mathbb Z}^{k}$ of positive density. Then,
Proof. Note that in equation (13), the right-hand side clearly includes the left-hand side. To prove the converse inclusion, it suffices to show that
Since U has positive density, it cannot be contained in any hyperplane of ${\mathbb Q}^k$ , so it must have at least k elements that are linearly independent over ${\mathbb Q}$ . Thus, equation (14) follows immediately.
Definition 3.13. Let $P\colon ({\mathbb Z}^{L})^D\to {\mathbb R}$ be a polynomial. Denote by $\Delta P\colon ({\mathbb Z}^{L})^{D+1}\to {\mathbb R}$ the polynomial given by for all $n,h_{1},\ldots ,h_{D}\in {\mathbb Z}^{L}$ . For a polynomial $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ , let $\Delta ^0 P=P,$ and for $K>1, \Delta ^{K}P\colon ({\mathbb Z}^{L})^{D+K}\to \mathbb {R}$ is (where $\Delta $ acts K times).
Lemma 3.14. Let $K\in {\mathbb N}$ and $Q\colon {\mathbb Z}^{L}\to {\mathbb R}$ be a homogeneous polynomial with $\deg (Q)>K$ . If $Q(n)\notin \mathbb {Q}[n]$ , then the set of $(h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K}$ such that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\notin \mathbb {Q}[n]$ is of density $1$ in $(\mathbb {Z}^{L})^{K}$ .
Proof. We may write $Q(n)=\sum _{i=1}^{M}a_{i}Q_{i}(n)$ for some $M\in {\mathbb N}$ , homogeneous polynomials $Q_{1},\ldots ,Q_{M}$ in ${\mathbb Q}[n]$ of degrees $\deg (Q)$ , and real numbers $a_{1},\ldots ,a_{M}\in {\mathbb R}$ which are linearly independent over ${\mathbb Q}$ (this can be done by taking $a_1,\ldots ,a_M$ to be a basis of the ${\mathbb Q}$ span of the coefficients of Q). Since $Q(n)\notin \mathbb {Q}[n]$ , there exists some ${1\leq i\leq M}$ such that $a_{i}\notin {\mathbb Q}$ and $Q_{i}\not \equiv 0$ . Without loss of generality, assume that $i=1.$ Since ${\deg (Q_{1})>K}$ , we have that $\Delta ^{K}Q_{1}\not \equiv 0$ .
Suppose that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\in \mathbb {Q}[n]$ for some $(h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K}$ . Note that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})=\sum _{i=1}^{M}a_{i}\Delta ^{K}Q_{i}(n,h_{1},\ldots , h_{K})$ . Since each $\Delta ^{K}Q_{i}(n, h_{1},\ldots , h_{K})$ is a rational polynomial in terms of n of degree $\deg (Q)-K$ and $a_{1},\ldots ,a_{M}\in {\mathbb R}$ are linearly independent over ${\mathbb Q}$ , we must have that $\Delta ^{K}Q_{1}(\cdot ,h_{1},\ldots ,h_{K})\equiv 0$ . So if the set of $(h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K}$ such that $\Delta ^{K}Q(n,h_{1},\ldots ,h_{K})\in \mathbb {Q}[n]$ has positive density, then the set of $(n,h_{1},\ldots ,h_{K})\in ({\mathbb Z}^{L})^{K+1}$ such that $\Delta ^{K}Q_{1}(n,h_{1},\ldots ,h_{K})=0$ has positive density too. By [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.11], $\Delta ^{K}Q_{1}\equiv 0$ , which is a contradiction. This finishes the proof.
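A small computation illustrates the mechanism of the proof in the simplest case $L=1$ , $K=2$ , $Q(n)=aQ_{1}(n)$ with a irrational and $Q_{1}(n)=n^{3}$ . The sketch assumes that $\Delta $ acts as the forward difference in n, that is, $P(n+h)-P(n)$ ; this convention and all function names are assumptions made for illustration. Under this assumption, $\Delta ^{2}Q(\cdot ,h_{1},h_{2})=a\Delta ^{2}Q_{1}(\cdot ,h_{1},h_{2})$ is rational exactly when $\Delta ^{2}Q_{1}(\cdot ,h_{1},h_{2})$ vanishes identically, so it suffices to list those exceptional pairs.

```python
from fractions import Fraction
from itertools import product
from math import comb

def shift(poly, h):
    """Coefficients of n -> poly(n + h); poly[i] is the coefficient of n**i."""
    out = [Fraction(0)] * len(poly)
    for i, c in enumerate(poly):
        for j in range(i + 1):
            out[j] += c * comb(i, j) * Fraction(h) ** (i - j)
    return out

def delta(poly, h):
    """Forward difference in n with step h: poly(n + h) - poly(n) (assumed convention)."""
    return [s - c for s, c in zip(shift(poly, h), poly)]

def iterated_delta(poly, steps):
    """Apply the difference operator once for each step h_1, ..., h_K."""
    for h in steps:
        poly = delta(poly, h)
    return poly

Q1 = [Fraction(0), Fraction(0), Fraction(0), Fraction(1)]   # Q_1(n) = n^3, deg > K = 2
N = 10
# Pairs (h1, h2) in [-N, N]^2 for which the second difference is the zero polynomial.
bad = [hs for hs in product(range(-N, N + 1), repeat=2)
       if not any(iterated_delta(Q1, hs))]
print(len(bad), (2 * N + 1) ** 2)   # → 41 441
```

The exceptional pairs are exactly those with $h_{1}h_{2}=0$ , a set of density $0$ , in agreement with the conclusion of Lemma 3.14 in this toy case.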
4 PET induction
In this section, we present the method we use to reduce the complexity of the polynomial iterates, that is, PET induction (PET is an abbreviation for ‘Polynomial Exhaustion Technique’), which was first introduced in [Reference Bergelson2]. To this end, we start by recalling a variation of van der Corput’s lemma from [Reference Donoso, Koutsogiannis and Sun7] that is convenient for our study. We then continue by presenting the inductive scheme via the use of van der Corput operations.
4.1 The van der Corput lemma
The standard tool used in reducing the complexity of polynomial families of iterates is van der Corput’s lemma (also known as ‘van der Corput’s trick’). We will use the following variation of it, the proof of which can be found in [Reference Donoso, Koutsogiannis and Sun7, Lemma 2.2].
Lemma 4.1. (van der Corput lemma)
Let $(a(n;h_1,\ldots ,h_s))_{(n;h_1,\ldots ,h_s)\in ({\mathbb Z}^{L})^{s+1}}, s\in \mathbb {N}_0,$ be a sequence bounded by $1$ in a Hilbert space $\mathcal {H}$ . Then, for all $\tau \in \mathbb {N}_0$ ,
Remark 4.2. We use this unorthodox notation to separate the variable n from the $h_i$ terms. The variable n plays a different role in our study than the $h_i$ terms.
We also provide two applications of Lemma 4.1 for later use. The first one is to get an upper bound for single averages with polynomial iterates and a polynomial exponential weight. Let and recall Definition 3.13 for the polynomial $\Delta ^{K}P$ .
Lemma 4.3. Let $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ and $p\colon {\mathbb Z}^{L}\to {\mathbb Z}^{d}$ be polynomials. For all $K\in {\mathbb N}_0$ and $\tau \in {\mathbb N}$ , there exists $C_{K,\tau }>0$ such that for every $\mathbb {Z}^{d}$-system, $(X,\mathcal {B},\mu , (T_{g})_{g\in \mathbb {Z}^{d}}),$ and ${f\in L^{\infty }(\mu )}$ bounded by 1, we have
Proof. When $K=0$ , there is nothing to prove. We now assume that the relation holds for some $K\in {\mathbb N}_0$ and we show it for $K+1$ . Using Lemma 4.1 and the T-invariance of $\mu $ , we get
and hence the result (the constant that appears depends only on $\tau $ and K).
The second application of Lemma 4.1 provides an upper bound for single averages, with linear iterates and an exponential weight evaluated at a linear polynomial, on a product system. The proof is inspired by [Reference Donoso, Koutsogiannis and Sun7, Lemma 5.2] and [Reference Host and Kra19, Proposition 2.9].
Lemma 4.4. Let $(X,\mathcal {B},\mu )$ be a probability space, $k,L\in {\mathbb N}$ and $T_{i,j}, 1\leq i\leq k, 1\leq j\leq L$ be commuting measure-preserving transformations on X. Denote $S_{j}=T_{1,j}\times \cdots \times T_{k,j}$ for $1\leq j\leq L$ . Let $G_{i}$ be the group generated by $T_{i,1},\ldots ,T_{i,L}$ . Then, for any polynomial $P\colon {\mathbb Z}^{L}\to {\mathbb R}$ of degree 1 and $f_{1},\ldots ,f_{k}\in L^{\infty }(\mu )$ bounded by 1, we have that
where $f=f_{1}\otimes \cdots \otimes f_{k}$ and for
Proof. Fix $1\leq i\leq k$ and let $P(n)=a\cdot n+b$ for some $a\in {\mathbb R}^{L},b\in {\mathbb R}$ . Then, by Lemma 4.1 for $\tau =2$ and $s=0$ , the fourth power of the left-hand side of equation (15) is bounded by
from where the result follows.
4.2 The van der Corput operation
To review the PET induction scheme, we will follow, and slightly modify, the approach from [Reference Donoso, Koutsogiannis and Sun7]. To this end, we extend the definitions that we have already given on the polynomial families of interest (see the beginning of §2.1), taking into account that we treat the first L-tuple of variables of the polynomials differently.
Before we list the steps of the van der Corput operation, we will present the manipulations of the inner product in Lemma 4.1 in a simple example where we have three essentially distinct polynomial iterates $(p_1(n),p_2(n),p_3(n))=(n^2,2n,n),$ to show how, by repeatedly running the van der Corput trick, we get an expression of linear iterates. This will be extended to general expressions in Theorem 4.9. Here, we want to study, for functions $f_1, f_2, f_3$ bounded by $1$ , the average of the sequence $a(n)=T_1^{n^2}f_1\cdot T_2^{2n}f_2\cdot T_2^n f_3.$ Notice that we can write this sequence as a $\mathbb {Z}^2$ action, $a(n)=T_{(n^2,0)}f_1\cdot T_{(0,2n)}f_2\cdot T_{(0,n)}f_3$ for the triple of polynomials $((n^2,0),(0,2n),(0,n)).$ Using Lemma 4.1, we have
Using the fact that $T_2$ is measure-preserving, we compose by $T_2^{-n}$ (notice that n is the polynomial of the minimum degree in the expression) to get
where we grouped the functions with the same linear terms.
Using the Cauchy–Schwarz inequality (to discard the terms that have iterates independent of n), the previous relation is bounded by
Exactly because of the grouping of the terms of the same linear iterates, the resulting polynomial iterates, $(n^2+2h_1 n,-n), (n^2,-n), (0,n),$ have the property that they are nonconstant and that their pairwise differences are nonconstant (this will lead us below to the notion of the ‘essentially distinct’ vector-valued polynomials).
Similarly, skipping the details, using Lemma 4.1, composing with $T_2^{-n}$ (the polynomial $(0,n)$ is of minimum ‘degree’; see below for the definition of the degree of a vector-valued polynomial), the square of the previous quantity can be bounded by
Note that the iterates in the previous relation are ‘essentially distinct’ for ‘almost all’ tuples $(h_{1},h_{2})\in \mathbb {Z}^{2}$ .
Analogously, using Lemma 4.1 once more, noticing that all the resulting terms in the expression will have the factor $T_1^{n^2}T_2^{-2n},$ where $(n^2,-2n)$ is the polynomial of minimum ‘degree’, we can bound, composing with the term $T_1^{-n^2}T_2^{2n}$ , the square of the previous relation by
The iterates in this last relation are linear with distinct coefficients for ‘almost all’ tuples $(h_{1},h_{2},h_{3})\in \mathbb {Z}^{3}$ . So, the eighth power of the initial expression is bounded by the previous relation.
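The bookkeeping in this example can be reproduced mechanically. The sketch below is an illustration only: it tracks the polynomial iterates and ignores the functions and the averages. A polynomial in $(n,h_1,h_2,h_3)$ is stored as a dictionary from exponent tuples to coefficients, and each pass shifts n by a new variable $h_j$ , subtracts the chosen polynomial q, discards the iterates that are constant in n, and keeps one representative per group of iterates differing by a constant in n. The data layout, the function names, and the choice to drop the terms free of n from each representative are assumptions made for readability.

```python
from math import comb

# A polynomial in (n, h1, h2, h3) is a dict {(e_n, e_1, e_2, e_3): coefficient};
# a Z^2-valued polynomial is a pair of such dicts.

def clean(p):
    return {e: c for e, c in p.items() if c != 0}

def sub(p, q):
    out = dict(p)
    for e, c in q.items():
        out[e] = out.get(e, 0) - c
    return clean(out)

def shift_n(p, j):
    """Substitute n -> n + h_j, expanding (n + h_j)**e with the binomial theorem."""
    out = {}
    for e, c in p.items():
        for i in range(e[0] + 1):          # term n**i * h_j**(e[0] - i)
            f = list(e)
            f[0] = i
            f[j] += e[0] - i
            f = tuple(f)
            out[f] = out.get(f, 0) + c * comb(e[0], i)
    return clean(out)

def n_part(p):
    """Keep only the terms that involve n; the rest is constant in n."""
    return {e: c for e, c in p.items() if e[0] > 0}

def deg_n(p):
    """Degree with respect to n of a vector-valued polynomial."""
    return max((e[0] for comp in p for e in comp), default=0)

def vdc_step(polys, q, j):
    """Shift by h_j, subtract q, drop constants, keep one iterate per group."""
    cands = [tuple(sub(shift_n(c, j), qc) for c, qc in zip(p, q)) for p in polys]
    cands += [tuple(sub(c, qc) for c, qc in zip(p, q)) for p in polys]
    reps = []
    for p in cands:
        core = tuple(n_part(c) for c in p)
        if any(core) and core not in reps:   # nonconstant and not yet represented
            reps.append(core)
    return reps

zero, n1, n2 = {}, {(1, 0, 0, 0): 1}, {(2, 0, 0, 0): 1}
polys = [(n2, zero), (zero, {(1, 0, 0, 0): 2}), (zero, n1)]   # (n^2,0), (0,2n), (0,n)

step1 = vdc_step(polys, (zero, n1), 1)                    # subtract (0, n)
step2 = vdc_step(step1, (zero, n1), 2)                    # subtract (0, n) again
step3 = vdc_step(step2, (n2, {(1, 0, 0, 0): -2}), 3)      # subtract (n^2, -2n)
for name, s in (("step 1", step1), ("step 2", step2), ("step 3", step3)):
    print(name, len(s), sorted(deg_n(p) for p in s))
```

In this run, the first pass leaves the three iterates $(n^2+2h_1n,-n)$ , $(n^2,-n)$ , $(0,n)$ , the second leaves four iterates of degree 2, and the third leaves seven linear iterates whose leading coefficients are twice the sums over the nonempty subsets of $\{h_1,h_2,h_3\}$ .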
The previous example leads naturally to the following notions.
Definition 4.5. For a polynomial $p(n;h_{1},\ldots ,h_{s})\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}$ , we denote by $\deg (p)$ the degree of p with respect to n (for example, for $s=1, L=2$ , the degree of $p(n_{1},n_{2};h_{1,1},h_{1,2})=h_{1,1}h_{1,2}n_{1}^{2}+h_{1,1}^{5}n_{2}$ is 2).
For a polynomial $p(n;h_{1},\ldots ,h_{s})=(p_{1}(n;h_{1},\ldots ,h_{s}),\ldots ,p_{d}(n;h_{1},\ldots ,h_{s}))\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d},$ we let $\deg (p)=\max _{1\leq i\leq d}\deg (p_{i})$ and we say that p is nonconstant if $\deg (p)>0$ (that is, some $p_i$ is a nonconstant function of n), otherwise, we say that p is constant. The polynomials $q_{1},\ldots ,q_{k}\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ are called essentially distinct if they are nonconstant and $q_i-q_j$ is nonconstant for all $i\neq j$ . Finally, for a tuple $\mathbf {q}=(q_{1},\ldots ,q_{k}),$ we let $\deg (\mathbf {q})=\max _{1\leq i\leq k}\deg (q_{i}).$ (For clarity, we use non-bold letters for vectors (of polynomials) and bold letters for vectors of vectors (of polynomials).)
Let $(X,\mathcal {B},\mu ,(T_{g})_{g\in \mathbb {Z}^{d}})$ be a $\mathbb {Z}^{d}$-system, $q_{1},\ldots ,q_{k}\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ be polynomials, and $g_{1},\ldots , g_{k}\colon X\times (\mathbb {Z}^{L})^{s}\to \mathbb {R}$ be functions such that $g_{m}(\cdot ;h_{1},\ldots ,h_{s})$ is an $L^{\infty }(\mu )$ function bounded by $1$ for all $h_{1},\ldots ,h_{s}\in \mathbb {Z}^L, 1\leq m\leq k$ . If $\mathbf {q}=(q_{1},\ldots ,q_{k})$ and $\mathbf {g}=(g_{1},\ldots ,g_{k}),$ we say that $A=(L,s,k,\mathbf {g},\mathbf {q})$ is a PET-tuple, and for $\tau \in {\mathbb N}_0$ , we set
We define $\deg (A)=\deg (\mathbf {q})$ , and say that A is nondegenerate if $\mathbf {q}$ is a family of essentially distinct polynomials (for convenience, $\mathbf {q}$ will be called nondegenerate as well). For $1\leq m\leq k$ , the tuple A is m-standard for $f\in L^\infty (\mu )$ if $\deg (A)=\deg (q_{m})$ and $g_{m}(x;h_{1},\ldots ,h_{s})=f(x)$ for every $x,h_1,\ldots ,h_s$ . That is, f is the mth function in $\mathbf {g}$ , only depending on the first variable, and the polynomial $q_m$ that acts on f is of the highest degree. (Here, we say m-standard for f to highlight the function of interest as, after running the vdC-operation, the position of the functions in the expression we deal with changes.) The tuple A will be called semistandard for f if there exists $1\leq m\leq k$ such that $g_{m}(x;h_{1},\ldots ,h_{s})=f(x)$ for every $x,h_1,\ldots ,h_s$ . In this case, we do not require the function f to have a specific position in $\mathbf {g}$ nor that the polynomial acting on f be of the highest degree.
As an example, for a $\mathbb {Z}$-system $(X,\mathcal {B},\mu ,T),$ take $L=s=1, k=3, q_1(n,h)=n^3, q_2(n,h)=3n^2h, q_3(n,h)=3nh^2,$ and, for $f,g\in L^\infty (\mu ),$ let $g_1(x,h)=f(x), g_2(x,h)=g(x),$ and $g_3(x,h)=T^{h^3}f(x).$
Then, we have that A is 1-standard for $f,$ semistandard for f and $g,$ and, for $\kappa \in \mathbb {N}_0$ ,
For each nondegenerate PET-tuple $A=(L,s,k,\mathbf {g},\mathbf {q})$ and polynomial $q\colon (\mathbb {Z}^{L})^{s+1}\to \mathbb {Z}^{d}$ , we define the vdC-operation, $\partial _{q}A$ , according to the following three steps. (Actually, the vdC-operation can be defined for any PET-tuple, not just for nondegenerate ones. Similarly, being a procedure that reduces complexity, PET induction can be applied to any family of polynomials. As the expressions of interest in this paper correspond to nondegenerate tuples, we consider only this case.)
Step 1: For all $1\leq m\leq k$ , let $g^{\ast }_{m}=g^{\ast }_{m+k}=g_{m},$ and $q^{\ast }_1,\ldots ,q^{\ast }_{2k} \colon (\mathbb {Z}^{L})^{s+2}\to \mathbb {Z}^{d}$ be the polynomials defined as
that is, we subtract the polynomial q from the first k polynomials after we have shifted by $h_{s+1}$ the first L variables, and for the second k ones, we subtract q. (In practice, this q will be one of the $q_i$ terms of minimum degree.) Denote $\mathbf {q}^{\ast }=(q^{\ast }_{1},\ldots ,q^{\ast }_{2k})$ .
Step 2: We remove from $q^{\ast }_{1},\ldots ,q^{\ast }_{2k}$ the polynomials which are constant and the associated functions $g_i^\ast $ in the expression (we group all these terms together and we see the resulting term as a single constant one, in terms of n). As we already saw in the example at the beginning of this subsection, this is justified via the use of the Cauchy–Schwarz inequality and the fact that the functions $g_m$ are bounded. Then we put the nonconstant ones in groups $J_{i}=\{\tilde {q}_{i,1},\ldots ,\tilde {q}_{i,t_{i}}\}, 1\leq i\leq k'$ for some $k', t_{i}\in \mathbb {N}$ such that any two polynomials are essentially distinct if and only if they belong to different groups. Next, we write $\tilde {q}_{i,j}(n;h_{1},\ldots ,h_{s+1})=\tilde {q}_{i,1}(n;h_{1},\ldots ,h_{s+1})+\tilde {p}_{i,j}(h_{1},\ldots ,h_{s+1})$ for some polynomial $\tilde {p}_{i,j}$ for all $1\leq j\leq t_{i}, 1\leq i\leq k'$ . For convenience, we also relabel what remains, as some of the initial terms may have been removed because of the grouping of the polynomials, of the $g^{\ast }_{1},\ldots , g^{\ast }_{2k}$ accordingly as