Exponential control of the trajectories of iterated function systems with application to semi-strong GARCH$(p, q)$ models

Baye Matar Kandji

doi:10.1017/jpr.2023.13

Part of: Applications Stochastic analysis Inference from stochastic processes

Published online by Cambridge University Press: 15 May 2023

Affiliation: CREST, ENSAE, Institut Polytechnique de Paris
Postal address: 5 Avenue Henri Le Chatelier, 91120 Palaiseau, France. Email: bayematar.kandji@ensae.fr

Abstract

We establish new results on the strictly stationary solution to an iterated function system. When the driving sequence is stationary and ergodic, though not independent, the strictly stationary solution may admit no moment but we show an exponential control of the trajectories. We exploit these results to prove, under mild conditions, the consistency of the quasi-maximum likelihood estimator of GARCH(p,q) models with non-independent innovations.

Keywords

Inference without moments; quasi-maximum likelihood; semi-strong GARCH; stochastic recurrence equation

MSC classification

Primary: 60H25: Random operators and equations

Secondary: 62M10: Time series, auto-correlation, regression, etc. 62P05: Applications to actuarial sciences and financial mathematics

Type: Original Article
Information: Journal of Applied Probability, Volume 60, Issue 4, December 2023, pp. 1501-1515

DOI: https://doi.org/10.1017/jpr.2023.13
Copyright: © The Author(s), 2023. Published by Cambridge University Press on behalf of Applied Probability Trust

1. Introduction

Since [Reference Kesten19], the theoretical properties of the stochastic recurrence equation (SRE) $\boldsymbol{X}_{t}=\boldsymbol{A}_{t} \boldsymbol{X}_{t-1}+\boldsymbol{B}_{t}$ have received much attention. This equation covers a large class of classical econometric processes such as the GARCH and ARMA models, and their numerous variants. A sufficient condition for the existence and uniqueness of a strictly stationary solution was proposed in [Reference Brandt5] in the case where $(\boldsymbol{A}_t,\boldsymbol{B}_t)_t$ is stationary and ergodic. Under an irreducibility condition, [Reference Bougerol and Picard4] established that this condition is also necessary when the sequence $(\boldsymbol{A}_{t},\boldsymbol{B}_{t})$ is independent and identically distributed (i.i.d.). The probabilistic properties of the stationary solution of the SRE model in the i.i.d. case are well known. In the scalar case, [Reference Kesten19] showed that $\mathbb{P}(\pm \boldsymbol{X}_1>x) \sim c_{\pm} x^{-a}$ as $x \rightarrow \infty$ for some positive constants $c_{\pm}$ . A thorough study of SRE models, in particular their tail behavior, is presented in [Reference Buraczewski, Damek and Mikosch6]. The SRE model is the special case, with affine maps, of the so-called stochastic iterated function system (IFS) $\boldsymbol{X}_{t}=\Psi(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1})$ . Most of the theoretical properties established for SRE models (stationarity, tail properties) can be extended to IFS equations.

One important application of SREs in time series analysis is the study of the stationarity properties of GARCH processes. Assuming i.i.d. innovations, [Reference Bougerol and Picard3] deduced from [Reference Brandt5] a necessary and sufficient condition for the existence of a unique stationary solution of a general GARCH(p, q) model. In recent years, the i.i.d. assumption on the innovations has often been replaced by a less restrictive conditional moment assumption (the model is then called ‘semi-strong’ GARCH). See [Reference Escanciano10] for the classical GARCH(p,q) model, and [Reference Francq and Thieu12, Reference Han and Kristensen17] for GARCH-X models. The GARCH-MIDAS models of [Reference Engle, Ghysels and Sohn9] constitute another class of IFS models which are not driven by an i.i.d. sequence. Another example is given by GARCH-X models which are IFS driven by a (generally non-i.i.d.) sequence of innovations and covariates. This motivates studying IFS equations driven by non-i.i.d. innovations.

However, strict stationarity generally does not suffice for establishing the asymptotic properties of estimators, such as the quasi-maximum likelihood estimator (QMLE). To our knowledge, all existing works on the QML inference of IFS models assume the existence of a small-order moment of the observed process. Surprisingly, however, the strictly stationary solutions of IFS equations with non-i.i.d. innovations may not admit any finite moment.

The aim of this paper is to establish that the stationary trajectories of the IFS equations enjoy an exponential control property. We also show that this property is sufficient to establish the consistency of the QMLE of semi-strong GARCH models.

The rest of the paper is organized as follows. In Section 2 we present our main result, and Section 3 is devoted to its proof. Section 4 investigates the estimation of the semi-strong GARCH(p, q) model. Complementary proofs are displayed in the Appendices.

2. Stochastic IFS without moments

Let $(E, \mathcal{E})$ be a measurable space and $(F, d)$ a complete and separable metric space (Polish space). Let $(\boldsymbol{\theta}_{t})_{t \in \mathbb{Z}}$ be a stationary and ergodic process valued in $E$, and let $\Psi\colon E \times F \rightarrow F$ be a function such that $x \mapsto \Psi(\theta, x)$ is Lipschitz continuous for all $\theta \in E$ . Let

\begin{equation*}\boldsymbol{\Lambda}_{t}=\Lambda(\boldsymbol{\Psi}_{t}) =\sup _{x_{1}, x_{2} \in F, x_{1} \neq x_{2}}\frac{d(\boldsymbol{\Psi}_{t}( x_{1}),\boldsymbol{\Psi}_{t}( x_{2}))}{d(x_{1},x_{2})},\end{equation*}

where $\boldsymbol{\Psi}_{t}=\Psi(\boldsymbol{\theta}_t,\cdot)$ . Let $\boldsymbol{\Lambda}_{t}^{(0)}=1$ and $\boldsymbol{\Lambda}_{t}^{(r)} = \Lambda(\boldsymbol{\Psi}_{t}\circ\cdots\circ\boldsymbol{\Psi}_{t-r+1})$ for all $r>0$ .

Consider the IFS

(1)

\begin{equation} \boldsymbol{X}_{t} = \Psi(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1}) = \boldsymbol{\Psi}_{t}(\boldsymbol{X}_{t-1}) \qquad \text{for all } t \in \mathbb{Z}.\end{equation}

A solution $(\boldsymbol{X}_{t})$ of (1) is said to be causal if, for every $t$, $\boldsymbol{X}_{t}$ is $\sigma(\boldsymbol{\theta}_{k},\,k\leq t)$-measurable.

In a slightly different form, the following result was established in [Reference Elton8, Theorem 3] and [Reference Bougerol2, Theorem 3.1]; see also [Reference Straumann and Mikosch22, Theorem 2.8] and the review in [Reference Diaconis and Freedman7].

Theorem 1. Assume the following conditions hold: (i) there exists a constant $c \in F$ such that $\mathbb{E} \ln ^{+}d(\boldsymbol{\Psi}_{0}( c),c) < \infty$ ; (ii) $\mathbb{E} \ln ^{+} \boldsymbol{\Lambda}_{0}<\infty$ ; and (iii) $\lim_{r\rightarrow\infty}({1}/{r})\ln \boldsymbol{\Lambda}_{0}^{(r)} < 0$ almost surely (a.s.). Then there exists a unique stationary (causal and ergodic) solution $(\boldsymbol{X}_{t})_{t \in \mathbb{Z}}$ to (1).

Moreover,

(2)

\begin{equation} \textit{for all } t\in\mathbb{Z}, \quad d(\boldsymbol{X}_{t},c) \leq \sum_{n=0}^{\infty} \boldsymbol{\Lambda}_{t}^{(n)}d(\boldsymbol{\Psi}_{t-n} (c),c) < \infty\quad a.s. \end{equation}

Note that $(\ln\boldsymbol{\Lambda}_{0}^{(r)})_{r\geq1}$ is a sub-additive sequence. Therefore, by the sub-additive ergodic theorem of [Reference Kingman20], the limit in assumption (iii) exists.

For the reader’s convenience, and because we have not been able to find (2) in exactly this form, we provide a proof of Theorem 1 in Appendix A.

Remark 1. If $(\boldsymbol{\theta}_t)$ is i.i.d., it is possible to prove in particular cases, including the affine mapping, that $d(\boldsymbol{X}_{1},c)$ has a power-law tail [Reference Buraczewski, Damek and Mikosch6, Theorem 5.3.6]. More generally, it can be shown that, under the conditions of Theorem 1, there exists $s>0$ such that $\mathbb{E}d(\boldsymbol{X}_{1},c)^s < \infty$ . This small moment property is often used in the statistical inference of IFS models, for example, to prove the consistency of estimators for GARCH models and their variants (see [Reference Berkes, Horváth and Kokoszka1] for the GARCH model and [Reference Francq, Wintenberger and Zakoian13] for the EGARCH and Log-GARCH models). If $(\boldsymbol{\theta}_t)$ is not i.i.d., the examples below show that the stationary solution may not admit any small-order moment.

Example 1. Let $\delta\in(0,1)$ and let $(\boldsymbol{z}_t)_{t\in\mathbb{Z}}$ be an i.i.d. non-negative real process with $\mathbb{E}\boldsymbol{z}_t=\frac12(1-\delta)$ and $\mathbb{E}\boldsymbol{z}_t^2=\infty$ . The process $(\boldsymbol{\theta}_t)$ defined by $\boldsymbol{\theta}_t=\sum_{k=0}^{\infty}\delta^k \boldsymbol{z}_{t-k}$ for all $t\in\mathbb{Z}$ satisfies $\mathbb{E}\boldsymbol{\theta}_t=\frac{1}{2}$ and is such that, for all $t\in\mathbb{Z}$ , $\boldsymbol{x}_t=1+\sum_{k=1}^{\infty}\prod_{j=1}^{k}\boldsymbol{\theta}_{t-j+1}$ exists a.s. Moreover, $(\boldsymbol{x}_t)$ is the unique stationary solution of $\boldsymbol{x}_t=\boldsymbol{\theta}_t \boldsymbol{x}_{t-1}+1$ , $t\in \mathbb{Z}$ . Note that $\boldsymbol{x}_t\geq\prod_{j=1}^{k}\boldsymbol{\theta}_{t-j+1} \geq \delta^{{k(k-1)}/{2}}(\boldsymbol{z}_{t-k+1})^k$ for all $k\in\mathbb{N}^*$ . For all $s>0$ , we thus have $\mathbb{E}\boldsymbol{x}_0^s\geq\mathbb{E}\delta^{{sk(k-1)}/{2}}(\boldsymbol{z}_{0})^{sk}=\infty$ for k such that $sk>2$ .
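Example 1 can be explored numerically. The following is a minimal sketch under stated assumptions, not part of the original argument: the function name `simulate_example1`, the Pareto law chosen for $\boldsymbol{z}_t$ (tail index 1.5, so that the mean is finite and the second moment infinite), the value $\delta=0.5$, and the arbitrary starting values are all illustrative choices. The recursion is run on $\ln \boldsymbol{x}_t$ to avoid overflow, and the printed quantity $(1/n)\ln \boldsymbol{x}_n$ is expected to be close to 0, consistent with the exponential control stated in Theorem 2.

```python
import math
import random


def simulate_example1(n, delta=0.5, tail_index=1.5, seed=0):
    """Simulate x_t = theta_t * x_{t-1} + 1 and return the list of ln x_t.

    Assumption (illustrative only): z_t is a rescaled Pareto variable with
    E z_t = (1 - delta) / 2 and E z_t^2 infinite (1 < tail_index <= 2).
    """
    rng = random.Random(seed)
    # A Pareto(a) variable on [1, inf) has mean a / (a - 1) when a > 1.
    pareto_mean = tail_index / (tail_index - 1.0)
    scale = 0.5 * (1.0 - delta) / pareto_mean

    theta = 0.5  # arbitrary start; its influence decays like delta**t
    log_x = 0.0  # ln x_0 with the arbitrary start x_0 = 1
    log_path = []
    for _ in range(n):
        z = scale * rng.paretovariate(tail_index)
        theta = delta * theta + z  # recursion for sum_k delta^k z_{t-k}
        # ln(theta * x + 1) computed in the log domain (log-sum-exp trick)
        a = math.log(theta) + log_x
        log_x = max(a, 0.0) + math.log1p(math.exp(-abs(a)))
        log_path.append(log_x)
    return log_path


log_path = simulate_example1(n=100_000)
rate = log_path[-1] / len(log_path)  # (1/n) ln x_n
print(f"(1/n) ln x_n = {rate:.6f}")
```

Because the simulated path starts from arbitrary values rather than from the stationary law, it only approximates the stationary solution; the effect of the initialization vanishes geometrically under the contraction condition.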

The previous example is simple, but probably a little artificial. We now give an example from a commonly used class of econometric models, for which it was recently proven that the strictly stationary solution does not admit any finite moment.

Example 2. Consider the following GARCH-MIDAS model [Reference Engle, Ghysels and Sohn9]:

\begin{equation*} \begin{cases} \displaystyle \boldsymbol{r}_{t}=\sqrt{\boldsymbol{\tau}_{t}} \boldsymbol{\sigma}_{t} \boldsymbol{\eta}_{t} , \\ \displaystyle\boldsymbol{\tau}_{t}=a+b \boldsymbol{r}_{t-1}^{2} , \\ \displaystyle \boldsymbol{\sigma}_{t}^{2}=1-\alpha-\beta+\alpha{\boldsymbol{r}_{t-1}^{2}}/{\boldsymbol{\tau}_{t}} + \beta\boldsymbol{\sigma}_{t-1}^{2} , \end{cases} \end{equation*}

where $(\boldsymbol{\eta}_{t})_t$ is a zero-mean and unit-variance i.i.d. sequence, $\alpha>0$ , $\beta\geq0$ , $\alpha+\beta<1$ , $a>0$ , and $b>0$ . Noting that $\boldsymbol{\epsilon}_t:= \boldsymbol{\sigma}_{t} \boldsymbol{\eta}_{t}$ is a GARCH process, we see that $(\boldsymbol{\tau}_{t})$ follows the SRE $\boldsymbol{\tau}_{t}=a+b \boldsymbol{r}_{t-1}^{2}=a+(b \boldsymbol{\epsilon}^2_{t-1})\boldsymbol{\tau}_{t-1}$ driven by a non-i.i.d. sequence $(\boldsymbol{\epsilon}_t)$ . It can be shown that, when $b\leq 1$ , the process $(\boldsymbol{r}_t)$ is strictly stationary but, when $\boldsymbol{\eta}_0$ has unbounded support, $\mathbb{E}|\boldsymbol{r}_t|^s=\infty$ for any $s>0$ . See [Reference Francq, Kandji and Zakoian11, Proposition 1] for the proof.

We now state our main result, which provides a way to circumvent the non-existence of small-order moments for models such as those of Examples 1 and 2. Section 4 will be devoted to the statistical study of a class of econometric models where the existence of moments is not guaranteed.

Theorem 2. Under the conditions of Theorem 1, for all $t\in\mathbb{Z}$ ,

\begin{equation*} \textrm{(i)} \quad \limsup\limits_{n \rightarrow \infty}({1}/{n}) \ln d(\boldsymbol{X}_{t+n},c) \leq 0 ; \qquad \textrm{(ii)} \quad \limsup\limits_{n \rightarrow \infty}({1}/{n}) \ln d(\boldsymbol{X}_{t-n},c) \leq 0 \quad a.s. \end{equation*}

Theorem 2 can be interpreted as an exponential control of the trajectory of the stationary solution. Note that the property $\mathbb{E}\ln^+d(\boldsymbol{X}_{1},c)<\infty$ (a weaker condition than the existence of a small-order moment) implies the results of Theorem 2 (see Appendix B). However, the converse is false (see [Reference Tanny23, Example (a)]).

As a consequence of the previous theorem, we obtain the following result. Its proof is provided in Appendix C.

Corollary 1. Under the conditions of Theorem 2, almost surely, $\lim_{|n|\rightarrow\infty}({1}/{|n|})\ln^+ d(\boldsymbol{X}_{t+n},c)$ exists and is equal to 0; if $\mathbb{E}\ln^-d(\boldsymbol{X}_{1},c)<\infty$ , then

(3)

\begin{equation} \lim_{|n|\rightarrow\infty} \frac{1}{|n|}\ln d(\boldsymbol{X}_{t+n},c) \textit{ exists and is equal to } 0. \end{equation}

3. Proof of the main result

To show Theorem 2, we first define an SRE which bounds the distance between $\boldsymbol{X}_{t}$ and c.

Note that, by [Reference Kingman20],

(4)

\begin{equation} \lim_{r\rightarrow\infty}\frac{1}{r}\ln \boldsymbol{\Lambda}_{0}^{(r)} = \inf_{r \in \mathbb{N}^{*}} \frac{1}{r} \mathbb{E}\ln \boldsymbol{\Lambda}_{0}^{(r)} = \lim _{r \rightarrow \infty} \frac{1}{r} \mathbb{E}\ln \boldsymbol{\Lambda}_{0}^{(r)} \quad \textrm{a.s.},\end{equation}

so by Theorem 1(iii) there exists a positive integer $r_0$ such that $\mathbb{E}\ln \boldsymbol{\Lambda}_{0}^{(r_0)}<0$ . It can be shown that $\mathbb{E}[\ln(\boldsymbol{\Lambda}_{0}^{(r_0)}+u)]\stackrel{u\downarrow0}{\longrightarrow}\mathbb{E}\ln\boldsymbol{\Lambda}_{0}^{(r_0)}$ [Reference Straumann and Mikosch22, proof of Theorem 2.10]. Therefore, there exists $u_0>0$ such that $\ln(u_0)\leq\gamma_0:=\mathbb{E}[\ln(\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0)]<0$ . We thus have, for all $v \in[\gamma_0,0)$ ,

(5)

\begin{equation} \mathbb{E}[\ln(\delta(v) (\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0))]=v ,\end{equation}

with $\delta(v)=\exp(v-\gamma_0)\geq 1$ .
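As a one-line check of (5), using only the definitions of $\gamma_0$ and $\delta(v)$ given above,

\begin{equation*} \mathbb{E}[\ln(\delta(v)(\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0))] = \ln\delta(v) + \mathbb{E}[\ln(\boldsymbol{\Lambda}_{0}^{(r_0)}+u_0)] = (v-\gamma_0)+\gamma_0 = v, \end{equation*}

and $\delta(v)\geq1$ precisely because $v\geq\gamma_0$ .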

Now, for any integer $p\in[0, r_0-1]$ , define $(\boldsymbol{a}_{p,t}(v),\boldsymbol{b}_{p,t})_{t\in\mathbb{Z}}$ by

\[ \boldsymbol{a}_{p,t}(v)=\delta(v) (\boldsymbol{\Lambda}_{r_0t+p}^{(r_0)}+u_0), \qquad \boldsymbol{b}_{p,t}=1+\sum_{k=0}^{r_0-1} \boldsymbol{\Lambda}_{r_0t+p}^{(k)}d(\boldsymbol{\Psi}_{r_0t+p-k}(c),c).\]

By Theorem 1(i) and (ii), and by the elementary inequality $\ln(\sum_{i=1}^{n} a_{i}) \leq \ln n+\sum_{i=1}^{n} \ln^+a_{i}$ for non-negative $\{a_{i}\}_{i=1}^{n}$ , we have $\mathbb{E}\ln^+\boldsymbol{a}_{p,t}(v)<\infty$ and $\mathbb{E}\ln^+\boldsymbol{b}_{p,t}<\infty$ . Therefore, in view of (5), there exists a unique stationary solution $(\boldsymbol{z}_{p,t}(v))_t$ to the equation

(6)

\begin{equation} \boldsymbol{z}_{p,t}(v)=\boldsymbol{a}_{p,t}(v)\boldsymbol{z}_{p,t-1}(v)+\boldsymbol{b}_{p,t}.\end{equation}

Note that, by [Reference Brandt5],

(7)

\begin{equation} \boldsymbol{z}_{p,t}(v) = \sum_{q=0}^{\infty}\Bigg(\prod_{i=0}^{q-1} \boldsymbol{a}_{p,t-i}(v)\Bigg) \boldsymbol{b}_{p,t-q}.\end{equation}

By iterating (6), we have

(8)

\begin{equation} \boldsymbol{z}_{p,t}(v) = \sum_{q=0}^{n}\Bigg(\prod_{i=0}^{q-1} \boldsymbol{a}_{p,t-i}(v)\Bigg) \boldsymbol{b}_{p,t-q} + \Bigg(\prod_{i=0}^{n} \boldsymbol{a}_{p,t-i}(v)\Bigg)\boldsymbol{z}_{p,t-(n+1)}(v), \quad \text{for all } n \geq 1.\end{equation}

By (7) and (8), $\big(\prod_{i=0}^{n} \boldsymbol{a}_{p,t-i}(v)\big)\boldsymbol{z}_{p,t-(n+1)}(v)$ is the remainder of a convergent series, and hence almost surely converges to 0. That is,

(9)

\begin{equation} \Bigg(\prod_{k=0}^{n-1}\boldsymbol{a}_{p,t-k}(v)\Bigg) \boldsymbol{z}_{p,t-n}(v) \stackrel{n\rightarrow\infty}{\rightarrow} 0 \quad \textrm{a.s.}\end{equation}

We now give a technical lemma linking the processes $(\boldsymbol{X}_{t})$ and $(\boldsymbol{z}_{p,t}(v))_t$ .

Lemma 1. For all $v\in[\gamma_0,0)$ , $0\leq p\leq r_0-1$ , and $t\in\mathbb{Z}$ ,

(10)

\begin{equation} d(\boldsymbol{X}_{r_0t+p},c)\leq\boldsymbol{z}_{p,t}(v) \quad a.s. \end{equation}

Proof of Lemma 1. For any non-negative integer $n$, let $q$ and $m$ denote the quotient and remainder of the Euclidean division of $n$ by $r_0$ : $n=qr_0+m$ . By sub-multiplicativity we have

$$ \boldsymbol{\Lambda}_{t}^{(n)} \leq \Bigg( \prod_{i=0}^{q-1}\boldsymbol{\Lambda}_{t-ir_0}^{(r_0)}\Bigg) \boldsymbol{\Lambda}_{t-qr_0}^{(m)}, \qquad \prod_{i=0}^{-1}\boldsymbol{\Lambda}_{t-ir_0}^{(r_0)}=1. $$

For all $q\in\mathbb{N}$ , we then obtain

$$ \sum_{n=qr_0}^{(q+1)r_0-1}\boldsymbol{\Lambda}_{t}^{(n)}d(\boldsymbol{\Psi}_{t-n} (c),c) \leq \Bigg(\prod_{i=0}^{q-1}\boldsymbol{\Lambda}_{t-ir_0}^{(r_0)}\Bigg) \sum_{m=0}^{r_0-1}\boldsymbol{\Lambda}_{t-qr_0}^{(m)}d(\boldsymbol{\Psi}_{t-qr_0-m} (c),c). $$

It follows that

\begin{align*} \sum_{n=0}^{\infty} \boldsymbol{\Lambda}_{t}^{(n)}d(\boldsymbol{\Psi}_{t-n} (c),c) & = \sum_{q=0}^{\infty}\sum_{n=qr_0}^{(q+1)r_0-1}\boldsymbol{\Lambda}_{t}^{(n)} d(\boldsymbol{\Psi}_{t-n} (c),c) \\ & \leq \sum_{q=0}^{\infty}\Bigg(\prod_{i=0}^{q-1}\boldsymbol{\Lambda}_{t-ir_0}^{(r_0)}\Bigg) \sum_{m=0}^{r_0-1}\boldsymbol{\Lambda}_{t-qr_0}^{(m)}d(\boldsymbol{\Psi}_{t-qr_0-m} (c),c). \end{align*}

Since $\delta(v)\geq1$ and $u_0>0$ , we obtain

$$ \Bigg(\prod_{i=0}^{q-1} \boldsymbol{a}_{p,t-i}(v)\Bigg) \boldsymbol{b}_{p,t-q} \geq \Bigg(\prod_{i=0}^{q-1}\boldsymbol{\Lambda}_{(r_0t+p)-ir_0}^{(r_0)}\Bigg) \sum_{m=0}^{r_0-1}\boldsymbol{\Lambda}_{(r_0t+p)-qr_0}^{(m)}d(\boldsymbol{\Psi}_{(r_0t+p)-qr_0-m} (c),c). $$

In view of the last two inequalities, together with (7) and (2), we have

$$ \boldsymbol{z}_{p,t}(v) \geq \sum_{n=0}^{\infty} \boldsymbol{\Lambda}_{r_0t+p}^{(n)}d(\boldsymbol{\Psi}_{r_0t+p-n} (c),c) \geq d(\boldsymbol{X}_{r_0t+p},c), $$

which proves (10).

Let $\textbf{Aff}$ denote the set of affine maps from $\mathbb{R}$ into $\mathbb{R}$ . An element $\boldsymbol{f}_{a,b}$ of $\textbf{Aff}$ can be written as $\boldsymbol{f}_{a,b}(x) =a x+b$ , $x \in \mathbb{R}$ , where $(a,b)\in\mathbb{R}^2$ .

Lemma 2. Let us define a function $\Phi$ from $\textbf{Aff}$ to $\mathbb{R}_+$ by $\Phi(\boldsymbol{f}_{a,b})=|a|+|b|$ .

(i) For any x with $|x| \geq 1$ , $|\boldsymbol{f}_{a,b}(x)|\leq\Phi(\boldsymbol{f}_{a,b})|x|$ .
(ii) If $|d|\geq1$ then $\Phi(\boldsymbol{f}_{a,b}\circ\boldsymbol{f}_{c,d}) \leq \Phi(\boldsymbol{f}_{a,b})\Phi(\boldsymbol{f}_{c,d})$ .

Since Lemma 2 is elementary, its proof is skipped. Note that $\Phi$ is the 1-norm in the vector space of affine maps.
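For completeness, a minimal sketch of the omitted verification, using only the definition of $\Phi$ , runs as follows. For (i), if $|x|\geq1$ then

\begin{equation*} |\boldsymbol{f}_{a,b}(x)| \leq |a||x|+|b| \leq (|a|+|b|)|x| = \Phi(\boldsymbol{f}_{a,b})|x|. \end{equation*}

For (ii), since $\boldsymbol{f}_{a,b}\circ\boldsymbol{f}_{c,d}=\boldsymbol{f}_{ac,\,ad+b}$ and $|b|\leq|b||d|$ when $|d|\geq1$ ,

\begin{equation*} \Phi(\boldsymbol{f}_{a,b}\circ\boldsymbol{f}_{c,d}) = |ac|+|ad+b| \leq |a||c|+|a||d|+|b||d| \leq (|a|+|b|)(|c|+|d|) = \Phi(\boldsymbol{f}_{a,b})\Phi(\boldsymbol{f}_{c,d}). \end{equation*}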

Lemma 3. For all $p\in\{0,\dots, r_0-1\}$ and $t\in\mathbb{Z}$ , letting $Q_p(t)=r_0t+p$ ,

\begin{equation*} \textrm{(i)} \quad \limsup\limits_{n \rightarrow \infty} \frac{1}{n} \ln d(\boldsymbol{X}_{Q_p(t+n)},c) \leq 0; \qquad \textrm{(ii)} \limsup\limits_{n \rightarrow \infty} \frac{1}{n} \ln d(\boldsymbol{X}_{Q_p(t-n)},c) \leq 0 \quad \text{a.s.} \end{equation*}

Lemma 3 distinguishes between cases (i) and (ii) because their proofs are different.

Proof of Lemma 3. We start by proving (i). Let $\boldsymbol{f}_t$ be the random affine map defined by $\boldsymbol{f}_t(x) = \boldsymbol{a}_{p,t}(v)x + \boldsymbol{b}_{p,t}$ for all $x\in\mathbb{R}$ . Define also the maps $\boldsymbol{\gamma}_{t,n} = \boldsymbol{f}_t\circ\boldsymbol{f}_{t-1}\circ\dotsb\circ\boldsymbol{f}_{t-n+1}$ and $\boldsymbol{\zeta}_{t,n}=\boldsymbol{f}_{t+n}\circ \boldsymbol{f}_{t+n-1}\circ\dotsb\circ\boldsymbol{f}_{t+1}$ for all $(t,n)\in \mathbb{Z}\times\mathbb{N}^*$ . Note that

(11)

\begin{equation} \boldsymbol{\zeta}_{t,n} = \boldsymbol{\gamma}_{t+n,n}, \qquad \boldsymbol{z}_{p,t}(v) = \boldsymbol{\gamma}_{t,n}(\boldsymbol{z}_{p,t-n}(v)), \qquad \boldsymbol{z}_{p,t+n}(v) = \boldsymbol{\zeta}_{t,n}(\boldsymbol{z}_{p,t}(v)) \quad \textrm{a.s.} \end{equation}

Since $\boldsymbol{b}_{p,t}\geq1$ , by Lemma 2(ii),

(12)

\begin{equation} (\boldsymbol{u}_{t,n})_n := (\ln\Phi(\boldsymbol{\gamma}_{t,n}))_n , \qquad (\boldsymbol{w}_{t,n})_n := (\ln\Phi(\boldsymbol{\zeta}_{t,n}))_n \end{equation}

are sub-additive sequences. By arguments already used, we have $\mathbb{E}|\ln\Phi(\boldsymbol{\gamma}_{t,1})| = \mathbb{E}|\ln\Phi(\boldsymbol{\zeta}_{t,1})| = \mathbb{E}|\ln\Phi(\boldsymbol{f}_t)|<\infty$ . In view of (11) and Lemma 2(i),

\begin{equation*} \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\ln\boldsymbol{z}_{p,t+n}(v) \leq \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\boldsymbol{w}_{t,n} + \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\ln\boldsymbol{z}_{p,t}(v) \quad \textrm{a.s.} \end{equation*}

Because $\boldsymbol{z}_{p,t}(v)$ does not depend on n, we have $\limsup\limits_{n \rightarrow \infty}({1}/{n})\ln\boldsymbol{z}_{p,t}(v) = 0$ a.s. Therefore,

(13)

\begin{equation} \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\ln\boldsymbol{z}_{p,t+n}(v) \leq \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\boldsymbol{w}_{t,n} \quad \textrm{a.s.} \end{equation}

Since, for any $n\in\mathbb{N}^*$ , $\boldsymbol{u}_{t,n}$ and $\boldsymbol{w}_{t,n}$ have the same law, by (12) and Kingman’s sub-additive ergodic theorem,

(14)

\begin{equation} \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\boldsymbol{w}_{t,n} = \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\mathbb{E}\boldsymbol{u}_{t,n} = \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\boldsymbol{u}_{t,n} \quad \textrm{a.s.} \end{equation}

On the other hand, in view of (8), we have, by the positivity of the coefficients,

$$ \Phi(\boldsymbol{\gamma}_{t,n+1}) = \sum_{q=0}^{n}\Bigg(\prod_{i=0}^{q-1} \boldsymbol{a}_{p,t-i}(v)\Bigg)\boldsymbol{b}_{p,t-q} + \Bigg(\prod_{i=0}^{n} \boldsymbol{a}_{p,t-i}(v)\Bigg) \stackrel{n\rightarrow\infty}{\rightarrow} \boldsymbol{z}_{p,t}(v) \quad \textrm{a.s.} $$

Therefore, $\lim_{n \rightarrow \infty}\boldsymbol{u}_{t,n}=\ln\boldsymbol{z}_{p,t}(v)$ a.s., which entails

(15)

\begin{equation} \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\boldsymbol{u}_{t,n}=0 \quad \textrm{a.s.} \end{equation}

By (13), (14), and (15), we get $\limsup\limits_{n \rightarrow \infty}({1}/{n})\ln\boldsymbol{z}_{p,t+n}(v) \leq 0$ a.s., which implies, by (10), part (i) of the lemma.

For (ii), by (10), (9), (5), and the ergodic theorem, we have

\begin{align*} \limsup\limits_{n \rightarrow \infty} \frac{1}{n}\ln d(\boldsymbol{X}_{Q_p(t-n)},c) & \leq \limsup\limits_{n \rightarrow \infty}\frac{1}{n}\ln\boldsymbol{z}_{p,t-n}(v) \\ & \leq \limsup\limits_{n \rightarrow \infty}\frac{1}{n} \ln\Bigg[\Bigg(\prod_{i=0}^{n-1}\boldsymbol{a}_{p,t-i}(v)\Bigg)\boldsymbol{z}_{p,t-n}(v)\Bigg] \\ & \quad - \liminf\limits_{n \rightarrow \infty}\frac{1}{n} \ln\Bigg(\prod_{i=0}^{n-1}\boldsymbol{a}_{p,t-i}(v)\Bigg) \leq -v \quad \textrm{a.s.} \end{align*}

for all $v\in[\gamma_0,0)$ . Letting $v\rightarrow0^-$ , we get the result.

We are now ready to prove Theorem 2.

Proof of Theorem 2. For all $t\in\mathbb{Z}$ , let $t^\prime\in\mathbb{Z}$ and $p^\prime$ , $0\leq p^\prime\leq r_0-1$ , be such that $t=r_0t^\prime+p^\prime$ . Note that $\{t+k,k\in\mathbb{N}\}\subset\bigcup_{0\leq p\leq r_0-1}\{r_0(t^\prime+k)+p,k\in\mathbb{N}\}$ . This and Lemma 3(i) imply that

\begin{align*} \limsup\limits_{n \rightarrow \infty}\frac{1}{n} \ln d(\boldsymbol{X}_{t+n},c) & \leq \max_{0\leq p\leq r_0-1}\bigg(\limsup\limits_{n \rightarrow \infty}\frac{1}{Q_p(t^\prime+n)} \ln d(\boldsymbol{X}_{Q_p(t^\prime+n)},c)\bigg) \\ & \leq C\max_{0\leq p\leq r_0-1}\bigg(\limsup\limits_{n \rightarrow \infty}\frac{1}{n} \ln d(\boldsymbol{X}_{Q_p(t^\prime+n)},c)\bigg) \leq 0 \end{align*}

for

$$C=\max_{0\leq p\leq r_0-1}\bigg(\sup_{n\geq 0}{\frac{n}{Q_p(t^\prime+n)}}\bigg),$$

which establishes (i). Part (ii) follows from similar arguments.

4. Inference for semi-strong GARCH(p, q)

Consider the GARCH(p, q) model

(16)

\begin{equation} \boldsymbol{\epsilon}_{t}=\sqrt{\boldsymbol{h}_{t}} \boldsymbol{\eta}_{t}, \qquad \boldsymbol{h}_{t}=\omega_{0}+\sum_{i=1}^{q} \alpha_{0 i} \boldsymbol{\epsilon}_{t-i}^{2}+\sum_{j=1}^{p} \beta_{0 j} \boldsymbol{h}_{t-j}, \quad \text{for all } t \in \mathbb{Z} ,\end{equation}

where $\omega_{0} > 0$ , $\alpha_{0 i} \geqslant 0$ ( $i=1, \ldots, q$ ), and $\beta_{0 j} \geqslant 0$ ( $j=1, \ldots, p$ ). When $(\boldsymbol{\eta}_{t})$ is i.i.d., the model in (16) is a standard strong GARCH, for which the statistical inference has been thoroughly studied. In particular, [Reference Berkes, Horváth and Kokoszka1, Reference Francq and Zakoian14] studied the QMLE under the stationarity of $(\boldsymbol{\epsilon}_{t})$ , and [Reference Jensen and Rahbek18] explored the asymptotic behavior of the QMLE in the explosive case. In the stationary framework, [Reference Escanciano10] proved the consistency and asymptotic normality of the QMLE without i.i.d.-ness for $(\boldsymbol{\eta}_t)$ , but had to assume that $\mathbb{E}|\boldsymbol{\epsilon}_t|^s<\infty$ for some small $s>0$ . The aim of this section is to relax this extra moment assumption.

4.1. Property of the strictly stationary solution

Let

$$\boldsymbol{A}_{t}= \left( \begin{array}{c@{\quad}c@{\,\,}c@{\quad}c@{\,\,}c@{\quad}c} \displaystyle\alpha_{01} \boldsymbol{\eta}_{t}^{2} & \cdots & \alpha_{0 q} \boldsymbol{\eta}_{t}^{2} & \beta_{01} \boldsymbol{\eta}_{t}^{2} &\cdots & \beta_{0 p} \boldsymbol{\eta}_{t}^{2}\\[6pt] & {I_{q-1}} & && {0_{(q-1)\times p}} \\[6pt] \alpha_{01} & \cdots & \alpha_{0 q} & \beta_{01} &\cdots & \beta_{0 p} \\[6pt] & {0_{(p-1)\times q}} & && {I_{p-1}}\\ \end{array}\right) , \qquad\boldsymbol{b}_{t}=\left(\begin{array}{c}\displaystyle\omega_{0}\boldsymbol{\eta}_{t}^{2} \\[6pt] 0_{q-1}\\[6pt] \omega_{0} \\[6pt] 0_{p-1}\end{array}\right)$$

with standard notation.

The model in (16) is a special case of (1) using $\boldsymbol{\theta}_{t}=(\boldsymbol{A}_{t},\boldsymbol{b}_t)$ , $\boldsymbol{X}_{t} = (\boldsymbol{\epsilon}_{t}^{2},\ldots,\boldsymbol{\epsilon}_{t-q+1}^{2},\boldsymbol{h}_{t},\ldots,\boldsymbol{h}_{t-p+1})'$ , $\Psi(\theta, x)=Ax+b$ for $\theta=(A,b)$ , and $d(x,y)=\|x-y\|$ for any norm $\|\cdot\|$ on $\mathbb{R}^{p+q}$ . Note that $\boldsymbol{\Lambda}_{t}^{(r)}=\|\boldsymbol{A}_{t}\boldsymbol{A}_{t-1}\ldots\boldsymbol{A}_{t-r+1}\|$ .
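As an illustration, here is a worked special case under the notation above. For $p=q=1$ the state vector reduces to $\boldsymbol{X}_{t}=(\boldsymbol{\epsilon}_{t}^{2},\boldsymbol{h}_{t})'$ and

$$\boldsymbol{A}_{t}=\left(\begin{array}{cc} \alpha_{01}\boldsymbol{\eta}_{t}^{2} & \beta_{01}\boldsymbol{\eta}_{t}^{2} \\[6pt] \alpha_{01} & \beta_{01} \end{array}\right), \qquad \boldsymbol{b}_{t}=\left(\begin{array}{c} \omega_{0}\boldsymbol{\eta}_{t}^{2} \\[6pt] \omega_{0} \end{array}\right).$$

The second row of $\boldsymbol{X}_{t}=\boldsymbol{A}_{t}\boldsymbol{X}_{t-1}+\boldsymbol{b}_{t}$ reproduces $\boldsymbol{h}_{t}=\omega_{0}+\alpha_{01}\boldsymbol{\epsilon}_{t-1}^{2}+\beta_{01}\boldsymbol{h}_{t-1}$ , and the first row is the same identity multiplied by $\boldsymbol{\eta}_{t}^{2}$ , that is, $\boldsymbol{\epsilon}_{t}^{2}=\boldsymbol{\eta}_{t}^{2}\boldsymbol{h}_{t}$ .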

In what follows, we do not assume that $(\boldsymbol{\eta}_t)$ is i.i.d.; we only assume that it is stationary and ergodic. If $\mathbb{E}\ln^+\boldsymbol{\eta}_1^2<\infty$ , Theorem 1 applies with $c=0_{p+q}$ . Therefore, in view of (4), there exists a unique non-anticipative strictly stationary solution $(\boldsymbol{\epsilon}_t)$ to model (16) if

\begin{equation*} \gamma(\textbf{A}) := \inf_{r \in\mathbb{N}^*} \frac{1}{r} \mathbb{E}(\ln\|\boldsymbol{A}_{0}\boldsymbol{A}_{-1}\ldots\boldsymbol{A}_{-r+1}\|) = \lim_{r \rightarrow \infty}\frac{1}{r} \ln\|\boldsymbol{A}_{0}\boldsymbol{A}_{-1}\ldots\boldsymbol{A}_{-r+1}\| < 0 \quad \text{a.s.}\end{equation*}

By Theorem 2, it follows that the strictly stationary solution of (16) satisfies

(17)

\begin{equation} \limsup_{n\to\infty}\frac{1}{n}\ln \boldsymbol{\epsilon}_{t+n}^2 \leq 0, \qquad \limsup_{n\to\infty}\frac{1}{n}\ln \boldsymbol{\epsilon}_{t-n}^2\leq 0 \quad \mbox{a.s.} \end{equation}

for all $t\in \mathbb{Z}$ .

In the GARCH(1,1) case, it is easy to check that $\gamma(\textbf{A}) = \mathbb{E}\ln(\alpha_{01}\boldsymbol{\eta}_{t}^2+\beta_{01})$ . For general GARCH(p,q) of the form (16), it seems impossible to compute $\gamma(\textbf{A})$ explicitly. This issue has been discussed in, e.g., [3, p. 117] and [6, pp. 148, 149], both of which recommend estimation by computer simulation.
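One way such a simulation might be set up is sketched here. It is an illustration under stated assumptions, not the procedure of the cited references: the helper names `garch_matrix` and `estimate_gamma` are hypothetical, the innovations in the demo are i.i.d. standard Gaussian, the entrywise 1-norm is an arbitrary choice (the limit does not depend on the norm), and $p\geq1$ , $q\geq1$ is required. The running product is renormalized at each step so that only logarithms are accumulated.

```python
import math
import random


def garch_matrix(alpha, beta, eta2):
    """Build the matrix A_t of Section 4.1 for a given value of eta_t^2."""
    q, p = len(alpha), len(beta)
    d = p + q
    coeffs = list(alpha) + list(beta)
    A = [[0.0] * d for _ in range(d)]
    A[0] = [c * eta2 for c in coeffs]  # row producing eps_t^2
    for i in range(1, q):              # shift of the lagged eps^2
        A[i][i - 1] = 1.0
    A[q] = coeffs[:]                   # row producing h_t
    for j in range(1, p):              # shift of the lagged h
        A[q + j][q + j - 1] = 1.0
    return A


def estimate_gamma(alpha, beta, r, draw_eta, seed=0):
    """Return (1/r) ln ||A_r ... A_1|| along one simulated trajectory."""
    rng = random.Random(seed)
    d = len(alpha) + len(beta)
    M = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]
    log_norm = 0.0
    for _ in range(r):
        A = garch_matrix(alpha, beta, draw_eta(rng) ** 2)
        M = [[sum(A[i][k] * M[k][j] for k in range(d)) for j in range(d)]
             for i in range(d)]
        norm = sum(abs(v) for row in M for v in row)
        log_norm += math.log(norm)           # the logs telescope
        M = [[v / norm for v in row] for row in M]
    return log_norm / r


alpha, beta, r = [0.1], [0.85], 20_000
gamma_hat = estimate_gamma(alpha, beta, r, lambda rng: rng.gauss(0.0, 1.0))

# GARCH(1,1) sanity check against E ln(alpha_01 eta^2 + beta_01), same draws.
rng = random.Random(0)
gamma_closed = sum(
    math.log(alpha[0] * rng.gauss(0.0, 1.0) ** 2 + beta[0]) for _ in range(r)
) / r
print(gamma_hat, gamma_closed)
```

For a non-i.i.d. innovation sequence, `draw_eta` would need to carry its own state (for instance through a closure), which this sketch leaves to the reader.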

4.2. QML estimator

Let $\{\boldsymbol{\epsilon}_{t}\}_{t=1}^{n}$ be a sample of size $n$ of the unique non-anticipative strictly stationary solution of model (16). The vector of parameters $\boldsymbol{\theta} = (\boldsymbol{\theta}_{1}, \ldots, \boldsymbol{\theta}_{p+q+1})^{\top} =(\omega, \alpha_{1}, \ldots, \alpha_{q}, \beta_{1}, \ldots, \beta_{p})^{\top}$ belongs to a parameter space $\boldsymbol{\Theta} \subset\mathopen] 0,+\infty\mathclose[ \times \mathopen[0, \infty\mathclose[^{p+q}$ . The true value of the parameter is unknown and is denoted by $\boldsymbol{\theta}_{0} = (\omega_{0}, \alpha_{01}, \ldots, \alpha_{0 q}, \beta_{01}, \ldots, \beta_{0 p})^{\top}$ . Conditionally on initial values $\boldsymbol{\epsilon}_{0}, \ldots, \boldsymbol{\epsilon}_{1-q}, \tilde{\boldsymbol{\sigma}}_{0}^{2}, \ldots, \tilde{\boldsymbol{\sigma}}_{1-p}^{2}$ , the Gaussian quasi-likelihood is defined by

$$L_{n}(\boldsymbol{\theta}) =L_{n}(\boldsymbol{\theta}; \boldsymbol{\epsilon}_{1}, \ldots,\boldsymbol{\epsilon}_{n}) =\prod_{t=1}^{n}\frac{1}{\sqrt{2 \pi \tilde{\boldsymbol{\sigma}}_{t}^{2}}\,}\exp\bigg({-}\frac{\boldsymbol{\epsilon}_{t}^{2}}{2 \tilde{\boldsymbol{\sigma}}_{t}^{2}}\bigg),$$

where the $\tilde{\boldsymbol{\sigma}}_{t}^{2}$ are defined recursively, for $t \geqslant 1$ , by

$$\tilde{\boldsymbol{\sigma}}_{t}^{2} = \tilde{\boldsymbol{\sigma}}_{t}^{2}(\boldsymbol{\theta}) =\omega + \sum_{i=1}^{q} \alpha_{i} \boldsymbol{\epsilon}_{t-i}^{2} +\sum_{j=1}^{p} \beta_{j} \tilde{\boldsymbol{\sigma}}_{t-j}^{2}.$$

For instance, the initial values can be chosen as

(18)

\begin{equation} \boldsymbol{\epsilon}_{0}^{2} = \cdots = \boldsymbol{\epsilon}_{1-q}^{2} = \tilde{\boldsymbol{\sigma}}_{0}^{2} = \cdots = \tilde{\boldsymbol{\sigma}}_{1-p}^{2} = c,\end{equation}

with $c=\omega$ or $\boldsymbol{\epsilon}_{1}^{2}$ . The standard estimator of the GARCH parameter $\boldsymbol{\theta}_0$ is the QMLE defined as any measurable solution $\hat{\boldsymbol{\boldsymbol{\theta}}}_{n}$ of

(19)

\begin{equation} \hat{\boldsymbol{\boldsymbol{\theta}}}_{n} = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\arg \max }\,L_{n}(\boldsymbol{\theta}) = \underset{\boldsymbol{\theta} \in \boldsymbol{\Theta}}{\arg \min}\,\tilde{\textbf{l}}_{n}(\boldsymbol{\theta}),\end{equation}

where $\tilde{\textbf{l}}_{n}(\boldsymbol{\theta}) = n^{-1} \sum_{t=1}^{n} \tilde{\ell}_{t}$ and $\tilde{\ell}_{t} = \tilde{\ell}_{t}(\boldsymbol{\theta}) = ({\boldsymbol{\epsilon}_{t}^{2}}/{\tilde{\boldsymbol{\sigma}}_{t}^{2}})+\ln \tilde{\boldsymbol{\sigma}}_{t}^{2}$ .
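The recursion for $\tilde{\boldsymbol{\sigma}}_{t}^{2}$ and the criterion $\tilde{\textbf{l}}_{n}(\boldsymbol{\theta})$ can be evaluated as in the following minimal sketch. The function name `qml_criterion`, its argument layout, and the toy data are hypothetical; the initial values follow (18) with $c=\omega$ by default or $c=\boldsymbol{\epsilon}_{1}^{2}$ . The sketch only evaluates the criterion; the minimization over $\boldsymbol{\Theta}$ in (19) is left to a numerical optimizer of the reader's choice.

```python
import math


def qml_criterion(theta, eps, init="omega"):
    """Return (l_tilde_n(theta), [sigma_tilde_1^2, ..., sigma_tilde_n^2]).

    theta = (omega, alpha, beta) with alpha, beta sequences of lengths q, p.
    """
    omega, alpha, beta = theta
    q, p = len(alpha), len(beta)
    n = len(eps)
    c = omega if init == "omega" else eps[0] ** 2  # initial values (18)
    eps2 = [c] * q + [e * e for e in eps]  # eps2[q + t - 1] holds eps_t^2
    sig2 = [c] * p                         # sig2[p + t - 1] holds sigma_t^2
    total = 0.0
    for t in range(1, n + 1):
        s = omega
        s += sum(alpha[i - 1] * eps2[q + t - 1 - i] for i in range(1, q + 1))
        s += sum(beta[j - 1] * sig2[p + t - 1 - j] for j in range(1, p + 1))
        sig2.append(s)
        total += eps2[q + t - 1] / s + math.log(s)  # l_tilde_t
    return total / n, sig2[p:]


# Toy GARCH(1,1) evaluation on two observations.
eps = [1.0, 2.0]
value, sigma2 = qml_criterion((1.0, [0.5], [0.2]), eps)
print(value, sigma2)
```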

Let $\mathcal{A}_{\boldsymbol{\theta}}(z) = \sum_{i=1}^{q} \alpha_{i} z^{i}$ and $\mathcal{B}_{\boldsymbol{\theta}}(z) = 1-\sum_{j=1}^{p} \beta_{j} z^{j}$ . It is not restrictive to assume that $q\geq1$ . By convention, $\mathcal{B}_{\boldsymbol{\theta}}(z)=1$ if $p=0$ . Let $\mathcal{F}_{t-1}$ be the $\sigma$ -field generated by $(\boldsymbol{\epsilon}_{t-1}, \boldsymbol{\epsilon}_{t-2}, \ldots)$ . To show the strong consistency, we make the following assumptions.

Assumption 1. $\boldsymbol{\theta}_{0} \in \boldsymbol{\Theta}$ and $\boldsymbol{\Theta}$ is compact.

Assumption 2. $\gamma(\textbf{A}_{0}) < 0$ and, for all $\boldsymbol{\theta} \in \boldsymbol{\Theta}$ , $\sum_{j=1}^{p} \beta_{j}<1$ .

Assumption 3. $(\boldsymbol{\eta}_t)$ is stationary and ergodic; $\boldsymbol{\eta}_{t}^{2}$ has a non-degenerate distribution with (i) $\mathbb{E}[\boldsymbol{\eta}_{t}^{2} \mid \mathcal{F}_{t-1}] = 1$ a.s. and (ii) $\mathbb{E}\ln\boldsymbol{\eta}_t^2>-\infty$ .

Assumption 4. If $p>0$ , $\mathcal{A}_{\boldsymbol{\theta}_0}(z)$ and $\mathcal{B}_{\boldsymbol{\theta}_{0}}(z)$ have no common root, $\mathcal{A}_{\boldsymbol{\theta}_{0}}(1) \neq 0$ , and $\alpha_{0 q} + \beta_{0 p} \neq 0$ .

Remark 2. Assumptions 1, 2, and 3 are standard (see [Reference Francq and Zakoian14] for comments on these assumptions). Assumption 3(i) is obviously less restrictive than the i.i.d. assumption with finite second-order moments. In Appendix D, we provide an explicit example of semi-strong GARCH based on a non-i.i.d. martingale difference innovation satisfying Assumption 3(i). This assumption was first used in [Reference Lee and Hansen21] for the inference of GARCH models, and [Reference Escanciano10] established the consistency of the QMLE under this assumption, with a small-order moment condition on the observed process instead of our Assumption 3(ii). Note that the latter assumption precludes densities with too much mass around zero, but is satisfied by most commonly used distributions. It is also weaker than the regularity condition on the $\boldsymbol{\eta}_t$ law ( $\lim _{t \rightarrow 0} t^{-\mu} \mathbb{P}\{\boldsymbol{\eta}_{0}^{2} \leqslant t\} = 0$ for some $\mu > 0$ ) used in [Reference Berkes, Horváth and Kokoszka1] (see Appendix E).

Assumption 2 implies that the roots of $\mathcal{B}_{\boldsymbol{\theta}}(z)$ are outside the unit disc. Therefore, by the second inequality of (17), we can define $(\boldsymbol{\sigma}_{t}^{2})=\{\boldsymbol{\sigma}_{t}^{2}(\boldsymbol{\theta})\}$ as the (unique) strictly stationary, ergodic, and non-anticipative solution of

(20)

\begin{equation} \boldsymbol{\sigma}_{t}^{2} = \omega + \sum_{i=1}^{q}\alpha_{i}\boldsymbol{\epsilon}_{t-i}^{2} + \sum_{j=1}^{p} \beta_{j}\boldsymbol{\sigma}_{t-j}^{2} \quad \text{for all }t ; \end{equation}

see Appendix F.

Note that $\boldsymbol{\sigma}_{t}^{2}(\boldsymbol{\theta}_{0}) = \boldsymbol{h}_{t}$ . Let

$$\textbf{l}_{n}(\boldsymbol{\theta}) =\textbf{l}_{n}(\boldsymbol{\theta};\,\boldsymbol{\epsilon}_{n}, \boldsymbol{\epsilon}_{n-1}, \ldots) =n^{-1} \sum_{t=1}^{n} \ell_{t}, \qquad\ell_{t} = \ell_{t}(\boldsymbol{\theta}) =\frac{\boldsymbol{\epsilon}_{t}^{2}}{\boldsymbol{\sigma}_{t}^{2}} + \ln\boldsymbol{\sigma}_{t}^{2}.$$
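As a rough illustration of recursion (20) and of the criterion above, the following Python sketch evaluates the volatility recursion from fixed pre-sample values and then averages $\ell_t$ over the sample. The function names and the choice of pre-sample values (zero for $\boldsymbol{\epsilon}_t^2$, $\omega$ for $\boldsymbol{\sigma}_t^2$) are assumptions made for this example and are not taken from the paper. Because it starts from initial values rather than the infinite past, it corresponds to a criterion of the type $\tilde{\textbf{l}}_n$ rather than $\textbf{l}_n$.

```python
import math


def garch_variances(eps, omega, alpha, beta, init_eps2=0.0, init_sigma2=None):
    """Run recursion (20) forward from fixed (hypothetical) initial values.

    eps   : observations eps_1, ..., eps_n
    alpha : [alpha_1, ..., alpha_q]
    beta  : [beta_1, ..., beta_p] (may be empty)
    Pre-sample values of eps_t^2 are set to init_eps2 and pre-sample values
    of sigma_t^2 to init_sigma2 (default: omega). Both are assumptions.
    """
    if init_sigma2 is None:
        init_sigma2 = omega
    sigma2 = []
    for t in range(len(eps)):
        value = omega
        for i, a in enumerate(alpha, start=1):
            past = eps[t - i] ** 2 if t - i >= 0 else init_eps2
            value += a * past
        for j, b in enumerate(beta, start=1):
            past = sigma2[t - j] if t - j >= 0 else init_sigma2
            value += b * past
        sigma2.append(value)
    return sigma2


def qml_criterion(eps, omega, alpha, beta):
    """Sample average of eps_t^2 / sigma_t^2 + ln sigma_t^2."""
    sigma2 = garch_variances(eps, omega, alpha, beta)
    terms = [e * e / s + math.log(s) for e, s in zip(eps, sigma2)]
    return sum(terms) / len(terms)
```

A QMLE would minimize `qml_criterion` over a compact parameter set; the optimizer is deliberately left out of this sketch.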

We are now able to establish the strong consistency of the QMLE.

Theorem 3. Let $(\hat{\boldsymbol{\boldsymbol{\theta}}}_{n})$ be a sequence of QMLE satisfying (19), with any initial condition (18). Then, under Assumptions 1–4, $\hat{\boldsymbol{\boldsymbol{\theta}}}_{n} \rightarrow \boldsymbol{\theta}_{0}$ a.s. as $n \rightarrow \infty$ .

Remark 3. [Reference Escanciano10] established the asymptotic normality of the QMLE under the assumption that a small-order moment exists. This moment condition is mainly used to justify the existence of the asymptotic covariance of the QMLE. To the best of our knowledge, the asymptotic normality has never been shown without a hypothesis that implies the existence of a small-order moment. In some cases, the asymptotic covariance matrix may not exist without a finite moment of sufficiently large order [15, Section 3.1]. Study of the asymptotic distribution of the semi-strong GARCH without any moment condition is left for future work.

Proof of Theorem 3. The proof relies on the following intermediate results.

(i) $\lim_{n\to \infty} \sup_{\boldsymbol{\theta}\in\boldsymbol{\Theta}} |{\textbf{l}}_n(\boldsymbol{\theta})-\tilde{{\textbf{l}}}_n(\boldsymbol{\theta})| = 0$ a.s.
(ii) If ${\sigma}_t^2(\boldsymbol{\theta})={\sigma}_t^2(\boldsymbol{\theta}_0)$ a.s., then $\boldsymbol{\theta}=\boldsymbol{\theta}_0$ .
(iii) If $\boldsymbol{\theta}\ne \boldsymbol{\theta}_0$ then $\mathbb{E}\{\ell_1(\boldsymbol{\theta}) - \ell_1(\boldsymbol{\theta}_0)\} > 0$ .
(iv) Any $\boldsymbol{\theta}\neq \boldsymbol{\theta}_0$ has a neighborhood $V(\boldsymbol{\theta})$ such that $\liminf_{n\to\infty}(\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) - \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)) > 0$ a.s.

To prove (i), note that [14, (4.7)] shows that, almost surely,

$$ \sup_{\boldsymbol{\theta} \in \boldsymbol{\Theta}} |\textbf{l}_n(\boldsymbol{\theta})-\tilde{\textbf{l}}_n(\boldsymbol{\theta})| \leqslant \bigg\{\sup_{\boldsymbol{\theta} \in \boldsymbol{\Theta}}\frac{1}{\omega^2}\bigg\} C n^{-1} \sum_{t=1}^n \rho^t \boldsymbol{\epsilon}_t^2 + \bigg\{\sup_{\boldsymbol{\theta} \in \boldsymbol{\Theta}}\frac{1}{\omega}\bigg\} C n^{-1} \sum_{t=1}^n \rho^t $$

for some constants $C>0$ and $0<\rho<1$ (independent of n); (i) thus follows by Cesàro’s lemma, since the first inequality of (17) implies that $\rho^t\boldsymbol{\epsilon}_t^2 \rightarrow 0$ a.s. as $t \rightarrow \infty$ :

$$ \limsup_{k\to\infty}\frac{1}{k}\ln\rho^k\boldsymbol{\epsilon}_{t+k}^2 \leq \ln \rho+ \limsup_{k\to\infty}\frac{1}{k}\ln \boldsymbol{\epsilon}_{t+k}^2 \leq \ln \rho < 0. $$

The proof of (ii) uses the same arguments as those of step (ii) in the proof of [Reference Francq and Zakoian14, Theorem 2.1].

Now let us turn to the proof of (iii). For strong GARCH models it is known that $\mathbb{E}\ell_1(\boldsymbol{\theta}_0)$ is finite. This may not be the case in our framework, so we give an alternative proof of (iii). We first establish the existence of $\mathbb{E}\{\ell_1(\boldsymbol{\theta}) - \ell_1(\boldsymbol{\theta}_0)\}$ . Let $W_t(\boldsymbol{\theta})={\sigma}_t^2(\boldsymbol{\theta}_0)/{\sigma}_t^2(\boldsymbol{\theta})$ and, for $K>0$ , let $A_K=[K^{-1}, K]$ . We write

\begin{equation*} \ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0) = g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2){\unicode{x1D7D9}}_{W_t(\boldsymbol{\theta})\in A_K} + g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2){\unicode{x1D7D9}}_{W_t(\boldsymbol{\theta})\in A_K^c} \end{equation*}

where, for $x>0$ and $y\ge 0$ , $g(x,y)=-\log x+y(x-1)$ . Introducing the negative part $x^-=\max(-x, 0)$ of any real number x, we thus have

(21)

\begin{equation} \ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0) \ge g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2)\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K} - \{g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2)\}^-\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K^c}. \end{equation}

Noting that $W_t(\boldsymbol{\theta})$ is $\mathcal{F}_{t-1}$ -measurable and, by Assumption 3(i), $\mathbb{E}[g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2) \mid \mathcal{F}_{t-1}] = g(W_t(\boldsymbol{\theta}), 1)$ , the expectation of the first term on the right-hand side of (21) is well-defined and satisfies

\begin{equation*} \mathbb{E}[g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2)\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K}] = \mathbb{E}[g(W_t(\boldsymbol{\theta}),1)\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K}] \ge 0 \end{equation*}

since $g(x,1)\ge 0$ for any $x> 0$ , with equality only if $x=1$ . By (ii) we have that $W_t(\boldsymbol{\theta})=1$ a.s. if and only if $\boldsymbol{\theta}=\boldsymbol{\theta}_0$ . We thus have, by Beppo Levi’s theorem,

\begin{align*} \lim_{K\to \infty} \mathbb{E}[g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2)\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K}] & = \mathbb{E}[g(W_t(\boldsymbol{\theta}),1)\lim_{K\to \infty}\unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K}] \\ & = \mathbb{E}[g(W_t(\boldsymbol{\theta}),1)] > 0 \qquad \mbox{for } \boldsymbol{\theta} \neq \boldsymbol{\theta}_0. \end{align*}

To deal with the expectation of the second term on the right-hand side of (21), we use the fact that, for $y > 0$ , $g(x,y) \ge g(1/y,y)$ . It follows that

\begin{equation*} -\mathbb{E}[\{g(W_t(\boldsymbol{\theta}),\boldsymbol{\eta}_t^2)\}^- \unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K^c}] \ge -\mathbb{E}[\{g(1/\boldsymbol{\eta}_t^2,\boldsymbol{\eta}_t^2)\}^- \unicode{x1D7D9}_{W_t(\boldsymbol{\theta})\in A_K^c}] \to 0 \quad \mbox{as } K\to \infty, \end{equation*}

because, by Assumption 3(ii), $\mathbb{E}[\{g(1/\boldsymbol{\eta}_t^2,\boldsymbol{\eta}_t^2)\}^-] < \infty$ and thus the convergence holds by Lebesgue’s dominated convergence theorem. This completes the proof of (iii).
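
The elementary facts about $g$ used in this step can be checked directly. The following worked computation is added as a verification; it uses only the definition $g(x,y)=-\log x+y(x-1)$ and assumes the model relation $\boldsymbol{\epsilon}_t^2=\boldsymbol{\sigma}_t^2(\boldsymbol{\theta}_0)\boldsymbol{\eta}_t^2$ stated earlier in the section. First, with $W_t=W_t(\boldsymbol{\theta})$ ,

\begin{equation*} \ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0) = \boldsymbol{\eta}_t^2 W_t + \ln\boldsymbol{\sigma}_t^2(\boldsymbol{\theta}) - \boldsymbol{\eta}_t^2 - \ln\boldsymbol{\sigma}_t^2(\boldsymbol{\theta}_0) = -\log W_t + \boldsymbol{\eta}_t^2(W_t-1) = g(W_t,\boldsymbol{\eta}_t^2). \end{equation*}

Second, for fixed $y>0$ the map $x\mapsto g(x,y)$ is convex on $(0,\infty)$ with derivative $-1/x+y$ , so it is minimized at $x=1/y$ :

\begin{equation*} g(x,y) \ge g(1/y,y) = \log y + 1 - y, \qquad \{g(1/y,y)\}^- = y - 1 - \log y \ge 0. \end{equation*}

Taking $y=1$ gives $g(x,1)\ge g(1,1)=0$ . Under Assumption 3, $\mathbb{E}[\{g(1/\boldsymbol{\eta}_t^2,\boldsymbol{\eta}_t^2)\}^-] = \mathbb{E}\boldsymbol{\eta}_t^2 - 1 - \mathbb{E}\ln\boldsymbol{\eta}_t^2 = -\mathbb{E}\ln\boldsymbol{\eta}_t^2 < \infty$ .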

Now we prove (iv). As for (iii), the possible non-existence of $\mathbb{E}\ell_1(\boldsymbol{\theta})$ requires a modification of the standard proof. For any $\boldsymbol{\theta}\in \boldsymbol{\Theta}$ we have

\begin{equation*} \tilde{\textbf{l}}_n(\boldsymbol{\theta}) - \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0) \ge {\textbf{l}}_n(\boldsymbol{\theta}) - {\textbf{l}}_n(\boldsymbol{\theta}_0) - |\tilde{\textbf{l}}_n(\boldsymbol{\theta}) - {\textbf{l}}_n(\boldsymbol{\theta})| - |\tilde{\textbf{l}}_n(\boldsymbol{\theta}_0) - {\textbf{l}}_n(\boldsymbol{\theta}_0)|. \end{equation*}

Hence, using (i),

(22)

\begin{align} \liminf_{n\to\infty} & \Big(\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*)-\tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)\Big) \nonumber \\ & \ge \liminf_{n\to\infty} \Big(\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} {\textbf{l}}_n(\boldsymbol{\theta}^*) - {\textbf{l}}_n(\boldsymbol{\theta}_0)\Big) - 2\limsup_{n\to\infty}\sup_{\boldsymbol{\theta}\in\boldsymbol{\Theta}} |\tilde{\textbf{l}}_n(\boldsymbol{\theta}) - {\textbf{l}}_n(\boldsymbol{\theta})| \nonumber \\ & = \liminf_{n\to\infty} \Big(\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} {\textbf{l}}_n(\boldsymbol{\theta}^*) - {\textbf{l}}_n(\boldsymbol{\theta}_0)\Big). \end{align}

For any $\boldsymbol{\theta}\in \boldsymbol{\Theta}$ and any positive integer k, let $V_k(\boldsymbol{\theta})$ be the open ball of center $\boldsymbol{\theta}$ and radius $1/k$ . Then

(23)

\begin{equation} \liminf_{n\to\infty}\Big(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} {\textbf{l}}_n(\boldsymbol{\theta}^*) - {\textbf{l}}_n(\boldsymbol{\theta}_0)\Big) \ge \liminf_{n\to\infty} \frac1n \sum_{t=1}^n \inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0). \end{equation}

By arguments already given, under Assumption 3(ii),

\begin{equation*} \mathbb{E}\Big(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0)\Big)^- \le \mathbb{E}(g(1/\boldsymbol{\eta}_t^2, \boldsymbol{\eta}_t^2))^- < \infty. \end{equation*}

Therefore, $\mathbb{E}(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0))$ exists in $\mathbb{R}\cup \{+\infty\}$ , and the ergodic theorem applies (see [16, Exercises 7.3, 7.4]). From (23) we obtain

\begin{equation*} \liminf_{n\to\infty}\Big(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} {\textbf{l}}_n(\boldsymbol{\theta}^*) - {\textbf{l}}_n(\boldsymbol{\theta}_0)\Big) \ge \mathbb{E}\Big(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0)\Big). \end{equation*}

The latter term in parentheses converges to $\ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0)$ as $k\to \infty$ , and, by standard arguments using the positive and negative parts of $\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0)$ , we have

\begin{equation*} \lim_{k\to \infty}\mathbb{E} \Big(\inf_{\boldsymbol{\theta}^*\in V_k(\boldsymbol{\theta})\cap \boldsymbol{\Theta}} \ell_t(\boldsymbol{\theta}^*) - \ell_t(\boldsymbol{\theta}_0)\Big) = \mathbb{E}\{\ell_t(\boldsymbol{\theta}) - \ell_t(\boldsymbol{\theta}_0)\}, \end{equation*}

which, by (iii), is strictly positive. In view of (22), the proof of (iv) is complete.

Now we complete the proof of the theorem. The set $\boldsymbol{\Theta}$ is covered by the union of an arbitrary neighborhood $V(\boldsymbol{\theta}_0)$ of $\boldsymbol{\theta}_0$ and, for any $\boldsymbol{\theta}\ne \boldsymbol{\theta}_0$ , by neighborhoods $V(\boldsymbol{\theta})$ satisfying (iv). Obviously, $\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta}_0)\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) \le \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)$ a.s. Moreover, by the compactness of $\boldsymbol{\Theta}$ , there exists a finite subcover of the form $V(\boldsymbol{\theta}_0), V(\boldsymbol{\theta}_1), \ldots, V(\boldsymbol{\theta}_M)$ . By (iv), for $i=1, \ldots, M$ , there exists $n_i$ such that, for $n\ge n_i$ , $\inf_{\boldsymbol{\theta}^*\in V(\boldsymbol{\theta}_i)\cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) > \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0)$ a.s. Thus, for $n\ge \max_{i=1, \ldots, M} (n_i)$ ,

\begin{equation*} \inf_{\boldsymbol{\theta}^* \in \bigcup_{i=1, \ldots, M} V(\boldsymbol{\theta}_i) \cap \boldsymbol{\Theta}} \tilde{\textbf{l}}_n(\boldsymbol{\theta}^*) > \tilde{\textbf{l}}_n(\boldsymbol{\theta}_0) \quad \textrm{a.s.}, \end{equation*}

from which we deduce that $\widehat{\boldsymbol{\theta}}_n$ belongs to $V(\boldsymbol{\theta}_0)$ for sufficiently large n.

Appendix A. Proof of Theorem 1

Proof. For all $t \in \mathbb{Z}$ and $n \in \mathbb{N}$ , let

(24)

\begin{equation} \boldsymbol{X}_{t, n}=\Psi\left(\boldsymbol{\theta}_{t}, \boldsymbol{X}_{t-1, n-1}\right) \end{equation}

with $\boldsymbol{X}_{t, 0}=c$ . Note that $\boldsymbol{X}_{t, n} = \psi_{n}(\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots, \boldsymbol{\theta}_{t-n+1})$ for some measurable function $\psi_{n}\colon(E^{n},\mathcal{B}_{E^{n}})\rightarrow(F,\mathcal{B}_{F})$ , with the usual notation. For all n, the sequence $(\boldsymbol{X}_{t, n})_{t \in \mathbb{Z}}$ is thus stationary and ergodic. If, for all t, the limit $\boldsymbol{X}_{t} = \lim _{n \rightarrow \infty} \boldsymbol{X}_{t, n}$ exists a.s., then by taking the limit of both sides of (24), it can be seen that the process $(\boldsymbol{X}_{t})$ is a solution of (1). When it exists, the limit is a measurable function of the form $\boldsymbol{X}_{t} = \psi(\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots)$ and is therefore stationary, ergodic, and causal. For the measurability of $\boldsymbol{X}_{t}$ , we can consider the $\boldsymbol{X}_{t,n}$ as functions of ( $\boldsymbol{\theta}_{t}, \boldsymbol{\theta}_{t-1}, \ldots$ ) and argue that, in a metric space, a limit of measurable functions is measurable. The existence of $\lim_{n \rightarrow \infty} \boldsymbol{X}_{t, n}$ was proved in [Reference Elton8], which showed that, a.s., the sequence $(\boldsymbol{X}_{t, n})_{n \in \mathbb{N}}$ is a Cauchy sequence in the complete space F.

By iterating (24) we have $\boldsymbol{X}_{t, n} = \boldsymbol{\Psi}_{t}\circ\cdots\circ\boldsymbol{\Psi}_{t-n+1}(c)$ . It follows that

$$ d\left(\boldsymbol{X}_{t, n},\boldsymbol{X}_{t, n-1}\right) \leq \boldsymbol{\Lambda}_{t}^{(n-1)}d(\boldsymbol{\Psi}_{t-n+1}(c),c). $$

For $n<m$ , we thus have

(25)

\begin{align} d(\boldsymbol{X}_{t, m},\boldsymbol{X}_{t, n}) & \leq \sum_{k=0}^{m-n-1}d(\boldsymbol{X}_{t, m-k},\boldsymbol{X}_{t, m-k-1}) \nonumber \\ & \leq \sum_{k=0}^{m-n-1} \boldsymbol{\Lambda}_{t}^{(m-k-1)}d(\boldsymbol{\Psi}_{t-m+k+1}(c),c) \leq \sum_{j=n}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c). \end{align}

Note that

\begin{equation*} \lim\sup_{j \rightarrow \infty}\ln(\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c))^{1 / j} = \lim\sup_{j \rightarrow \infty}\frac{1}{j}(\ln\boldsymbol{\Lambda}_{t}^{(j)} + \ln d(\boldsymbol{\Psi}_{t-j}(c),c)) < 0 \end{equation*}

under (i) and (ii), by using Kingman’s sub-additive ergodic theorem [Reference Kingman20] and [Reference Francq and Zakoian16, Exercise 4.12]. We conclude, from the Cauchy criterion for the convergence of series with positive terms, that $\sum_{j=1}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c)$ is a.s. finite under (i) and (ii). It follows that $(\boldsymbol{X}_{t, n})_{n \in \mathbb{N}}$ is a.s. a Cauchy sequence in F. The existence of a stationary and ergodic solution to (1) follows.
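The backward iteration (24) and the resulting Cauchy property can be illustrated on a toy example. The sketch below is a minimal illustration under assumptions that are not in the paper: the maps are affine, $\Psi_s(x)=a_s x+b_s$ , with coefficients drawn independently and uniformly (a special case of a stationary ergodic driving sequence), $|a_s|\le 0.8$ , and $c=0$ . The names `backward_iterates`, `a`, `b`, and `gaps` are hypothetical.

```python
import random


def backward_iterates(a, b, c, n_max):
    """Return [X_{t,0}, ..., X_{t,n_max}] for the affine maps
    Psi_s(x) = a[s] * x + b[s], where index 0 is time t, index 1 is t-1, ...

    X_{t,n} = Psi_t o Psi_{t-1} o ... o Psi_{t-n+1}(c), with X_{t,0} = c.
    """
    iterates = [c]
    for n in range(1, n_max + 1):
        x = c
        # Apply the oldest map first (time t-n+1) and the newest last (time t).
        for k in range(n - 1, -1, -1):
            x = a[k] * x + b[k]
        iterates.append(x)
    return iterates


rng = random.Random(0)
n_max = 40
a = [rng.uniform(-0.8, 0.8) for _ in range(n_max)]  # Lipschitz constants |a_s| <= 0.8
b = [rng.uniform(-1.0, 1.0) for _ in range(n_max)]  # so d(Psi_s(0), 0) = |b_s| <= 1
xs = backward_iterates(a, b, c=0.0, n_max=n_max)

# Successive gaps d(X_{t,n}, X_{t,n-1}), n = 1, ..., n_max.
gaps = [abs(xs[n] - xs[n - 1]) for n in range(1, n_max + 1)]
```

In this toy setting the product of Lipschitz constants is at most $0.8^{n-1}$ , so the $n$ th gap is bounded by $0.8^{n-1}$ and the iterates form a Cauchy sequence, in line with the bound preceding (25).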

Assume that there exists another stationary process $(\boldsymbol{X}_{t}^{*})$ such that $\boldsymbol{X}_{t}^{*} = \boldsymbol{\Psi}_{t}(\boldsymbol{X}_{t-1}^{*})$ . For all $N \geq 0$ ,

(26)

\begin{equation} d(\boldsymbol{X}_{t},\boldsymbol{X}_{t}^{*}) \leq \boldsymbol{\Lambda}_{t}^{(N+1)}d(\boldsymbol{X}_{t-N},\boldsymbol{X}_{t-N}^{*}). \end{equation}

Since $\boldsymbol{\Lambda}_{t}^{(N+1)} \rightarrow 0$ a.s. as $N \rightarrow \infty$ , and $d(\boldsymbol{X}_{t-N},\boldsymbol{X}_{t-N}^{*}) = O_{P}(1)$ by stationarity, the right-hand side of (26) tends to zero in probability. Since the left-hand side does not depend on N, we have $\mathbb{P}(d(\boldsymbol{X}_{t},\boldsymbol{X}_{t}^{*}) > \epsilon) = 0$ for all $\epsilon>0$ , and thus $\mathbb{P}(\boldsymbol{X}_{t}=\boldsymbol{X}_{t}^{*}) = 1$ , which establishes the uniqueness. In view of (25), we have $d(\boldsymbol{X}_{t},c) \leq \sum_{j=0}^{\infty}\boldsymbol{\Lambda}_{t}^{(j)}d(\boldsymbol{\Psi}_{t-j}(c),c)$ and (2) follows.

Appendix B. Proof of the comment following Theorem 2

For all $\epsilon>0$ , since $\mathbb{P}(\ln d(\boldsymbol{X}_{1},c)>\epsilon)=\mathbb{P}(\ln^+ d(\boldsymbol{X}_{1},c)>\epsilon)$ ,

\begin{align*} \sum_{n=1}^{\infty} \mathbb{P}(n^{-1}\ln d(\boldsymbol{X}_{t+n},c)>\epsilon) & = \sum_{n=1}^{\infty}\mathbb{P}(n^{-1}\ln^+ d(\boldsymbol{X}_{1},c)>\epsilon) \\ & \leq \int_{0}^{\infty}\mathbb{P}(t^{-1}\ln^+ d(\boldsymbol{X}_{1},c)>\epsilon) \,{\textrm{d}} t \\ & = \int_{0}^{\infty}\mathbb{P}(\epsilon^{-1}\ln^+ d(\boldsymbol{X}_{1},c)>t) \,{\textrm{d}} t \\ & = \epsilon^{-1} \mathbb{E}\ln^+ d(\boldsymbol{X}_{1},c) < \infty.\end{align*}

It follows by the Borel–Cantelli lemma that $\limsup n^{-1}\ln d(\boldsymbol{X}_{t+n},c) \leq 0$ a.s. The second result is obtained by the same arguments.

Appendix C. Proof of Corollary 1

We have, for all $n\geq 1$ , $\sup_{k\geq n}\max(0,\ln d(\boldsymbol{X}_{t+k},c))= \max(0,\sup_{k\geq n}\ln d(\boldsymbol{X}_{t+k},c))$ . It follows that

$$\limsup_{n}\frac{1}{n}\ln^+ d(\boldsymbol{X}_{t+n},c) =\max(0,\limsup_{n} \frac{1}{n}\ln d(\boldsymbol{X}_{t+n},c)) = 0 \quad \textrm{a.s.}$$

Since, in addition, $\ln^+ d(\boldsymbol{X}_{t+n},c)$ is non-negative, $\lim_{n\rightarrow\infty}({1}/{n})\ln^+ d(\boldsymbol{X}_{t+n},c)$ exists and is equal to 0 a.s. We get $\lim_{n\rightarrow\infty}({1}/{n})\ln^+ d(\boldsymbol{X}_{t-n},c) = 0$ a.s. by the same arguments, which gives the first part of the corollary.

For (3), we have $\ln d(\boldsymbol{X}_{t+n},c) = \ln^+ d(\boldsymbol{X}_{t+n},c) - \ln^- d(\boldsymbol{X}_{t+n},c)$ . Since $({1}/{|n|})\ln^+ d(\boldsymbol{X}_{t+n},c)$ converges a.s. to 0 and $({1}/{|n|})\ln^- d(\boldsymbol{X}_{t+n},c)$ also converges a.s. to 0 as $|n|\rightarrow\infty$ [Reference Francq and Zakoian16, Exercise 2.13], $({1}/{|n|})\ln d(\boldsymbol{X}_{t+n},c)$ converges a.s. to 0 as $|n|\rightarrow\infty$ .

Appendix D. Construction of a semi-strong GARCH

We first define a non-i.i.d. martingale difference process. Consider a sequence $(\boldsymbol{x}_{t})_{t\in\mathbb{Z}}$ of i.i.d. random variables with standard normal distribution. Since, for all $z\in\mathbb{R}_+$ , $\boldsymbol{x}_{t}\sqrt{2z}-z\sim\mathcal{N}(-z,2z)$ , using the moment-generating function of the Gaussian distribution, we have

(27)

\begin{equation} \mathbb{E}[\exp(\boldsymbol{x}_{t}\sqrt{2z}-z)] = 1.\end{equation}

If $(\boldsymbol{z}_t)$ is a positive process, independent of $(\boldsymbol{x}_{t})$ , we also have $\mathbb{E}\boldsymbol{\eta}_t^2=1$ , where $\boldsymbol{\eta}_t^2=\exp(\boldsymbol{x}_{t}\sqrt{2\boldsymbol{z}_t}-\boldsymbol{z}_t)$ . This is the case if, for instance, $\boldsymbol{z}_t$ follows a causal AR(1) model of the form $\boldsymbol{z}_t=\phi \boldsymbol{z}_{t-1}+\boldsymbol{u}_t$ with $\phi\in (0,1)$ and $\boldsymbol{u}_t$ i.i.d. with positive variance. It is easy to see that $\mbox{Cov}(\boldsymbol{z}_1,\boldsymbol{z}_0)\neq 0$ , and thus

\begin{align*} \mbox{Cov}\{\ln(\boldsymbol{\eta}_{1}^2),\ln(\boldsymbol{\eta}_{0}^2)\} & = 2\mathbb{E}\{\boldsymbol{x}_{1}\sqrt{\boldsymbol{z}_{1}}\boldsymbol{x}_{0}\sqrt{\boldsymbol{z}_{0}}\} - \mathbb{E}\{\boldsymbol{x}_{1}\sqrt{2\boldsymbol{z}_{1}}\boldsymbol{z}_{0}\} \\ & \quad - \mathbb{E}\{\boldsymbol{z}_{1}\boldsymbol{x}_{0}\sqrt{2\boldsymbol{z}_{0}}\} + \mathbb{E}\{\boldsymbol{z}_{1}\boldsymbol{z}_{0}\} - \mathbb{E}\boldsymbol{z}_{1}\mathbb{E}\boldsymbol{z}_{0} \\ & = \mbox{Cov}\{\boldsymbol{z}_{1},\boldsymbol{z}_{0}\} \neq 0.\end{align*}

It follows that $(\boldsymbol{\eta}_t^2)$ is not i.i.d. We now define $(\boldsymbol{\eta}_t)$ . Let $(\boldsymbol{r}_t)$ be an i.i.d. sequence of Rademacher variables (uniform distribution on $\{-1,1\}$ ), independent of the two sequences $(\boldsymbol{x}_{t})$ and $(\boldsymbol{u}_{t})$ . We define $(\boldsymbol{\eta}_t)$ by $\boldsymbol{\eta}_t = \boldsymbol{r}_t\sqrt{\boldsymbol{\eta}_t^2}$ .

Let $(\mathcal{F}_{t})$ be the canonical filtration of $(\boldsymbol{\eta}_t)$ , i.e. $\mathcal{F}_{t}=\sigma(\boldsymbol{\eta}_k, k\leq t)$ . Define a second filtration $\mathcal{H}_t=\sigma(\boldsymbol{r}_{k},\boldsymbol{x}_{k+1},\boldsymbol{u}_{k+1}, k\leq t)$ . Since $\mathcal{F}_t\subset\mathcal{H}_t$ and $\boldsymbol{r}_{t}$ is independent of $\mathcal{H}_{t-1}$ , we have

\begin{equation*} \mathbb{E}[\boldsymbol{\eta}_{t} \mid \mathcal{F}_{t-1}] = \mathbb{E}\{\mathbb{E}[\boldsymbol{\eta}_{t} \mid \mathcal{H}_{t-1}] \mid \mathcal{F}_{t-1}\} = \mathbb{E}\{\exp([\boldsymbol{x}_{t}\sqrt{2\boldsymbol{z}_{t}} - \boldsymbol{z}_{t}]/2) \mathbb{E}[\boldsymbol{r}_{t} \mid \mathcal{H}_{t-1}] \mid \mathcal{F}_{t-1}\} = 0.\end{equation*}

Define a new filtration $\mathcal{I}_t=\sigma(\boldsymbol{r}_{k},\boldsymbol{x}_{k},\boldsymbol{u}_{k+1}, k\leq t)$ . Since $\mathcal{F}_t\subset\mathcal{I}_t$ , $\boldsymbol{z}_{t}$ is $\mathcal{I}_{t-1}$ -measurable, and $\boldsymbol{x}_{t}$ is independent of $\mathcal{I}_{t-1}$ , by (27) we have

\begin{equation*} \mathbb{E}\left[\boldsymbol{\eta}_{t}^2 \mid \mathcal{F}_{t-1}\right] = \mathbb{E}\{\mathbb{E}[\boldsymbol{\eta}_{t}^2 \mid \mathcal{I}_{t-1}] \mid \mathcal{F}_{t-1}\} = \mathbb{E}\{\mathbb{E}[\exp(\boldsymbol{x}_{t}\sqrt{2\boldsymbol{z}_{t}} - \boldsymbol{z}_{t}) \mid \mathcal{I}_{t-1}] \mid \mathcal{F}_{t-1}\} = 1.\end{equation*}

We have thus shown the existence of a non-degenerate unit martingale difference sequence, that is, a stationary and ergodic sequence $(\boldsymbol{\eta}_{t})$ satisfying the conditions

\begin{equation*} \mathbb{E}[\boldsymbol{\eta}_{t}^{2}] < \infty, \qquad \mathbb{E}[\boldsymbol{\eta}_{t} \mid \mathcal{F}_{t-1}] = 0, \qquad \mathbb{E}[\boldsymbol{\eta}_{t}^{2} \mid \mathcal{F}_{t-1}] = 1, \qquad (\boldsymbol{\eta}_{t}^{2})\text{ are not i.i.d.} \end{equation*}

It is then easy to define a semi-strong GARCH with innovations $(\boldsymbol{\eta}_{t})$ .
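The construction above can be simulated in a few lines. The Python sketch below is only an illustration under stated assumptions: the AR(1) noise $\boldsymbol{u}_t$ is taken uniform on $[0, u_{\max}]$ so that $\boldsymbol{z}_t$ stays positive, $\boldsymbol{z}_t$ is started at its stationary mean (so the simulated path is only approximately stationary), and the names `simulate_innovations`, `phi`, `u_max`, and `seed` are hypothetical.

```python
import math
import random


def simulate_innovations(n, phi=0.5, u_max=0.1, seed=0):
    """Simulate eta_t = r_t * sqrt(eta_t^2) with
    eta_t^2 = exp(x_t * sqrt(2 z_t) - z_t) and z_t = phi * z_{t-1} + u_t.

    Assumptions of this sketch: u_t ~ Uniform[0, u_max], x_t ~ N(0, 1),
    r_t uniform on {-1, 1}, all drawn from one pseudo-random generator.
    """
    rng = random.Random(seed)
    z = (u_max / 2.0) / (1.0 - phi)  # stationary mean of z_t, used as start value
    eta, eta2 = [], []
    for _ in range(n):
        u = rng.uniform(0.0, u_max)
        z = phi * z + u               # positive AR(1) recursion
        x = rng.gauss(0.0, 1.0)
        r = rng.choice((-1.0, 1.0))   # Rademacher sign
        e2 = math.exp(x * math.sqrt(2.0 * z) - z)
        eta2.append(e2)
        eta.append(r * math.sqrt(e2))
    return eta, eta2


eta, eta2 = simulate_innovations(20000)
sample_mean_eta2 = sum(eta2) / len(eta2)
```

By (27), the conditional mean of $\boldsymbol{\eta}_t^2$ given $\boldsymbol{z}_t$ equals 1, so `sample_mean_eta2` should be close to 1 for long samples, while the dependence in $(\boldsymbol{z}_t)$ makes $(\boldsymbol{\eta}_t^2)$ serially dependent.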

Appendix E. Complement to Remark 2

Since $\mathbb{E}(\ln^+(\boldsymbol{\eta}_1^2)) < \infty$ by Assumption 3(i), to establish Assumption 3(ii) it is sufficient to prove that $\mathbb{E}(\ln^-(\boldsymbol{\eta}_1^2)) < \infty$ . Using $\mathbb{E}(\ln^-(\boldsymbol{\eta}_1^2)) = \int_0^\infty\mathbb{P}(\ln^+({1}/{\boldsymbol{\eta}_1^2})\geq s)\,{\textrm{d}} s = \int_0^\infty\mathbb{P}(\ln({1}/{\boldsymbol{\eta}_1^2}) \geq s)\,{\textrm{d}} s = \int_0^\infty\mathbb{P}({1}/{\boldsymbol{\eta}_1^2}\geq\exp(s))\,{\textrm{d}} s = \int_0^\infty\mathbb{P}(\boldsymbol{\eta}_1^2\leq\exp(-s))\,{\textrm{d}} s$ , we have, under the condition of [Reference Berkes, Horváth and Kokoszka1], that $\mathbb{P}(\boldsymbol{\eta}^2_1\leq \exp(-s))=o(\exp(-\mu s))$ as $s\rightarrow\infty$ , so the last integral is finite, which gives the result.

Appendix F. Proof of the existence of a unique strictly stationary solution to (20)

Rewriting (20) in vector form as $\boldsymbol{\underline{\sigma}}_{t}^{2}=\underline{\boldsymbol{c}}_{t}+B \boldsymbol{\underline{\sigma}}_{t-1}^{2}$ , where

\begin{equation*}\boldsymbol{\underline{\sigma}}_{t}^{2} =\left(\begin{array}{c} \displaystyle \boldsymbol{\sigma}_{t}^{2} \\[6pt] \boldsymbol{\sigma}_{t-1}^{2} \\[6pt] \vdots \\[6pt] \boldsymbol{\sigma}_{t-p+1}^{2} \end{array}\right), \qquad\underline{\boldsymbol{c}}_{t} =\left(\begin{array}{c} \displaystyle \omega+\sum_{i=1}^{q} \alpha_{i} \boldsymbol{\epsilon}_{t-i}^{2} \\[6pt] 0 \\[6pt] \vdots \\[6pt] 0 \end{array}\right), \qquad B =\left(\begin{array}{c@{\quad}c@{\,\,}c@{\quad}c} \displaystyle \beta_{1} & \beta_{2} & \cdots & \beta_{p} \\[6pt] 1 & 0 & \cdots & 0 \\[6pt] \vdots & & & \vdots \\[6pt] 0 & \cdots & 1 & 0 \end{array}\right),\end{equation*}

we have, by the second inequality of (17), that $\limsup_{n\to\infty}({1}/{n})\ln \|\underline{\boldsymbol{c}}_{t-n}\|\leq 0$ . By Assumption 2, the spectral radius of $B$ is less than 1, so that $\limsup_{n\to\infty}({1}/{n})\ln\|B^n\| < 0$ , and we deduce that

\begin{equation*}\limsup_{n\to\infty}\frac{1}{n}\ln\|B^n \underline{\boldsymbol{c}}_{t-n}\| \leq\limsup_{n\to\infty}\frac{1}{n}\ln\|B^n\| +\limsup_{n\to\infty}\frac{1}{n}\ln \|\underline{\boldsymbol{c}}_{t-n}\| < 0.\end{equation*}

From this, we deduce by the Cauchy rule that the series $\boldsymbol{\hat{\sigma}}_{t}^{2}:=\sum_{n=0}^{\infty}B^n \underline{\boldsymbol{c}}_{t-n}$ converges almost surely. We note that $(\boldsymbol{\hat{\sigma}}_{t}^{2})$ is a strictly stationary, ergodic, and non-anticipative solution of (20).
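To make the vector representation concrete, the following Python sketch compares a truncated version of the series $\sum_{n} B^n \underline{\boldsymbol{c}}_{t-n}$ with the scalar recursion (20) started from zero in the past. It is a hedged illustration only: the helper names (`companion`, `series_solution`, `recursion_solution`) and the numerical inputs are assumptions, and the truncation replaces the infinite series by finitely many terms.

```python
def mat_vec(m, v):
    """Matrix-vector product for lists of lists."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]


def companion(beta):
    """Companion matrix B with first row (beta_1, ..., beta_p)."""
    p = len(beta)
    rows = [list(beta)]
    for i in range(1, p):
        rows.append([1.0 if j == i - 1 else 0.0 for j in range(p)])
    return rows


def series_solution(c_first, beta):
    """First component of sum_{n=0}^{N-1} B^n c_{t-n}, where c_first[n] is the
    first (only nonzero) component of c_{t-n} and N = len(c_first)."""
    b_mat = companion(beta)
    p = len(beta)
    total = 0.0
    for n, value in enumerate(c_first):
        v = [value] + [0.0] * (p - 1)
        for _ in range(n):           # v <- B^n c_{t-n}
            v = mat_vec(b_mat, v)
        total += v[0]
    return total


def recursion_solution(c_first, beta):
    """Scalar recursion sigma_s^2 = c_s + sum_j beta_j sigma_{s-j}^2, started
    from zero N steps in the past and run forward to time t."""
    p = len(beta)
    past = [0.0] * p                 # past[j-1] holds sigma_{s-j}^2
    for value in reversed(c_first):  # oldest input first
        new = value + sum(beta[j] * past[j] for j in range(p))
        past = [new] + past[:-1]
    return past[0]


beta = [0.3, 0.2]                                # sum of beta_j < 1
c_first = [1.0 + 0.1 * k for k in range(30)]     # hypothetical inputs c_t, c_{t-1}, ...
series_value = series_solution(c_first, beta)
recursion_value = recursion_solution(c_first, beta)
```

The two values agree up to rounding, which reflects that the truncated series is exactly the recursion started from zero $N$ steps back; the geometric decay of $\|B^n\|$ is what makes the contribution of the remote past negligible.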

To show the uniqueness, assume that there exists another strictly stationary solution $(\boldsymbol{\underline{\sigma}}_{t}^{2}{*})$ of (20). For all $n \geq 0$ , we have $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\|=\|B^n\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}-B^n\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|\leq\|B^n\|\|\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}\|+\|B^n\|\|\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|$ . Since $\|B^n\|\rightarrow 0$ as $n \rightarrow \infty$ and the laws of $\|\boldsymbol{\underline{\sigma}}_{t-n}^{2}{*}\|$ and $\|\boldsymbol{\hat{\sigma}}_{t-n}^{2}\|$ do not depend on n by stationarity, Slutsky’s theorem entails that the right-hand side converges in law to 0 as $n \rightarrow \infty$ . Since $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\|$ does not depend on n, we conclude that $\|\boldsymbol{\underline{\sigma}}_{t}^{2}{*}-\boldsymbol{\hat{\sigma}}_{t}^{2}\| = 0$ a.s.

Acknowledgements

I am most thankful to the Editor and to two referees for their constructive comments and suggestions. I also want to acknowledge Christian Francq and Jean-Michel Zakoian for their guidance and feedback.

Funding information

There are no funding bodies to thank relating to the creation of this article.

Competing interests

There were no competing interests to declare which arose during the preparation or publication process of this article.

References

Berkes, I., Horváth, L. and Kokoszka, P. (2003). GARCH processes: Structure and estimation. Bernoulli 9, 201–227.

Bougerol, P. (1993). Kalman filtering with random coefficients and contractions. SIAM J. Control Optimization 31, 942–959.

Bougerol, P. and Picard, N. (1992). Stationarity of GARCH processes and of some nonnegative time series. J. Econometrics 52, 115–127.

Bougerol, P. and Picard, N. (1992). Strict stationarity of generalized autoregressive processes. Ann. Prob. 20, 1714–1730.

Brandt, A. (1986). The stochastic equation $Y_{n+1} =A_nY_n+B_n$ with stationary coefficients. Adv. Appl. Prob. 18, 211–220.

Buraczewski, D., Damek, E. and Mikosch, T. (2016). Stochastic Models with Power-Law Tails. Springer, New York.

Diaconis, P. and Freedman, D. (1999). Iterated random functions. SIAM Rev. 41, 45–76.

Elton, J. H. (1990). A multiplicative ergodic theorem for Lipschitz maps. Stoch. Process. Appl. 34, 39–47.

Engle, R. F., Ghysels, E. and Sohn, B. (2013). Stock market volatility and macroeconomic fundamentals. Rev. Econom. Statist. 95, 776–797.

Escanciano, J. C. (2009). Quasi-maximum likelihood estimation of semi-strong GARCH models. Econometric Theory 25, 561–570.

Francq, C., Kandji, B. M. and Zakoian, J.-M. (2023). Inference on multiplicative component GARCH without any small-order moment. Submitted.

Francq, C. and Thieu, L. Q. (2019). QML Inference for volatility models with covariates. Econometric Theory 35, 37–72.

Francq, C., Wintenberger, O. and Zakoian, J.-M. (2018). Goodness-of-fit tests for Log-GARCH and EGARCH models. Test 27, 27–51.

Francq, C. and Zakoian, J.-M. (2004). Maximum likelihood estimation of pure GARCH and ARMA-GARCH processes. Bernoulli 10, 605–637.

Francq, C. and Zakoian, J.-M. (2007). Quasi-maximum likelihood estimation in GARCH processes when some coefficients are equal to zero. Stoch. Process. Appl. 117, 1265–1284.

Francq, C. and Zakoian, J.-M. (2019). GARCH Models: Structure, Statistical Inference and Financial Applications. John Wiley, Chichester.

Han, H. and Kristensen, D. (2014). Asymptotic theory for the QMLE in GARCH-X models with stationary and nonstationary covariates. J. Business Econom. Statist. 32, 416–429.

Jensen, S. T. and Rahbek, A. (2004). Asymptotic inference for nonstationary GARCH. Econometric Theory 20, 1203–1226.

Kesten, H. (1973). Random difference equations and renewal theory for products of random matrices. Acta Math. 131, 207–248.

Kingman, J. F. C. (1973). Subadditive ergodic theory. Ann. Prob. 1, 883–899.

Lee, S.-W. and Hansen, B. E. (1994). Asymptotic theory for the GARCH(1, 1) quasi-maximum likelihood estimator. Econometric Theory 10, 29–52.

Straumann, D. and Mikosch, T. (2006). Quasi-maximum-likelihood estimation in conditionally heteroscedastic time series: A stochastic recurrence equations approach. Ann. Statist. 34, 2449–2495.

Tanny, D. (1974). A zero–one law for stationary sequences. Z. Wahrscheinlichkeitsth. 30, 139–148.