
On reconsidering entropies and divergences and their cumulative counterparts: Csiszár's, DPD's and Fisher's type cumulative and survival measures

Published online by Cambridge University Press:  21 February 2022

Konstantinos Zografos*
Affiliation:
Department of Mathematics, University of Ioannina, 451 10 Ioannina, Greece. E-mail: kzograf@uoi.gr

Abstract

This paper concentrates on the fundamental concepts of entropy, information and divergence in the case where the distribution function and the respective survival function play the central role in their definition. The main aim is to provide an overview of these three categories of measures of information and their cumulative and survival counterparts. It also aims to introduce and discuss Csiszár's type cumulative and survival divergences and the analogous Fisher's type information on the basis of cumulative and survival functions.

Type
Research Article
Creative Commons
This is an Open Access article, distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike licence (http://creativecommons.org/licenses/by-nc-sa/4.0), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the same Creative Commons licence is used to distribute the reused or adapted article and the original article is properly cited. The written permission of Cambridge University Press must be obtained prior to any commercial use.
Copyright
Copyright © The Author(s), 2022. Published by Cambridge University Press

1. Introduction

Measures of entropy, information and divergence have a long history and they hold a prominent position in the scientific life and literature. Some of them, like Shannon entropy, Shannon [Reference Shannon81], Fisher information measure, Fisher [Reference Fisher35], and Kullback–Leibler divergence, Kullback and Leibler [Reference Kullback and Leibler49] and Kullback [Reference Kullback48], have played a prominent role in the development of many scientific fields. Since the early 1960s and in light of the above-mentioned omnipresent and prominent universal quantities, there has been increased interest in the definition, the study, the axiomatic characterization and the applications of measures which formulate and express: (i) the amount of information or uncertainty about the outcome of a random experiment, (ii) the amount of information about the unknown characteristics of a population or about the unknown parameters of the respective distribution that drives the population or, (iii) the amount of information for discrimination between two distributions or between the respective populations which are driven by them. A classification of the measures of information into these three broad categories and a tabulation and discussion of their main properties and applications is provided in the publications by Ferentinos and Papaioannou [Reference Ferentinos and Papaioannou34], Papaioannou [Reference Papaioannou62], Zografos et al. [Reference Zografos, Ferentinos and Papaioannou99], Vajda [Reference Vajda89], Soofi [Reference Soofi83,Reference Soofi84], Papaioannou [Reference Papaioannou63], Cover and Thomas [Reference Cover and Thomas26], Pardo [Reference Pardo65], among many other citations in the field. 
The increasing interest of the scientific community in measures of information and the numerous applications of these quantities in several disciplines and contexts, in almost all the fields of science and engineering and first of all in probability and statistics, have contributed to the development of the field of statistical information theory, a field and a terminology which was initiated, to the best of our knowledge, in the title of the monograph by Kullback et al. [Reference Kullback, Keegel and Kullback50]. The numerous applications of the measures of information in the area of probability and statistics have led to an enormous number of papers, monographs and books, like that by Csiszár and Körner [Reference Csiszár and Körner30], Kullback et al. [Reference Kullback, Keegel and Kullback50], Liese and Vajda [Reference Liese and Vajda51], Read and Cressie [Reference Read and Cressie74], Vajda [Reference Vajda89], Arndt [Reference Arndt5], Pardo [Reference Pardo65], Basu et al. [Reference Basu, Shioya and Park14], among others, which have been published after the seminal monograph by Kullback [Reference Kullback48]. A huge number of statistical techniques and methods have been introduced and developed on the basis of entropies and divergences. These techniques are presented in the above monographs and the bibliography cited in them.

A resurgence of interest in the definition and the development of new ideas and information theoretic measures is signaled by the paper of Rao et al. [Reference Rao, Chen, Vemuri and Wang73] and a subsequent paper by Zografos and Nadarajah [Reference Zografos and Nadarajah98] where entropy type measures are defined on the basis of the cumulative distribution function or on the basis of the respective survival function. These papers were the basis for an increasing interest in the definition of informational measures in the context of the paper by Rao et al. [Reference Rao, Chen, Vemuri and Wang73]. In this direction, entropy and divergence type measures are introduced and studied, in the framework of the cumulative distribution function or in terms of the respective survival function, in the papers by Rao [Reference Rao72], Zografos and Nadarajah [Reference Zografos and Nadarajah98], Di Crescenzo and Longobardi [Reference Di Crescenzo and Longobardi31], Baratpour and Rad [Reference Baratpour and Rad12], Park et al. [Reference Park, Rao and Shin66] and the subsequent papers by Di Crescenzo and Longobardi [Reference Di Crescenzo and Longobardi32], Klein et al. [Reference Klein, Mangold and Doll46], Asadi et al. [Reference Asadi, Ebrahimi and Soofi6,Reference Asadi, Ebrahimi and Soofi7], Park et al. [Reference Park, Alizadeh Noughabi and Kim67], Klein and Doll [Reference Klein and Doll45], among many others. In these and other treatments, entropy type measures and Kullback–Leibler type divergences have mainly received the attention of the authors. Entropy and divergence type measures are also considered in the papers by Klein et al. [Reference Klein, Mangold and Doll46], Asadi et al. [Reference Asadi, Ebrahimi and Soofi6], Klein and Doll [Reference Klein and Doll45] by combining the cumulative distribution function and the survival function. 
However, to the best of our knowledge, the existing literature does not yet contain a definition of the broad class of Csiszár's type $\phi$-divergences or a definition of the density power divergence in the framework which was initiated in the paper by Rao et al. [Reference Rao, Chen, Vemuri and Wang73]. In addition, to the best of our knowledge, there is no analogous formulation of Fisher's measure of information as a by-product of this type of divergences. This paper aims to bridge this gap.

In the context described above, this paper is structured as follows. The next section provides a short review of measures of entropy, divergence and Fisher's type in the classic setup. A similar review is provided in Section 3 for the concepts of cumulative and survival entropies and divergences and the respective measures proposed in the existing literature. Section 4 is devoted to the definition of Csiszár's type $\phi$-divergences in terms of the cumulative and the survival function. The density power divergence type is also defined in the same setup. Section 5 concentrates on the definition of Fisher's measure of information in terms of the cumulative distribution function and the survival function. The content of the paper is summarized in the last section, where some conclusions and directions for future work are also presented.

2. A short review on entropies and divergences

To present some of the measures of entropy and divergence, which will be mentioned later, consider the probability space $(\mathcal {X},\mathcal {A},P)$, and let $\mu$ be a $\sigma$-finite measure on the same space with $P\ll \mu$. Denote by $f$ the respective Radon–Nikodym derivative $f=dP/d\mu$. The Shannon entropy, Shannon [Reference Shannon81], is defined by

(1)\begin{equation} \mathcal{E}_{Sh}(f)={-}\int_{\mathcal{X}}f(x)\ln f(x)d\mu, \end{equation}

and it is a well-known and broadly applied quantity, as its range of applications extends from thermodynamics to algorithmic complexity, including a fundamental usage in probability and statistics (cf. [Reference Cover and Thomas26]). Two well-known extensions of Shannon entropy have been introduced by Rényi [Reference Rényi75] and Tsallis [Reference Tsallis88], as follows:

(2)\begin{equation} \mathcal{E}_{R,\alpha }(f)=\frac{1}{1-\alpha }\ln \int_{\mathcal{X} }f^{\alpha }(x)d\mu,\quad \alpha >0,\ \alpha \neq 1, \end{equation}

and

(3)\begin{equation} \mathcal{E}_{Ts,\alpha }(f)=\frac{1}{\alpha -1}\left(1-\int_{ \mathcal{X}}f^{\alpha }(x)d\mu \right),\quad \alpha >0,\ \alpha \neq 1, \end{equation}

respectively. It is easily seen that $\lim _{\alpha \rightarrow 1}\mathcal {E}_{R,\alpha }(f)=\mathcal {E}_{Sh}(f)$ and $\lim _{\alpha \rightarrow 1} \mathcal {E}_{Ts,\alpha }(f)=\mathcal {E}_{Sh}(f)$. All the above presented measures have been initially defined in the discrete case. Although the concept of entropy has been introduced by the second law of thermodynamics, Shannon has defined $\mathcal {E}_{Sh}$, in the discrete case, as a measure of the information transmitted in a communication channel. Shannon entropy is the analog of the thermodynamic entropy and, from a probabilistic or statistical point of view, it is a measure of the uncertainty, before a random experiment is carried out, regarding its final result. This interpretation is based on the fact that Shannon entropy is maximized by the most uncertain distribution, the univariate discrete uniform distribution. Hence, Shannon entropy quantifies uncertainty relative to the uniform distribution and this is the fundamental characteristic of all proper entropy measures. Rényi's measure $\mathcal {E}_{R,\alpha }$ extends Shannon's entropy, while Tsallis’ entropy $\mathcal {E}_{Ts,\alpha }$ was motivated by problems in statistical mechanics and it is related to the $\alpha$-order entropy of Havrda and Charvát [Reference Havrda and Charvát41], cf. also Pardo [Reference Pardo65] p. 20 Table 1.1. It should be emphasized, at this point, that $\mathcal {E}_{R,\alpha }$ and $\mathcal {E}_{Ts,\alpha }$ are functionally related, for $\alpha =2$, with the extropy of a random variable $X$, which has received great attention in the last decade (cf. [Reference Frank, Sanfilippo and Agró36,Reference Qiu69,Reference Qiu and Jia70]). 
The extropy of a random variable $X$ with an absolutely continuous distribution function $F$ and respective probability density function $f$ is defined by $J(f)=-\frac {1}{2}\int _{\mathcal { X}}f^{2}(x)d\mu$ and it has been introduced in the statistical literature as the complement of Shannon entropy. This measure is directly connected with Onicescu [Reference Onicescu61] information energy defined by $E(f)=-2J(f)$. It is also noted that $\int _{\mathcal {X}}f^{\alpha }(x)d\mu$, in (2) and (3), defines the Golomb [Reference Golomb37] information function (cf. also [Reference Guiasu and Reischer39]) which is still used nowadays (cf. [Reference Kharazmi and Balakrishnan44]). Last, we have to mention that Rényi's measure $\mathcal {E}_{R,\alpha }$ is the basis for the definition by Song [Reference Song82] of a general measure of the shape of a distribution. Song's measure is defined by $\mathcal {S}(f)=-2(d/d\alpha )\mathcal {E}_{R,\alpha }(f)|_{\alpha =1}={\rm Var}[\ln f(X)]$ and it has been applied and studied inside the family of elliptically contoured distributions in Zografos [Reference Zografos97] and Batsidis and Zografos [Reference Batsidis and Zografos15]. The measure $\mathcal {S}(f)$ above is the varentropy used in Kontoyiannis and Verdú [Reference Kontoyiannis and Verdú47] and Arikan [Reference Arikan4], where the measure is defined in terms of conditional distributions.
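As a small numerical illustration of the limits and of the $\alpha =2$ relations mentioned above, the following sketch evaluates (1), (2) and (3) for a discrete density under the counting measure. The three-point distribution and all function names are hypothetical choices made only for this example.

```python
import math

def shannon_entropy(p):
    """Shannon entropy (1) of a discrete density p (counting measure)."""
    return -sum(pi * math.log(pi) for pi in p if pi > 0)

def renyi_entropy(p, alpha):
    """Renyi entropy (2); requires alpha > 0 and alpha != 1."""
    return math.log(sum(pi ** alpha for pi in p if pi > 0)) / (1.0 - alpha)

def tsallis_entropy(p, alpha):
    """Tsallis entropy (3); requires alpha > 0 and alpha != 1."""
    return (1.0 - sum(pi ** alpha for pi in p if pi > 0)) / (alpha - 1.0)

def extropy(p):
    """J(f) = -(1/2) * sum of p^2, the extropy as defined in the text."""
    return -0.5 * sum(pi ** 2 for pi in p)

p = [0.5, 0.3, 0.2]  # hypothetical three-point distribution

# Both families approach the Shannon entropy as alpha -> 1.
for alpha in (0.9, 0.99, 0.999):
    print(alpha, renyi_entropy(p, alpha), tsallis_entropy(p, alpha))
print("Shannon:", shannon_entropy(p))

# For alpha = 2 both entropies are functions of the extropy J(f):
# Renyi_2 = -ln(-2 J) and Tsallis_2 = 1 + 2 J.
J = extropy(p)
print(renyi_entropy(p, 2.0), -math.log(-2.0 * J))
print(tsallis_entropy(p, 2.0), 1.0 + 2.0 * J)
```

The two $\alpha =2$ identities follow directly from (2), (3) and the definition of $J(f)$, since $\int _{\mathcal {X}}f^{2}(x)d\mu =-2J(f)$.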

Burbea and Rao [Reference Burbea and Rao20] have extended (1) and (3) by introducing the $\phi$-entropy functional

(4)\begin{equation} \mathcal{E}_{\phi }(f)={-}\int_{\mathcal{X}}\phi (f(x))d\mu, \end{equation}

where $\phi$ is a convex real function which satisfies suitable conditions. Shannon's and Tsallis’ entropies are obtained as particular cases of $\mathcal {E}_{\phi }$ for specific choices of the convex function $\phi$ and more precisely for $\phi (u)=u\ln u$ and $\phi (u)=(\alpha -1)^{-1}(u^{\alpha }-u)$, $\alpha >0,\alpha \neq 1,u>0$, respectively. Observe that Rényi's measure $\mathcal {E}_{R,\alpha }$ does not directly follow from the $\phi$-entropy functional. This point led Salicrú et al. [Reference Salicrú, Menéndez, Morales and Pardo79] to define the $(h,\phi )$-entropy, which unified all the entropy measures existing at that time. Based on Pardo [Reference Pardo65] p. 21, the $(h,\phi )$-entropy is defined as follows:

(5)\begin{equation} \mathcal{E}_{\phi }^{h}(f)=h\left(\int_{\mathcal{X}}\phi (f(x))d\mu \right), \end{equation}

where either $\phi :\ (0,\infty )\rightarrow \mathbb {R}$ is concave and $h:\ \mathbb {R} \rightarrow \mathbb {R}$ is differentiable and increasing, or $\phi :\ (0,\infty )\rightarrow \mathbb {R}$ is convex and $h:\ \mathbb {R} \rightarrow \mathbb {R}$ is differentiable and decreasing. Table 1.1 in p. 20 of Pardo [Reference Pardo65] lists important entropy measures obtained from (5) for particular choices of the functions $\phi$ and $h$.
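The following sketch indicates how (5) can reproduce (1), (2) and (3) for a discrete density under the counting measure. The particular pairs $(h,\phi )$ used here are read off directly from formulas (1)–(3), not quoted from Pardo's table, and the function name `h_phi_entropy` is a hypothetical one.

```python
import math

def h_phi_entropy(p, phi, h):
    """(h, phi)-entropy (5): h(sum_x phi(p(x))) under the counting measure."""
    return h(sum(phi(pi) for pi in p))

p = [0.5, 0.3, 0.2]  # hypothetical three-point distribution
alpha = 2.0

# Shannon: phi(u) = -u ln u (concave), h(x) = x (increasing).
shannon = h_phi_entropy(p, lambda u: -u * math.log(u), lambda x: x)

# Tsallis: phi(u) = (u - u^alpha)/(alpha - 1) (concave), h(x) = x (increasing).
tsallis = h_phi_entropy(p, lambda u: (u - u ** alpha) / (alpha - 1.0), lambda x: x)

# Renyi with alpha > 1: phi(u) = u^alpha (convex),
# h(x) = ln(x)/(1 - alpha) (decreasing on x > 0).
renyi = h_phi_entropy(p, lambda u: u ** alpha, lambda x: math.log(x) / (1.0 - alpha))

print(shannon, tsallis, renyi)
```

For $0<\alpha <1$ the same Rényi pair falls under the other case of (5): $\phi (u)=u^{\alpha }$ is then concave and $h(x)=\ln (x)/(1-\alpha )$ is increasing.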

Following the above short overview of the most historic measures of entropy, let us now proceed to a short review of measures of divergence between probability distributions. Consider the measurable space $(\mathcal {X}, \mathcal {A})$, and two probability measures $P$ and $Q$ on this space. Let $\mu$ be a $\sigma$-finite measure on the same space with $P\ll \mu$ and $Q\ll \mu$. Denote by $f$ and $g$ the respective Radon–Nikodym derivatives, $f=dP/d\mu$ and $g=dQ/d\mu$. The most historic measure of divergence is the well-known Kullback–Leibler divergence (cf. [Reference Kullback48,Reference Kullback and Leibler49]) which is defined by

(6)\begin{equation} \mathcal{D}_{0}(f:g)=\int_{\mathcal{X}}f(x)\ln \left(\frac{f(x)}{ g(x)}\right) d\mu. \end{equation}

Intuitively speaking, $\mathcal {D}_{0}$ expresses the information, contained in the data, for discrimination between the underlying models $f$ and $g$. Several interpretations of $\mathcal {D}_{0}$ are discussed in the seminal paper by Soofi [Reference Soofi83]. In this context, $\mathcal {D}_{0}$ quantifies the expected information, contained in the data, for discrimination between the underlying models $f$ and $g$ in favor of $f$. This interpretation is based on the Bayes Theorem (cf. [Reference Kullback48] pp. 4–5). Moreover, $\mathcal {D}_{0}$ measures loss or gain of information. It has been interpreted as a measure of loss of information when one of the two probability density functions represents an ideal distribution and $\mathcal {D}_{0}$ measures departure from the ideal (e.g., $f$ is the unknown “true” data-generating distribution and $g$ is a model utilized for the analysis). However, following [Reference Soofi83] p. 1246, $\mathcal {D}_{0}$ in (6) “is often used just as a measure of divergence between two probability distributions rather than as a meaningful information quantity in the context of the problem being discussed.” The above defined Kullback–Leibler divergence $\mathcal {D}_{0}$ satisfies the non-negativity property, that is, $\mathcal {D}_{0}(f:g)\geq 0$, with equality if and only if the underlying densities coincide, $f=g$, a.e. (cf. [Reference Kullback48] p. 14).
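A minimal sketch of (6) for discrete densities under the counting measure may help to fix ideas. It illustrates the non-negativity property and the lack of symmetry of $\mathcal {D}_{0}$. The two distributions and the function name `kl_divergence` are hypothetical.

```python
import math

def kl_divergence(f, g):
    """Kullback-Leibler divergence (6) under the counting measure.

    Assumes g(x) > 0 wherever f(x) > 0; terms with f(x) = 0 contribute 0.
    """
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g) if fi > 0)

f = [0.5, 0.3, 0.2]  # hypothetical "true" distribution
g = [0.4, 0.4, 0.2]  # hypothetical working model

print(kl_divergence(f, g))  # positive, since f and g differ
print(kl_divergence(g, f))  # also positive, but a different value
print(kl_divergence(f, f))  # zero when the two densities coincide
```

The different values of `kl_divergence(f, g)` and `kl_divergence(g, f)` reflect the directed nature of the measure: the first argument plays the role of the distribution in whose favor the discrimination is made.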

Rényi [Reference Rényi75] has extended the above measure by introducing and studying the information of order $\alpha$ by the formula:

(7)\begin{equation} \mathcal{D}_{R,\alpha }(f:g)=(1/(\alpha -1))\ln \int_{\mathcal{X} }f^{\alpha }(x)g^{1-\alpha }(x)d\mu,\quad \alpha >0,\ \alpha \neq 1. \end{equation}

This measure is related with Kullback–Leibler divergence by the limiting behavior, $\lim _{\alpha \rightarrow 1}\mathcal {D}_{R,\alpha }(f:g)=\mathcal {D}_{0}(f:g)$. After Rényi's divergence, the broad class of $\phi$-divergence between two densities $f$ and $g$ has been introduced by Csiszár [Reference Csiszár28,Reference Csiszár29] and independently by Ali and Silvey [Reference Ali and Silvey1]. Some authors (see, e.g., [Reference Harremoës and Vajda40]) mention that $\phi$-divergence has been also independently introduced by Morimoto [Reference Morimoto58]. This omnipresent measure is defined by

(8)\begin{equation} \mathcal{D}_{\phi }(f:g)=\int_{\mathcal{X}}g(x)\phi \left(\frac{f(x) }{g(x)}\right) d\mu, \end{equation}

for two Radon–Nikodym derivatives $f$ and $g$ on the measurable space $\mathcal {X}$. $\phi :(0,\infty )\rightarrow \mathbb {R}$ is a real-valued convex function satisfying conditions which ensure the existence of the above integral. Based on Csiszár [Reference Csiszár28,Reference Csiszár29] and Pardo [Reference Pardo65] p. 5, it is assumed that the convex function $\phi$ belongs to the class of functions

(9)\begin{equation} \Phi =\left\{ \phi :\phi \text{ is strictly convex at }1\text{, with }\phi (1)=0,0\phi \left(\frac{0}{0}\right) =0,0\phi \left(\frac{u}{0}\right) = \underset{v\rightarrow \infty }{\lim }\frac{\phi (v)}{v}\right\}. \end{equation}

For (8) to be useful in statistical applications, the class $\Phi$ is enriched with the additional assumption $\phi ^{\prime }(1)=0$ (cf. [Reference Pardo65] p. 5). Csiszár's $\phi$-divergence owes its wide range of applications to the fact that it can be considered as a measure of quasi-distance or a measure of statistical distance between two probability densities $f$ and $g$ since it obeys the non-negativity and identity of indiscernibles property, a terminology which is conveyed by Weller-Fahy et al. [Reference Weller-Fahy, Borghetti and Sodemann93] and it is formulated by

(10)\begin{equation} \mathcal{D}_{\phi }(f:g)\geq 0\text{ with equality if and only if } f(x)=g(x),\quad {\rm a.e.} \end{equation}

Csiszár's $\phi$-divergence is not symmetric for every convex function $\phi \in \Phi$ but it becomes symmetric if we restrict to the convex functions $\phi _{\ast },$ defined by $\phi _{\ast }(u)=\phi (u)+u\phi ({1}/{u})$, for $\phi \in \Phi$ (cf. [Reference Liese and Vajda51] p. 14 and [Reference Vajda90] p. 23, Theorem 4). This measure does not obey the triangle inequality, in general, while a discussion about this property and its satisfaction by some measures of divergence is provided in Liese and Vajda [Reference Liese and Vajda52], Vajda [Reference Vajda91]. Several divergences that are well known in the literature can be obtained from $\mathcal {D}_{\phi }(f:g),$ given in (8) above, for specific choices of the convex function $\phi \in \Phi$. We mention only the Kullback–Leibler divergence (6), which is obtained from (8) for $\phi (u)=u\ln u$ or $\phi (u)=u\ln u-u+1,u>0$ (see [Reference Kullback and Leibler49] or [Reference Kullback48]) and the Cressie and Read $\lambda$-power divergence or the $I_{\alpha }$-divergence of Liese and Vajda [Reference Liese and Vajda51], obtained from (8) for $\phi (u)=\phi _{\lambda }(u)={(u^{\lambda +1}-u-\lambda (u-1))}/{(\lambda (\lambda +1))},\lambda \neq 0,-1,u>0$ (see [Reference Cressie and Read27,Reference Liese and Vajda51,Reference Read and Cressie74]) and defined by

(11)\begin{equation} \mathcal{D}_{\lambda }(f:g)=\frac{1}{\lambda (\lambda +1)}\left(\int_{\mathcal{X}}g(x)\left(\frac{f(x)}{g(x)}\right)^{\lambda +1}d\mu -1\right),\quad -\infty <\lambda <{+}\infty,\ \lambda \neq 0,-1. \end{equation}

This measure is also related to $\mathcal {D}_{0}$ by means of the limiting behavior $\lim _{\lambda \rightarrow 0}\mathcal {D}_{\lambda }(f:g)=\mathcal {D}_{0}(f:g)$ and $\lim _{\lambda \rightarrow -1}\mathcal {D}_{\lambda }(f:g)= \mathcal {D}_{0}(g:f)$. Cressie and Read $\lambda$-power divergence $\mathcal {D}_{\lambda }(f:g)$ is closely related to Rényi's divergence $\mathcal {D}_{R,\alpha }(f:g)$, in the sense

$$\mathcal{D}_{R,\alpha }(f:g)=\frac{1}{\alpha -1}\ln \left[ \alpha (\alpha -1) \mathcal{D}_{\alpha -1}(f:g)+1\right],$$

in view of (7) and (11). It is easy to see that Rényi's divergence is not included in the family of $\phi$-divergence. This point led Menéndez et al. [Reference Menéndez, Morales, Pardo and Salicrú56] to define the $(h,\phi )$-divergence which unified all the existing divergence measures. Based on Pardo [Reference Pardo65] p. 8, the $(h,\phi )$-divergence is defined as follows:

$$\mathcal{D}_{\phi }^{h}(f:g)=h\left(\int_{\mathcal{X}}g(x)\phi \left(\frac{f(x)}{g(x)}\right) d\mu \right),$$

where $h$ is a differentiable increasing real function mapping from $[ 0,\phi (0)+\lim _{t\rightarrow \infty }(\phi (t)/t)]$ onto $[0,\infty )$. Special choices of the functions $h$ and $\phi$ lead to particular divergences, like Rényi's, Sharma-Mittal and Bhattacharyya, and they are tabulated on p. 8 of Pardo [Reference Pardo65].
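The relations between (6), (7), (8) and (11) discussed above can be checked numerically. The sketch below works with discrete densities under the counting measure. The two distributions, the chosen values of $\lambda$ and $\alpha$, and all function names are hypothetical choices for illustration.

```python
import math

def phi_divergence(f, g, phi):
    """Csiszar phi-divergence (8) under the counting measure; assumes f, g > 0."""
    return sum(gi * phi(fi / gi) for fi, gi in zip(f, g))

def phi_power(lam):
    """Convex function phi_lambda of the power divergence, lam not in {0, -1}."""
    return lambda u: (u ** (lam + 1) - u - lam * (u - 1.0)) / (lam * (lam + 1.0))

def power_divergence(f, g, lam):
    """Cressie-Read power divergence computed directly from (11)."""
    s = sum(gi * (fi / gi) ** (lam + 1) for fi, gi in zip(f, g))
    return (s - 1.0) / (lam * (lam + 1.0))

def renyi_divergence(f, g, alpha):
    """Renyi divergence (7); requires alpha > 0 and alpha != 1."""
    s = sum(fi ** alpha * gi ** (1.0 - alpha) for fi, gi in zip(f, g))
    return math.log(s) / (alpha - 1.0)

f = [0.5, 0.3, 0.2]
g = [0.4, 0.4, 0.2]

# (8) with phi(u) = u ln u gives the Kullback-Leibler divergence (6).
kl_fg = phi_divergence(f, g, lambda u: u * math.log(u))
kl_gf = phi_divergence(g, f, lambda u: u * math.log(u))

# (8) with phi_lambda agrees with the closed form (11).
print(phi_divergence(f, g, phi_power(2.0)), power_divergence(f, g, 2.0))

# Limiting behavior of (11): lambda -> 0 and lambda -> -1.
print(power_divergence(f, g, 1e-6), kl_fg)
print(power_divergence(f, g, -1.0 + 1e-6), kl_gf)

# Renyi divergence expressed through the power divergence with lambda = alpha - 1.
alpha = 0.5
via_power = math.log(alpha * (alpha - 1.0) * power_divergence(f, g, alpha - 1.0) + 1.0) / (alpha - 1.0)
print(renyi_divergence(f, g, alpha), via_power)
```

The last comparison uses the logarithm as the function $h$, which is why Rényi's divergence is recovered from an $(h,\phi )$-divergence rather than from a $\phi$-divergence alone.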

We will close this short exposition on measures of divergence between probability distributions with a presentation of the density power divergence introduced by Basu et al. [Reference Basu, Harris, Hjort and Jones13], in order to develop and study robust estimation procedures on the basis of this new family of divergences. For two Radon–Nikodym derivatives $f$ and $g$, the density power divergence (DPD) between $f$ and $g$ was defined in Basu et al. [Reference Basu, Harris, Hjort and Jones13], cf. also Basu et al. [Reference Basu, Shioya and Park14], by

(12)\begin{equation} d_{a}(f:g)=\int_{\mathcal{X}}\left \{ g(x)^{1+a}-\left(1+\frac{1}{a }\right) g(x)^{a}f(x)+\frac{1}{a}f(x)^{1+a}\right \} d\mu, \end{equation}

for $a>0,$ while for $a=0$, it is defined by

$$\lim_{a\rightarrow 0}d_{a}(f:g)=\mathcal{D}_{0}(f:g).$$

For $a=1$, (12) reduces to the $L_{2}$ distance $L_{2}(f,g)=\int _{\mathcal {X}}(f(x)-g(x)) ^{2}d\mu$. It is also interesting to note that (12) is a special case of the so-called Bregman divergence

$$\int_{\mathcal{X}}\left[ T(f(x))-T(g(x))-\{f(x)-g(x)\}T^{\prime }(g(x))\right] d\mu.$$

If we consider $T(l)=l^{1+a}$, we get $a$ times $d_{a}(f:g)$. The density power divergence depends on the tuning parameter $a$ which controls the trade off between robustness and asymptotic efficiency of the parameter estimates which are the minimizers of this family of divergences (cf. [Reference Basu, Shioya and Park14] p. 297). Based on Theorem 9.1 of this book, $d_{a}(f:g)$ represents a genuine statistical distance for all $a\geq 0$, that is, $d_{a}(f:g)\geq 0$ with equality, if and only if, $f(x)=g(x),$ a.e. $x$. The proof of this result is provided in Basu et al. [Reference Basu, Shioya and Park14] p. 301 for the case $a>0$. The case $a=0$ follows from a similar property which is proved and obeyed by $\mathcal {D}_{0}(f:g)=\int _{\mathcal {X}}f(x)\ln (f(x)/g(x)) d\mu$ (cf. [Reference Kullback48] Thm. 3.1 p. 14). For more details about this family of divergence measures, we refer to Basu et al. [Reference Basu, Shioya and Park14].
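The three properties of (12) noted above, namely the $L_{2}$ case at $a=1$, the Kullback–Leibler limit as $a\rightarrow 0$ and the Bregman representation with $T(l)=l^{1+a}$, can be verified on a small discrete example. The sketch assumes the counting measure, and the distributions and function names are hypothetical.

```python
import math

def dpd(f, g, a):
    """Density power divergence (12) for a > 0, counting measure."""
    return sum(gi ** (1 + a) - (1 + 1 / a) * gi ** a * fi + (1 / a) * fi ** (1 + a)
               for fi, gi in zip(f, g))

def bregman(f, g, T, dT):
    """Bregman divergence generated by T, with derivative dT."""
    return sum(T(fi) - T(gi) - (fi - gi) * dT(gi) for fi, gi in zip(f, g))

def kl_divergence(f, g):
    """Kullback-Leibler divergence (6); assumes f, g > 0."""
    return sum(fi * math.log(fi / gi) for fi, gi in zip(f, g))

f = [0.5, 0.3, 0.2]
g = [0.4, 0.4, 0.2]

# a = 1: (12) reduces to the sum of squared differences.
l2 = sum((fi - gi) ** 2 for fi, gi in zip(f, g))
print(dpd(f, g, 1.0), l2)

# a -> 0: (12) approaches the Kullback-Leibler divergence.
print(dpd(f, g, 1e-6), kl_divergence(f, g))

# Bregman divergence with T(l) = l^(1+a) equals a times d_a(f:g).
a = 0.5
breg = bregman(f, g, lambda l: l ** (1 + a), lambda l: (1 + a) * l ** a)
print(breg, a * dpd(f, g, a))
```

Small values of `a` stay close to the Kullback–Leibler divergence, while larger values move toward the $L_{2}$ distance, which is one way to picture the trade-off between efficiency and robustness controlled by the tuning parameter.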

Closing this review on divergences, interesting generalized and unified classes of divergences have been recently proposed in the literature by Stummer and Vajda [Reference Stummer and Vajda86] and Broniatowski and Stummer [Reference Broniatowski and Stummer19] while extensions in the case of discrete, non-probability vectors with applications in insurance can be found in Sachlas and Papaioannou [Reference Sachlas and Papaioannou76,Reference Sachlas and Papaioannou77]. Csiszár's $\phi$-divergence has been also recently extended to a local setup by Avlogiaris et al. [Reference Avlogiaris, Micheas and Zografos8] and the respective local divergences have been used to develop statistical inference and model selection techniques in a local setting (cf. [Reference Avlogiaris, Micheas and Zografos9,Reference Avlogiaris, Micheas and Zografos10]).

The third category of measures of information is that of parametric or Fisher's type measures of information (cf. [Reference Ferentinos and Papaioannou34,Reference Papaioannou62,Reference Papaioannou63]). Fisher information measure is the main representative of this category of measures and it is well known from the theory of estimation and the Cramér–Rao inequality. Fisher information measure is defined by

(13)\begin{equation} \mathcal{I}_{f}^{Fi}(\theta)=\int_{\mathcal{X}}f(x;\theta)\left(\frac{d}{d\theta }\ln f(x;\theta)\right)^{2}d\mu, \end{equation}

where $f(x;\theta )$ is the Radon–Nikodym derivative of a parametric family of probability measures $P_{\theta }\ll \mu$ on the measurable space $(\mathcal {X},\mathcal {A}),$ while the parameter $\theta \in \Theta \subseteq \mathbb {R}$. The measure defined above is a fundamental quantity in the theory of estimation, connected, in addition, with the asymptotic variance of the maximum likelihood estimator, subject to a set of suitable regularity assumptions (cf. [Reference Casella and Berger23] pp. 311, 326). Subject to the said conditions, the following representation of $\mathcal {I}_{f}^{Fi}(\theta )$,

(14)\begin{equation} \mathcal{I}_{f}^{Fi}(\theta)={-}\int_{\mathcal{X}}f(x;\theta)\frac{ d^{2}}{d\theta^{2}}\ln f(x;\theta)d\mu, \end{equation}

has a prominent position in the literature as it provides an easy way to get the expression of the Fisher information measure in some applications. From an information theoretic point of view, $\mathcal {I}_{f}^{Fi}$ expresses the amount of information contained in the data about the unknown parameter $\theta$. Several extensions of (13) have appeared in the bibliography of the subject, while this measure obeys nice information theoretic and statistical properties (cf. [Reference Papaioannou62]).
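As an illustration of the equivalence of (13) and (14) under regularity conditions, the sketch below evaluates both expressions for a Bernoulli($\theta$) model, for which the Fisher information equals $1/(\theta (1-\theta ))$. The Bernoulli model, the use of central finite differences for the derivatives, and all function names are assumptions made for this example only.

```python
import math

def log_f(x, theta):
    """Log-density of a Bernoulli(theta) model on x in {0, 1} (illustrative choice)."""
    return x * math.log(theta) + (1 - x) * math.log(1.0 - theta)

def fisher_score_form(theta, support=(0, 1), eps=1e-5):
    """Form (13): E[(d/dtheta ln f)^2], derivative by central differences."""
    total = 0.0
    for x in support:
        score = (log_f(x, theta + eps) - log_f(x, theta - eps)) / (2.0 * eps)
        total += math.exp(log_f(x, theta)) * score ** 2
    return total

def fisher_curvature_form(theta, support=(0, 1), eps=1e-4):
    """Form (14): -E[d^2/dtheta^2 ln f], second derivative by central differences."""
    total = 0.0
    for x in support:
        second = (log_f(x, theta + eps) - 2.0 * log_f(x, theta)
                  + log_f(x, theta - eps)) / eps ** 2
        total -= math.exp(log_f(x, theta)) * second
    return total

theta = 0.3
exact = 1.0 / (theta * (1.0 - theta))
print(fisher_score_form(theta), fisher_curvature_form(theta), exact)
```

Replacing `log_f` and `support` by another discrete model gives the same comparison for that model, as long as the regularity conditions hold and the support does not depend on $\theta$.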

Besides Fisher's information measure (13), another quantity is widely used in different areas, such as in statistics and in functional analysis (cf. [Reference Bobkov, Gozlan, Roberto and Samson18,Reference Carlen22,Reference Mayer-Wolf55] and references therein). This measure is defined by

(15)\begin{equation} \mathcal{J}^{Fi}(f)=\int_{\mathbb{R} }h(x)\left(\frac{d}{dx}\ln h(x)\right)^{2}dx, \end{equation}

where, without any loss of generality, $h$ is a density with $x\in \mathbb {R}$ (cf. [Reference Stam85] p. 102) and $\mathcal {J}^{Fi}(f)$ coincides with (13) when $f(x;\theta )=h(x-\theta )$, that is, when the parameter $\theta$ is a location parameter, in the considered model $f(x;\theta )$, $x\in \mathbb {R}$, $\theta \in \Theta \subseteq \mathbb {R}$. Papaioannou and Ferentinos [Reference Papaioannou and Ferentinos64] have studied the above measure, calling it the Fisher information number, and they also provided an alternative expression for it, $\mathcal {J}_{\ast }^{Fi}(f)=-\int _{ \mathbb {R} }h(x)({d^{2}}/{dx^{2}})\ln h(x)dx$. The above measure is not so well known in the statistics literature; however, it has received the attention of researchers and it is connected with several results in statistics, statistical physics and signal processing (cf., e.g., [Reference Choi, Lee and Song25,Reference Toranzo, Zozor and Brossier87,Reference Walker92], and references therein). The multivariate version of (15) is analogous and it has also received the attention of researchers. We refer to the recent work by Yao et al. [Reference Yao, Nandy, Lindsay and Chiaromonte94] and references therein, while the multivariate version has been exploited in Zografos [Reference Zografos95,Reference Zografos96] for the definition of measures of multivariate dependence.

Based on Soofi [Reference Soofi83] p. 1246, Fisher's measure of information within a second-order approximation is the discrimination information between two distributions that belong to the same parametric family and differ infinitesimally over a parameter space. More precisely, subject to the standard regularity conditions of estimation theory (cf. [Reference Casella and Berger23] pp. 311, 326), stated also on pp. 26–27 of the monograph by Kullback [Reference Kullback48], Fisher information measure $\mathcal {I}_{f}^{Fi}$ is connected with Kullback–Leibler divergence $\mathcal {D}_{0}$, defined in (6), by the next equality derived in the monograph of Kullback [Reference Kullback48] p. 28,

(16)\begin{equation} \lim_{\delta \rightarrow 0}\frac{1}{\delta^{2}}\mathcal{D}_{0}(f(x;\theta):f(x;\theta +\delta ))=\frac{1}{2}\mathcal{I}_{f}^{Fi}(\theta),\quad \theta \in \Theta, \end{equation}

while similar connections of Fisher information measure with other divergences, obtained from Csiszár's divergence in (8), have been derived in Ferentinos and Papaioannou [Reference Ferentinos and Papaioannou34]. The limiting relationship between Kullback–Leibler divergence and Fisher information, formulated in (16), can be easily extended to the case of Csiszár's $\phi$-divergence (8). In this context, it can be easily proved (cf. [Reference Salicrú78]) that

(17)\begin{equation} \lim_{\delta \rightarrow 0}\frac{1}{\delta^{2}}\mathcal{D}_{\phi }(f(x;\theta +\delta):f(x;\theta ))=\frac{\phi^{\prime \prime }(1)}{2} \mathcal{I}_{f}^{Fi}(\theta),\quad \theta \in \Theta. \end{equation}
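The limiting relation (17) can be examined numerically. The sketch below uses a Bernoulli($\theta$) model under the counting measure, for which $\mathcal {I}_{f}^{Fi}(\theta )=1/(\theta (1-\theta ))$, together with two convex functions: the Cressie and Read function $\phi _{\lambda }$, for which $\phi _{\lambda }^{\prime \prime }(1)=1$, and $\phi (u)=(u-1)^{2}$, for which $\phi ^{\prime \prime }(1)=2$. The model, the step $\delta$ and the function names are hypothetical choices for illustration.

```python
def bernoulli(theta):
    """Density of a Bernoulli(theta) model on {0, 1} as a list [P(0), P(1)]."""
    return [1.0 - theta, theta]

def phi_divergence(f, g, phi):
    """Csiszar phi-divergence (8) under the counting measure; assumes f, g > 0."""
    return sum(gi * phi(fi / gi) for fi, gi in zip(f, g))

def phi_power(lam):
    """Cressie-Read convex function phi_lambda; its second derivative at 1 is 1."""
    return lambda u: (u ** (lam + 1) - u - lam * (u - 1.0)) / (lam * (lam + 1.0))

def scaled_divergence(phi, theta, delta):
    """Left-hand side of (17) before taking the limit."""
    return phi_divergence(bernoulli(theta + delta), bernoulli(theta), phi) / delta ** 2

theta, delta = 0.3, 1e-3
fisher = 1.0 / (theta * (1.0 - theta))

# phi''(1) = 1: the scaled divergence is close to fisher / 2.
for lam in (-0.5, 2.0):
    print(lam, scaled_divergence(phi_power(lam), theta, delta), fisher / 2.0)

# phi''(1) = 2: the scaled divergence is close to fisher.
print(scaled_divergence(lambda u: (u - 1.0) ** 2, theta, delta), fisher)
```

The agreement improves as `delta` decreases, in line with a second-order Taylor expansion of $\phi$ around $u=1$, where $\phi (1)=0$ and the first-order term integrates to zero.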

To summarize, this section presented the most representative measures of statistical information theory, which have played an important role over the last seven decades, not only in the fields of probability and statistics but also in many other fields of science and engineering. Interesting analogs of the above measures on the basis of the cumulative distribution function or on the basis of the survival function have occupied a significant part of the respective literature over the last 17 years, and this line of research work is outlined in the next section.

3. A short review on cumulative entropies and cumulative Kullback–Leibler information

To present some of the measures of cumulative entropy and cumulative divergence, suppose in this section that $X$ is a non-negative random variable with distribution function $F$ and respective survival function $\bar {F}(x)=1-F(x)$. Among the many extensions or analogs of Shannon entropy, defined in (1), the cumulative residual entropy is a notable and worthwhile recent analog. Rao et al. [Reference Rao, Chen, Vemuri and Wang73], in a pioneering paper, introduced the cumulative residual entropy with a functional similarity to Shannon's [Reference Shannon81] omnipresent entropy measure. The measure of Rao et al. [Reference Rao, Chen, Vemuri and Wang73] is defined by

(18)\begin{equation} {\rm CRE}(F)={-}\int_{0}^{+\infty }\bar{F}(x)\ln \bar{F}(x)dx, \end{equation}

where $\bar {F}(x)=1-F(x)$ is the cumulative residual distribution or the survival function of a non-negative random variable $X$. A year later, Zografos and Nadarajah [Reference Zografos and Nadarajah98] provided a timely elaboration of the measure of Rao et al. [Reference Rao, Chen, Vemuri and Wang73] and defined the survival exponential entropies by

(19)\begin{equation} M_{\alpha }(F)=\left(\int_{0}^{+\infty }\bar{F}^{\alpha }(x)dx\right)^{\frac{1}{1-\alpha }},\quad \alpha >0,\ \alpha \neq 1, \end{equation}

where, again, $\bar {F}(x)=1-F(x)$ is the survival function of a non-negative random variable $X$. The quantity $M_{\alpha }(F)$, defined by (19), asymptotically coincides with the exponential function of the cumulative residual entropy ${\rm CRE}(F)$, suitably scaled in the following sense

$$\lim_{\alpha \rightarrow 1}M_{\alpha }(F)=\exp \left\{ -\frac{{\rm CRE}(F)}{ \int_{0}^{+\infty }\bar{F}(x)dx}\right\}.$$

Moreover, the logarithm of $M_{\alpha }(F)$ leads to a quantity analogous to the Rényi entropy (2) (cf. [Reference Zografos and Nadarajah98]). The analog of ${\rm CRE}(F)$ based on Tsallis’ [Reference Tsallis88] measure (3) has recently been considered by Sati and Gupta [Reference Sati and Gupta80], Calì et al. [Reference Calì, Longobardi and Ahmadi21], Rajesh and Sunoj [Reference Rajesh and Sunoj71] and the references therein. It has a functional form similar to that of $\mathcal {E}_{Ts}(f)$ in (3), given by

$${\rm CRE}_{Ts,\alpha }(F)=\frac{1}{\alpha -1}\left(1-\int_{0}^{+\infty } \bar{F}^{\alpha }(x)dx\right),\quad \alpha >0,\ \alpha \neq 1,$$

while letting $\alpha \rightarrow 1$, $\lim _{\alpha \rightarrow 1}{\rm CRE}_{Ts,\alpha }(F)={\rm CRE}(F)$. Asadi et al. [Reference Asadi, Ebrahimi and Soofi6] and Rajesh and Sunoj [Reference Rajesh and Sunoj71] introduced an alternative measure of ${\rm CRE}_{Ts,\alpha }(F)$, as follows,

(20)\begin{equation} CRh_{\alpha }(F)=\frac{1}{\alpha -1}\int_{0}^{+\infty }(\bar{F}(x)-\bar{F}^{\alpha }(x)) dx,\quad \alpha >0,\ \alpha \neq 1, \end{equation}

and letting $\alpha \rightarrow 1,$ $\lim _{\alpha \rightarrow 1}CRh_{\alpha }(F)={\rm CRE}(F)$, defined by (18). Moreover, it is easy to see that for $\alpha =2$, the entropy type functional $CRh_{2}(F)$ coincides with Gini's index multiplied by the expected value of the random variable associated with $F$ (cf. [Reference Asadi, Ebrahimi and Soofi6] p. 1037). Shannon and other classic measures of entropy quantify uncertainty relative to the uniform distribution, as mentioned previously. This is not the case for cumulative residual entropies, like that in (18). Following the exposition in Asadi et al. [Reference Asadi, Ebrahimi and Soofi6] p. 1030, the generalized entropy functional (20), as they call it, is a measure of concentration of the distribution. That is, it is non-negative and equals zero if and only if the distribution is degenerate. Moreover, strictly positive values of ${\rm CRE}(F)$ in (18) do not indicate departure from perfect concentration toward perfect uncertainty about the prediction of random outcomes from the distribution. The measure of Rao et al. [Reference Rao, Chen, Vemuri and Wang73] is an example that distinguishes a measure of concentration from a measure of uncertainty (not every measure of concentration is a measure of uncertainty).
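The two properties of $CRh_{\alpha }(F)$ stated above can be checked numerically. The following is a minimal sketch, assuming an exponential distribution with a hypothetical rate `RATE` and a hand-written Simpson rule on a truncated range; neither choice comes from the cited papers, and the names `simpson`, `sf`, `cre` and `crh` are introduced here for illustration only.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


RATE = 2.0           # hypothetical exponential rate, chosen only for illustration
UPPER = 40.0 / RATE  # truncation point; the neglected tail is of order exp(-40)


def sf(x):
    """Survival function of the exponential distribution with rate RATE."""
    return math.exp(-RATE * x)


def cre():
    """Cumulative residual entropy of Eq. (18), by numerical integration."""
    return -simpson(lambda x: sf(x) * math.log(sf(x)), 0.0, UPPER)


def crh(alpha):
    """Entropy type functional CRh_alpha of Eq. (20)."""
    integral = simpson(lambda x: sf(x) - sf(x) ** alpha, 0.0, UPPER)
    return integral / (alpha - 1.0)


cre_value = cre()
crh_near_one = crh(1.0001)            # expected to be close to CRE(F)
crh_two = crh(2.0)                    # expected to match Gini's index times the mean
gini_times_mean = 0.5 * (1.0 / RATE)  # Gini's index of an exponential law is 1/2

print(cre_value, crh_near_one, crh_two, gini_times_mean)
```

For the exponential case both quantities are also available in closed form, ${\rm CRE}(F)=1/\lambda$ and $CRh_{\alpha }(F)=1/(\alpha \lambda )$, which gives an independent check of the numerical values.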

Some years later, Di Crescenzo and Longobardi [Reference Di Crescenzo and Longobardi31] defined the cumulative entropy, in analogy with the cumulative residual entropy of Rao et al. [Reference Rao, Chen, Vemuri and Wang73]. The cumulative entropy is defined by

(21)\begin{equation} {\rm CE}(F)={-}\int_{0}^{+\infty }F(x)\ln F(x)dx, \end{equation}

where $F$ is the distribution function, associated to a non-negative random variable $X$. It is clear that ${\rm CRE}(F)\geq 0$ and ${\rm CE}(F)\geq 0$.

Chen et al. [Reference Chen, Kar and Ralescu24] and some years later Klein et al. [Reference Klein, Mangold and Doll46] and Klein and Doll [Reference Klein and Doll45], in their interesting papers, have unified and extended the cumulative residual entropy (18) and the cumulative entropy (21). Based on Klein and Doll [Reference Klein and Doll45] p. 8, the cumulative $\Phi ^{\ast }$ entropy is defined by,

(22)\begin{equation} {\rm CE}_{\Phi^{{\ast} }}(F)=\int_{-\infty }^{+\infty }\Phi^{{\ast} }(F(x))dx, \end{equation}

where $\Phi ^{\ast }$ is a general concave entropy generating function such that $\Phi ^{\ast }(u)=\varphi (1-u)$ or $\Phi ^{\ast }(u)=\varphi (u)$ leads, respectively, to the cumulative residual $\varphi$ entropy and the cumulative $\varphi$ entropy. The entropy generating function $\varphi$ is a non-negative and concave real function defined on $[0,1]$. The above measure is analogous to Burbea and Rao's [Reference Burbea and Rao20] $\phi$-entropy $\mathcal {E }_{\phi }(f)$, defined in (4). It is, moreover, clear that ${\rm CRE}(F)$, in (18), and ${\rm CE}(F)$, in (21), are, respectively, special cases of ${\rm CE}_{\Phi ^{\ast }}(F)$, in (22), for $\Phi ^{\ast }(u)=\varphi (1-u)$ or $\Phi ^{\ast }(u)=\varphi (u)$, with $\varphi (x)=-x\ln x$, $x\in (0,1]$. The cumulative $\Phi ^{\ast }$ entropy ${\rm CE}_{\Phi ^{\ast }}(F)$, inspired by Klein and Doll [Reference Klein and Doll45], is a broad family of measures of cumulative residual entropy and cumulative entropy, and special choices of the concave function $\varphi$ lead to interesting particular entropies, like those appearing in Table 3 of p. 13, in Klein and Doll [Reference Klein and Doll45]. An interesting special case of (22) is obtained for $\Phi ^{\ast }(x)=\varphi (x)=(x^{\alpha }-x)/(1-\alpha ),$ $x\in (0,1],\alpha >0,\alpha \neq 1$, a concave function for which the cumulative $\Phi ^{\ast }$ entropy in (22) coincides with the entropy type measure (20), above, of Asadi et al. [Reference Asadi, Ebrahimi and Soofi6] and the measure of equation (6) in the paper by Rajesh and Sunoj [Reference Rajesh and Sunoj71].
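The unifying role of (22) can be illustrated with a short sketch. It assumes a uniform distribution on $[0,1]$, for which $\Phi ^{\ast }(F(x))$ vanishes outside $[0,1]$ so that the integral in (22) reduces to a finite range; the helper names (`simpson`, `cumulative_phi_entropy`, `uniform_cdf`) are hypothetical and are not taken from Klein and Doll.

```python
import math


def simpson(f, a, b, n=20000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def phi(u):
    """Entropy generating function phi(u) = -u ln u, with phi(0) = 0."""
    return -u * math.log(u) if u > 0 else 0.0


def cumulative_phi_entropy(cdf, phi_star, a, b):
    """Eq. (22) restricted to [a, b], assuming phi_star(F(x)) = 0 elsewhere."""
    return simpson(lambda x: phi_star(cdf(x)), a, b)


def uniform_cdf(x):
    """Distribution function of the uniform law on [0, 1] (illustrative choice)."""
    return min(max(x, 0.0), 1.0)


# Phi*(u) = phi(u) gives the cumulative entropy CE(F) of Eq. (21).
ce = cumulative_phi_entropy(uniform_cdf, phi, 0.0, 1.0)
# Phi*(u) = phi(1 - u) gives the cumulative residual entropy CRE(F) of Eq. (18).
cre = cumulative_phi_entropy(uniform_cdf, lambda u: phi(1.0 - u), 0.0, 1.0)

print(ce, cre)
```

For the uniform law both integrals reduce to $-\int _{0}^{1}x\ln x\,dx=1/4$, so the two special cases coincide in this symmetric example.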

Just as the classical Shannon entropy (1) motivated the definition of the cumulative entropy (21), the Kullback and Leibler [Reference Kullback and Leibler49] divergence (6) motivated the definition of the cumulative Kullback–Leibler information and the cumulative residual Kullback–Leibler information, through the work of Rao [Reference Rao72], Baratpour and Rad [Reference Baratpour and Rad12], Park et al. [Reference Park, Rao and Shin66] and the subsequent papers by Di Crescenzo and Longobardi [Reference Di Crescenzo and Longobardi32] and Park et al. [Reference Park, Alizadeh Noughabi and Kim67], among others. In these and other treatments, the cumulative Kullback–Leibler information and the cumulative residual Kullback–Leibler information are defined, respectively, by

(23)\begin{equation} {\rm CKL}(F:G)=\int_{\mathbb{R} }F(x)\ln \left(\frac{F(x)}{G(x)}\right) dx+\int_{ \mathbb{R} }[G(x)-F(x)]dx, \end{equation}

and

(24)\begin{equation} {\rm CRKL}(F:G)=\int_{ \mathbb{R} }\bar{F}(x)\ln \left(\frac{\bar{F}(x)}{\bar{G}(x)}\right) dx+\int_{ \mathbb{R} }[\bar{G}(x)-\bar{F}(x)]dx, \end{equation}

for two distribution functions $F$ and $G$ with respective survival functions $\bar {F}$ and $\bar {G}$. It is clear that if the random quantities $X$ and $Y$, associated with $F$ and $G$, are non-negative, then $\int _{0}^{+\infty }\bar {F}(x)dx$ and $\int _{0}^{+ \infty }\bar {G}(x)dx$ are equal to $E(X)$ and $E(Y)$, respectively. It should be mentioned at this point that Asadi et al. [Reference Ardakani, Ebrahimi and Soofi2] defined, in Subsection 3.2, a Kullback–Leibler type divergence function between two non-negative functions $P_{1}$ and $P_{2}$ which provides a unified representation of the measures (6), (23) and (24), with $P_{i},$ $i=1,2$, being probability density function, cumulative distribution function and survival function, respectively. Based on Asadi et al. [Reference Asadi, Ebrahimi and Soofi7], for non-negative random variables $X$ and $Y$, associated with $F$ and $G$, respectively,

$$\int_{0}^{+\infty }[G(x)-F(x)]dx=\int_{0}^{+\infty }[G(x)-1+1-F(x)]dx=\int_{0}^{+\infty }[\bar{F}(x)-\bar{G} (x)]dx=E(X)-E(Y),$$

and (23), (24) are simplified as follows,

(25)\begin{equation} {\rm CKL}(F:G)=\int_{0}^{+\infty }F(x)\ln \left(\frac{F(x)}{G(x)}\right) dx+[E(X)-E(Y)], \end{equation}

and

(26)\begin{equation} {\rm CRKL}(F:G)=\int_{0}^{+\infty }\bar{F}(x)\ln \left(\frac{ \bar{F}(x)}{\bar{G}(x)}\right) dx+[E(Y)-E(X)]. \end{equation}

Based on Baratpour and Rad [Reference Baratpour and Rad12] and Park et al. [Reference Park, Rao and Shin66],

(27)\begin{equation} {\rm CKL}(F:G)\geq 0,\quad {\rm CRKL}(F:G)\geq 0\text{ with equality if and only if } F(x)=G(x)\text{, a.e. }x. \end{equation}

This is an important property because (27) ensures that ${\rm CKL}(F:G)$ and ${\rm CRKL}(F:G)$ can be used, in practice, as pseudo-distances between the underlying probability distributions. More generally, non-negativity and identity of indiscernibles, formulated by (27), are desirable properties of any newly defined measure of divergence because they widen its applicability in formulating and solving problems in statistics and probability theory, among many other potential areas. The counter-example that follows illustrates that the last integrals on the right-hand side of (23) and (24) are necessary for (27) to hold.

Example 1. The analogs of Kullback–Leibler divergence (6), in case of cumulative and survival functions, would be $\int _{-\infty }^{+\infty }F(x)\ln ({F(x)}/{G(x)})dx$ or $\int _{-\infty }^{+\infty }\bar {F}(x)\ln ({\bar {F}(x)}/{\bar {G}(x)})dx$, respectively. Consider two exponential distributions with survival functions $\bar {F}(x)=e^{-\lambda x},$ $x>0,$ $\lambda >0$ and $\bar {G} (x)=e^{-\mu x},$ $x>0,$ $\mu >0$. Then, it is easy to see that,

$$\int_{0}^{+\infty }\bar{F}(x)\ln \frac{\bar{F}(x)}{ \bar{G}(x)}dx=\frac{\mu -\lambda }{\lambda^{2}}.$$

It is clear that for $\mu <\lambda$, the second quantity, formulated in terms of the survival functions, does not obey the non-negativity property (27). Moreover, for $\lambda =3$ and $\mu =2$, numerical integration leads to $\int _{0}^{+\infty }F(x)\ln ({F(x)}/{G(x)})dx=0.1807$, while for $\lambda =1$ and $\mu =2$, the same measure is negative, $\int _{0}^{+\infty }F(x)\ln ({F(x)}/{G(x)})dx=-0.4362$. Therefore, the analogs of Kullback–Leibler divergence (6), in case of cumulative and survival functions, do not always satisfy the non-negativity property which is essential for applications of a measure of divergence, in practice.
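The counter-example can be reproduced with a short numerical sketch. It assumes a hand-written Simpson rule and a truncation of the range at a hypothetical point `UPPER`; the function names are illustrative. The sketch evaluates the naive integrals of Example 1 and, for comparison, the corrected measures (23) and (24) restricted to $[0,+\infty )$, whose integrands are pointwise non-negative.

```python
import math


def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


UPPER = 40.0  # truncation point (an assumption); the neglected tails are negligible here


def exp_cdf(rate, x):
    return -math.expm1(-rate * x)  # 1 - exp(-rate * x), accurate near x = 0


def exp_sf(rate, x):
    return math.exp(-rate * x)


def log_term(p, q):
    """p * ln(p / q), taken to be 0 when p = 0."""
    return p * math.log(p / q) if p > 0 else 0.0


def naive_survival(lam, mu):
    """Integral of F-bar * ln(F-bar / G-bar), without any correction term."""
    return simpson(lambda x: log_term(exp_sf(lam, x), exp_sf(mu, x)), 0.0, UPPER)


def naive_cumulative(lam, mu):
    """Integral of F * ln(F / G), without any correction term."""
    return simpson(lambda x: log_term(exp_cdf(lam, x), exp_cdf(mu, x)), 0.0, UPPER)


def crkl(lam, mu):
    """CRKL(F:G) of Eq. (24) on [0, UPPER]: naive integrand plus G-bar - F-bar."""
    def integrand(x):
        f, g = exp_sf(lam, x), exp_sf(mu, x)
        return log_term(f, g) + g - f
    return simpson(integrand, 0.0, UPPER)


def ckl(lam, mu):
    """CKL(F:G) of Eq. (23) on [0, UPPER]: naive integrand plus G - F."""
    def integrand(x):
        f, g = exp_cdf(lam, x), exp_cdf(mu, x)
        return log_term(f, g) + g - f
    return simpson(integrand, 0.0, UPPER)


for lam, mu in [(3.0, 2.0), (1.0, 2.0)]:
    print(lam, mu, naive_survival(lam, mu), naive_cumulative(lam, mu),
          crkl(lam, mu), ckl(lam, mu))
```

For two exponential laws, combining the closed form of Example 1 with $E(Y)-E(X)=1/\mu -1/\lambda$ in (26) gives ${\rm CRKL}(F:G)=(\mu -\lambda )^{2}/(\lambda ^{2}\mu )$, which is non-negative for every $\lambda ,\mu >0$, in agreement with (27).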

Moreover, this counter-example underlines that the analog of (8), in case of cumulative and survival functions, of the form

(28)\begin{equation} \int G(x)\phi \left(\frac{F(x)}{G(x)}\right) dx\quad \text{or}\quad \int \bar{G}(x)\phi \left(\frac{\bar{F}(x)}{\bar{G}(x)}\right) dx, \end{equation}

obtained from (8) by replacing the densities by cumulative distributions and survival functions does not always lead to non-negative measures, something which is a basic prerequisite for a measure of divergence. Baratpour and Rad [Reference Baratpour and Rad12] and Park et al. [Reference Park, Rao and Shin66] have defined Kullback–Leibler type cumulative and survival divergences, by (23) and (24), as the analog of Kullback–Leibler classic divergence (6), which should obey the non-negativity property. In this direction, they have exploited a well-known property of the logarithmic function, namely $x\ln ({x}/{y})\geq x-y$, for $x,y>0$, and they defined Kullback–Leibler type divergences (23) and (24) by moving the right-hand side of the logarithmic inequality to the left-hand side.
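For completeness, the logarithmic inequality used above follows in one line from the elementary bound $\ln t\leq t-1$, $t>0$. The following is only a sketch of this standard argument, with the substitution $t=y/x$ introduced here for illustration:

$$\ln \frac{y}{x}\leq \frac{y}{x}-1\quad \Longrightarrow \quad -x\ln \frac{x}{y}\leq y-x\quad \Longrightarrow \quad x\ln \frac{x}{y}+y-x\geq 0,\quad x,y>0.$$

Taking $F$ and $G$ (respectively, $\bar {F}$ and $\bar {G}$) in the roles of $x$ and $y$ shows that the integrands of (23) and (24) are pointwise non-negative, with equality only where the two functions coincide.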

Continuing the critical review of the cumulative and survival divergences, the cumulative paired $\phi$-divergence of Definition 4, p. 26 in the paper by Klein et al. [Reference Klein, Mangold and Doll46], can be considered as an extension of the above divergences ${\rm CKL}(F:G)$ and ${\rm CRKL}(F:G)$, defined by (23) and (24), in a completely similar manner that the survival and cumulative entropies (18) and (21) have been unified and extended to the cumulative $\Phi ^{\ast }$ entropy, given by (22). Working in this direction, Klein et al. [Reference Klein, Mangold and Doll46] p. 26 of 45, in their recent paper, have defined the cumulative paired $\phi$-divergence for two distributions, by generalizing the cross entropy of Chen et al. [Reference Chen, Kar and Ralescu24] p. 56, as follows,

(29)\begin{equation} {\rm CPD}_{\phi }(F:G)=\int_{-\infty }^{+\infty }\left(G(x)\phi \left(\frac{F(x)}{G(x)}\right) +\bar{G}(x)\phi \left(\frac{\bar{F}(x)}{ \bar{G}(x)}\right) \right) dx, \end{equation}

where $F$ and $G$ are distribution functions and $\bar {F}=1-F$, $\bar {G}=1-G$ are the respective survival functions. Here $\phi$ is again a real convex function defined on $[0,\infty ]$ with $\phi (0)=\phi (1)=0$ and satisfying additional conditions, like those of the class $\Phi$ in (9) (cf. [Reference Klein, Mangold and Doll46]). Klein et al. [Reference Klein, Mangold and Doll46] have discussed several properties of ${\rm CPD}_{\phi }(F:G)$ and have presented particular measures, obtained from ${\rm CPD}_{\phi }(F:G)$ for special cases of the convex function $\phi$. A particular case is that obtained for $\phi (u)=u\ln u,u>0$, which yields the cumulative paired Shannon divergence

(30)\begin{equation} {\rm CPD}_{S}(F:G)=\int_{-\infty }^{+\infty }\left(F(x)\ln \left(\frac{ F(x)}{G(x)}\right) +\bar{F}(x)\ln \left(\frac{\bar{F}(x)}{ \bar{G}(x)}\right) \right) dx. \end{equation}

This measure is the cross entropy, introduced and studied previously by Chen et al. [Reference Chen, Kar and Ralescu24]; it is also considered by Park et al. [Reference Park, Alizadeh Noughabi and Kim67] under the name general cumulative Kullback–Leibler (GCKL) information. ${\rm CPD}_{S}(F:G)$, defined by (30), obeys the non-negativity and identity of indiscernibles property, similar to that of Eq. (27) (cf. [Reference Chen, Kar and Ralescu24]). However, to the best of our knowledge, there is no rigorous proof of the non-negativity of ${\rm CPD}_{\phi }(F:G)$, defined by (29). Non-negativity is essential for a measure of divergence between probability distributions, as it supports and justifies the use of such a measure as a quasi-distance between the respective distributions; hence, this property is the benchmark in developing information theoretic methods in statistics. The cumulative divergences (29) and (30) depend on both the cumulative function and the survival function. This dependence on both functions may cause problems in practice when one of the two functions is not tractable. This possible inability of the above divergences to work in practice for complicated cumulative or survival functions motivated the attempt to define Csiszár's type cumulative and survival $\phi$-divergences in complete analogy to the classic divergence of Csiszár, defined by (8).

4. Csiszár's $\phi$-divergence type cumulative and survival measures

The main aim of this section is to introduce Csiszár's type $\phi$-divergences in which the cumulative distribution function and the survival function are used in place of the probability density functions in (8). A first thought is to define a Csiszár's type $\phi$-divergence that resembles (8), by replacing the densities $f$ and $g$ by the respective distributions $F$ and $G$ or the respective survival functions $\bar {F}$ and $\bar {G}$. However, such a direct replacement does not always lead to divergences which obey the non-negativity property, as was shown in the previous motivating counter-example. To overcome this problem, motivated by the above described procedure of Baratpour and Rad [Reference Baratpour and Rad12] and Park et al. [Reference Park, Rao and Shin66] and their use of a classic logarithmic inequality, we will define Csiszár's type cumulative and survival $\phi$-divergences, as non-negative analogs of the classic one defined by (8), by suitably applying the well-known Jensen inequality (cf., e.g., [Reference Niculescu and Persson60]). This is the theme of the next proposition.

To formulate Jensen's type inequality in the framework of cumulative and survival functions, following standard arguments (cf. [Reference Billingsley16]), consider the $d$-dimensional Euclidean space $\mathbb {R}^{d}$ and denote by $\mathcal {B}^{d}$ the $\sigma$-algebra of Borel subsets of $\mathbb {R}^{d}$. For two probability measures $P_{X}$ and $P_{Y}$ on $(\mathbb {R} ^{d},\mathcal {B}^{d})$ and two $d$-dimensional random vectors $X=(X_{1},\ldots,X_{d})$ and $Y=(Y_{1},\ldots,Y_{d})$, let $F$ and $G$ denote, respectively, the joint distribution functions of $X$ and $Y$, defined by, $F(x_{1},\ldots,x_{d})=P_{X}(X_{1}\leq x_{1},\ldots,X_{d}\leq x_{d})$ and $G(y_{1},\ldots,y_{d})=P_{Y}(Y_{1}\leq y_{1},\ldots,Y_{d}\leq y_{d})$, for $(x_{1},\ldots,x_{d})\in \mathbb {R}^{d}$ and $(y_{1},\ldots,y_{d})\in \mathbb {R}^{d}$. In a similar manner, the respective multivariate survival functions are defined by $\bar {F} (x_{1},\ldots,x_{d})=P_{X}(X_{1}>x_{1},\ldots,X_{d}>x_{d})$ and $\bar {G} (y_{1},\ldots,y_{d})=P_{Y}(Y_{1}>y_{1},\ldots,Y_{d}>y_{d})$. Let also $\phi$ be a convex function defined on the interval $(0,+\infty )$ and satisfying the assumptions of p. 299 of Csiszár [Reference Csiszár29] (cf. also the class $\Phi$, defined by (9)). The next proposition formulates, in a sense, Lemma 1.1 on p. 299 of Csiszár [Reference Csiszár29] in terms of cumulative and survival functions.

Proposition 1.

  1. (a) Let $F$ and $G$ be two cumulative distribution functions. Then, for $\alpha =\int _{ \mathbb {R}^{d}}F(x)dx/\int _{ \mathbb {R}^{d}}G(x)dx,$

    $$\int_{\mathbb{R}^{d}}G(x)\phi \left(\frac{F(x)}{G(x)}\right) dx\geq \phi (\alpha ) \int_{\mathbb{R}^{d}}G(x)dx,$$
    and the sign of equality holds if $F(x)=G(x)$, on $\mathbb {R}^{d}$. Moreover, if $\phi$ is strictly convex at $\alpha =\int _{ \mathbb {R}^{d}}F(x)dx$ $/\int _{ \mathbb {R}^{d}}G(x)dx$ and equality holds in the above inequality, then $F(x)=\alpha G(x)$, on $\mathbb {R}^{d}$.
  2. (b) Let $\bar {F}$ and $\bar {G}$ denote two survival functions. Then, for $\bar {\alpha }=\int _{ \mathbb {R}^{d}}\bar {F}(x)dx/\int _{\mathbb {R}^{d}}\bar {G}(x)dx,$

    $$\int_{\mathbb{R}^{d}}\bar{G}(x)\phi \left(\frac{\bar{F}(x)}{\bar{G}(x)} \right) dx\geq \phi \left(\bar{\alpha}\right) \int_{ \mathbb{R}^{d}}\bar{G}(x)dx,$$
    and the sign of equality holds if $\bar {F}(x)=\bar {G}(x)$, on $\mathbb {R}^{d}$. Moreover, if $\phi$ is strictly convex at $\bar {\alpha }=\int _{ \mathbb {R}^{d}}\bar {F}(x)dx/\int _{ \mathbb {R}^{d}}\bar {G}(x)dx$ and equality holds in the above inequality, then $\bar {F}(x)=\bar {\alpha }\bar {G}(x)$, on $\mathbb {R}^{d}$.

Proof. The proof is based on the proof of the classic Jensen's inequality and closely follows the proof of Lemma 1.1 of Csiszár [Reference Csiszár29] p. 300. It is presented here in the context of distribution and survival functions for the sake of completeness. We present the proof of part (a) only; the proof of part (b) is quite similar and is omitted. Following Csiszár [Reference Csiszár29] p. 300, first, one may assume that $\int _{ \mathbb {R}^{d}}F(x)dx>0$ and $\int _{ \mathbb {R}^{d}}G(x)dx>0$. Otherwise, the statement is true because of the conventions which define the class $\Phi$ of the convex functions $\phi$, in (9). By the convexity of $\phi$, we have

$$\phi (u)\geq \phi (\alpha)+b(u-\alpha),\quad 0< u<{+}\infty,$$

where $b$, $b<+\infty$, can be taken equal, for example, to the arithmetic mean of the right and left derivatives of $\phi (u)$ at the point $\alpha =\int _{ \mathbb {R}^{d}}F(x)dx/\int _{\mathbb {R}^{d}}G(x)dx$. Substituting $u=F(x)/G(x)$ and multiplying both sides by $G(x)$, we obtain for $G(x)>0,$

(31)\begin{equation} G(x)\phi \left(\frac{F(x)}{G(x)}\right) \geq G(x)\phi (\alpha)+b(F(x)-\alpha G(x)). \end{equation}

According to the conventions that define the class $\Phi$, the above inequality holds even for $G(x)=0$, because the convexity of $\phi$ leads to $b\leq \lim _{u\rightarrow \infty }{\phi (u)}/{u}$. Integrating both sides of (31) over $\mathbb {R}^{d}$, we obtain

$$\int_{\mathbb{R}^{d}}G(x)\phi \left(\frac{F(x)}{G(x)}\right) dx\geq \phi \left(\alpha \right) \int_{\mathbb{R}^{d}}G(x)dx,$$

and the first part of the assertion in (a) has been proved because $\int _{ \mathbb {R}^{d}}(F(x)-\alpha G(x))dx=0$ by the definition of $\alpha$.

Suppose now that $\phi$ is strictly convex at $\alpha =\int _{ \mathbb {R}^{d}}F(x)dx/\int _{ \mathbb {R}^{d}}G(x)dx$ and that equality holds, that is, $\int _{ \mathbb {R}^{d}}G(x)\phi ({F(x)}/{G(x)}) dx=(\int _{ \mathbb {R}^{d}}G(x)dx) \phi (\int _{ \mathbb {R} ^{d}}F(x)dx/\int _{ \mathbb {R}^{d}}G(x)dx)$. Since $\phi$ is strictly convex at $u=\alpha$, the inequality in (31) is strict, except for $F(x)=\alpha G(x)$. Equality of the integrated quantities is therefore possible only if $F(x)=\alpha G(x)$, on $\mathbb {R}^{d}$ (up to a set of Lebesgue measure zero), which completes the proof.

The above result provides lower bounds for the integrals (28), which constitute straightforward analogs of Csiszár's $\phi$-divergence, defined by (8). Moving these lower bounds to the left-hand side of the inequalities of the previous proposition makes it possible to define, in the sequel, non-negative Csiszár's type $\phi$-divergences by means of cumulative distribution functions and survival functions.

Definition 1. Let $F$ and $G$ be cumulative distribution functions. The cumulative Csiszár's type $\phi$-divergence between $F$ and $G$ is defined by

(32)\begin{equation} \mathcal{CD}_{\phi }(F:G)=\int_{ \mathbb{R}^{d}}G(x)\phi \left(\frac{F(x)}{G(x)}\right) dx-\left(\int_{ \mathbb{R}^{d}}G(x)dx\right) \phi \left(\frac{\int_{ \mathbb{R}^{d}}F(x)dx}{\int_{ \mathbb{R}^{d}}G(x)dx}\right), \end{equation}

where $\phi :(0,\infty )\rightarrow \mathbb {R}$ is a real-valued convex function and $\phi \in \Phi,$ defined by (9).

Definition 2. Let $\bar {F}$ and $\bar {G}$ be survival functions. The survival Csiszár's type $\phi$-divergence between $\bar {F}$ and $\bar {G}$ is defined by

(33)\begin{equation} \mathcal{SD}_{\phi }(\bar{F}:\bar{G})=\int_{ \mathbb{R}^{d}}\bar{G}(x)\phi \left(\frac{\bar{F}(x)}{\bar{G}(x)} \right) dx-\left(\int_{ \mathbb{R}^{d}}\bar{G}(x)dx\right) \phi \left(\frac{\int_{ \mathbb{R}^{d}}\bar{F}(x)dx}{\int_{ \mathbb{R}^{d}}\bar{G}(x)dx}\right), \end{equation}

where $\phi :(0,\infty )\rightarrow \mathbb {R}$ is a real-valued convex function and $\phi \in \Phi,$ defined by (9).

However, the main aim is to define Csiszár's type $\phi$-divergences on the basis of distribution and survival functions which obey the non-negativity and the identity of indiscernibles property, a property that supports applications of the proposed measures as quasi-distances between distributions. The quantities $\mathcal {CD}_{\phi }(F:G)$ and $\mathcal {SD}_{\phi }(\bar {F}:\bar {G})$, defined above, are non-negative in view of the previous proposition. It remains to prove the identity of indiscernibles property, which is the theme of the next proposition.

Proposition 2. The measures $\mathcal {CD}_{\phi }(F:G)$ and $\mathcal {SD}_{\phi }(\bar {F}:\bar {G})$, defined by (32) and (33), obey the non-negativity and the identity of indiscernibles property, that is,

\begin{align*} & \mathcal{CD}_{\phi }(F:G)\geq 0\text{ with equality if and only if } F(x)=G(x),\quad \text{on } \mathbb{R}^{d}, \\ & \mathcal{SD}_{\phi }(\bar{F}:\bar{G})\geq 0\text{ with equality if and only if }\bar{F}(x)=\bar{G}(x),\quad \text{on } \mathbb{R}^{d}, \end{align*}

for the convex function $\phi$ being strictly convex at the points $\alpha$ and $\bar {\alpha }$ of the previous proposition.

Proof. If $F(x)=G(x),$ on $\mathbb {R}^{d}$, then the assertion follows from the fact that the convex function $\phi$ belongs to the class $\Phi$ and therefore $\phi (1)=0$. For functions $\phi$ which are strictly convex at $\alpha$ and $\bar {\alpha }$, if the sign of equality holds in the inequalities of parts (a) and (b) of the previous proposition, then $F(x)=\alpha G(x)$, on $\mathbb {R}^{d}$ and $\bar {F}(x)=\bar {\alpha }\bar {G}(x)$, on $\mathbb {R}^{d}$. Given that $F$ and $G$ are cumulative distribution functions and based on Billingsley [Reference Billingsley16] p. 260, $F(x)\rightarrow 1$ and $G(x)\rightarrow 1,$ if $x_{i}\rightarrow +\infty$ for each $i$ and $F(x)\rightarrow 0,$ $G(x)\rightarrow 0$, if $x_{i}\rightarrow -\infty$ for some $i$ (the other coordinates held fixed). Moreover, taking into account that the multivariate survival functions $\bar {F}$ and $\bar {G}$ are functionally related with the corresponding cumulative functions $F$ and $G$ (cf. [Reference Joe42] p. 27), we conclude that $\bar {F}(x)\rightarrow 1,$ $\bar {G} (x)\rightarrow 1$, if $x_{i}\rightarrow -\infty$ for each $i$. Taking these limits in $F(x)=\alpha G(x)$ and $\bar {F}(x)=\bar {\alpha }\bar {G}(x)$ leads to the conclusion that $\alpha =\bar {\alpha }=1$, and hence the functions coincide, that is, $F(x)=G(x)$ and $\bar {F}(x)=\bar {G}(x)$, on $\mathbb {R}^{d}$. Therefore, the lower bounds, derived in the proposition, are attained if and only if the underlying cumulative and survival functions coincide for the convex function $\phi$ being strictly convex at the points $\alpha$ and $\bar {\alpha }$. This completes the proof of the proposition.
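Definition 2 and the two properties of Proposition 2 can be explored numerically in the univariate case. The following is a minimal sketch, assuming non-negative random variables, a truncated integration range and a hand-written Simpson rule; the function `survival_phi_divergence` and the exponential examples are illustrative assumptions and not an implementation taken from the paper.

```python
import math


def simpson(f, a, b, n=40000):
    """Composite Simpson rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0


def survival_phi_divergence(sf_f, sf_g, phi, upper=40.0):
    """Eq. (33) for d = 1 and non-negative variables, truncated to [0, upper].

    The case G-bar(x) = 0 is governed by the conventions of the class Phi and is
    not handled here; the integrand is simply set to 0 there (an assumption).
    """
    def integrand(x):
        g = sf_g(x)
        return g * phi(sf_f(x) / g) if g > 0 else 0.0

    mass_f = simpson(sf_f, 0.0, upper)  # approximates E(X)
    mass_g = simpson(sf_g, 0.0, upper)  # approximates E(Y)
    return simpson(integrand, 0.0, upper) - mass_g * phi(mass_f / mass_g)


def phi_kl(u):
    """phi(u) = u ln u, with phi(0) = 0."""
    return u * math.log(u) if u > 0 else 0.0


def phi_sq(u):
    """phi(u) = (u - 1)^2, another convex function with phi(1) = 0."""
    return (u - 1.0) ** 2


def exp_sf(rate):
    """Survival function of an exponential law with the given rate."""
    return lambda x: math.exp(-rate * x)


d_kl = survival_phi_divergence(exp_sf(1.0), exp_sf(2.0), phi_kl)
d_sq = survival_phi_divergence(exp_sf(2.0), exp_sf(1.0), phi_sq)
d_same = survival_phi_divergence(exp_sf(1.5), exp_sf(1.5), phi_kl)

print(d_kl, d_sq, d_same)
```

For these exponential choices the integrals are elementary, giving $1-\ln 2$ for the first divergence and $1/12$ for the second, while the third is zero because the two survival functions coincide.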

In the sequel, the focus is on the measures obtained from (32) and (33) for particular choices of the convex function $\phi$.

4.1. Kullback–Leibler type cumulative and survival divergences and mutual information

At first glance, applying Csiszár's type cumulative and survival divergences, defined by (32) and (33) above, with the convex function $\phi (u)=u\ln u$, $u>0$, or $\phi (u)=u\ln u+u-1$, $u>0,$ does not lead to the respective Kullback–Leibler divergences, defined by (23) and (24). An application of (32) and (33) for $\phi (u)=u\ln u$, $u>0$, leads to the measures

(34)\begin{equation} \begin{aligned} & \mathcal{CD}_{{\rm KL}}(F:G)=\int_{ \mathbb{R}^{d}}F(x)\ln \left(\frac{F(x)}{G(x)}\right) dx-\left(\int_{ \mathbb{R}^{d}}F(x)dx\right) \ln \left(\int_{ \mathbb{R}^{d}}F(x)dx/\int_{\mathbb{R}^{d}}G(x)dx\right), \\ & \mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G})=\int_{ \mathbb{R}^{d}}\bar{F}(x)\ln \left(\frac{\bar{F}(x)}{\bar{G}(x)} \right) dx-\left(\int_{ \mathbb{R}^{d}}\bar{F}(x)dx\right) \ln \left(\int_{ \mathbb{R}^{d}}\bar{F}(x)dx/\int_{ \mathbb{R}^{d}}\bar{G}(x)dx\right), \end{aligned} \end{equation}

respectively. Based on the elementary logarithmic inequality, $x\ln ({x}/{y})\geq x-y$, for $x,y>0$, and on Eqs. (23) and (24) it is immediate to see that

(35)\begin{equation} \begin{aligned} & \mathcal{CD}_{{\rm KL}}(F:G)\leq \int_{ \mathbb{R}^{d}}F(x)\ln \left(\frac{F(x)}{G(x)}\right) dx+\int_{ \mathbb{R}^{d}}[G(x)-F(x)]dx={\rm CKL}(F:G), \\ & \mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G})\leq \int_{ \mathbb{R}^{d}}\bar{F}(x)\ln \left(\frac{\bar{F}(x)}{\bar{G}(x)} \right) dx+\int_{ \mathbb{R}^{d}}[\bar{G}(x)-\bar{F}(x)]dx={\rm CRKL}(F:G), \end{aligned} \end{equation}

where ${\rm CKL}(F:G)$ and ${\rm CRKL}(F:G)$ are the measures (23) and (24) defined by Rao [Reference Rao72], Baratpour and Rad [Reference Baratpour and Rad12], Park et al. [Reference Park, Rao and Shin66], among others. It is clear, in view of (35), that the measures ${\rm CKL}(F:G)$ and ${\rm CRKL}(F:G)$ are never smaller than the Kullback–Leibler type cumulative and survival divergences defined by (34); in this sense, they overestimate the divergence, or quasi-distance, between the distributions of two random variables as quantified by (34).
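The step leading to (35) can be made explicit as follows. This is only a sketch; the shorthand $a=\int _{\mathbb {R}^{d}}F(x)dx$ and $b=\int _{\mathbb {R}^{d}}G(x)dx$ is introduced here for illustration, and both integrals are assumed finite and positive. Subtracting the first line of (34) from the middle expression in the first line of (35) gives

$$\int_{\mathbb{R}^{d}}F(x)\ln \left(\frac{F(x)}{G(x)}\right) dx+(b-a)-\mathcal{CD}_{{\rm KL}}(F:G)=(b-a)+a\ln \frac{a}{b}\geq 0,$$

where the inequality is the elementary bound $x\ln ({x}/{y})\geq x-y$ applied with $x=a$ and $y=b$, and equality holds only if $a=b$. The same argument with $\bar {F}$ and $\bar {G}$ in place of $F$ and $G$ gives the second line of (35).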

Csiszár's type and Kullback–Leibler's type survival divergences can be expressed in terms of expected values if we restrict ourselves to the univariate case, $d=1$. Indeed, if we focus again on non-negative random variables $X$ and $Y$ with respective survival functions $\bar {F}$ and $\bar {G}$, then $\mathcal {SD}_{\phi }(\bar {F}:\bar {G})$ of (33) is formulated as follows:

(36)\begin{equation} \mathcal{SD}_{\phi }(\bar{F}:\bar{G})=\int_{0}^{+\infty } \bar{G}(x)\phi \left(\frac{\bar{F}(x)}{\bar{G}(x)}\right) dx-\left(EY\right) \phi \left(\frac{EX}{EY}\right), \end{equation}

and for the special choice $\phi (u)=u\ln u,u>0$, (34) leads to

(37)\begin{equation} \mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G})=\int_{0}^{+\infty } \bar{F}(x)\ln \left(\frac{\bar{F}(x)}{\bar{G}(x)}\right) dx-\left(EX\right) \ln \left(\frac{EX}{EY}\right). \end{equation}

It should be noted at this point that Asadi et al. [Reference Asadi, Ebrahimi and Soofi6], in their Lemma 2.1, formulated a general divergence measure by moving the right-hand side of their inequality to the left-hand side. The measure defined by Asadi et al. [Reference Asadi, Ebrahimi and Soofi6] includes the divergence in (37) as a limiting case.

The survival analogs of Csiszár's and Kullback–Leibler's divergences can be expressed in terms of expected values, in view of (36) or (37). The implication of this point is shown in the next example.

Example 2. Park et al. [Reference Park, Rao and Shin66] considered the standard exponential distribution $\bar {F}(x)=e^{-x},x>0$ and the Weibull distribution $\bar {G} (x)=e^{-x^{k}}$, $x>0$, $k>0$, with scale parameter $1$ and shape parameter $k$. It is well known that the means of these distributions exist and are equal to $E(X)=1$ and $E(Y)=\Gamma (1+{1}/{k})$, where $\Gamma$ denotes the complete gamma function. In this context, based on (25), the cumulative Kullback–Leibler information (${\rm CKL}$) of Park et al. [Reference Park, Rao and Shin66] between $F$ and $G$ can be easily obtained because $E(X)-E(Y)=1-\Gamma (1+{1}/{k})$, while the integral $\int _{0}^{+ \infty }F\ln (F/G)dx$ can be numerically evaluated for specific values of the shape parameter $k>0$. On the other hand, elementary algebraic manipulations lead to

$$\int_{0}^{+\infty }\bar{F}(x)\ln \frac{\bar{F}(x)}{ \bar{G}(x)}dx={-}E(X)+E(X^{k})={-}1+k!={-}1+\Gamma (k+1),$$

by taking into account that the moment of order $k$ of the standard exponential distribution is $E(X^{k})=\Gamma (k+1)$, which equals $k!$ when $k$ is a positive integer. Therefore, based on (26), the cumulative residual KL information (${\rm CRKL}$) of Park et al. [Reference Park, Rao and Shin66] between $\bar {F}$ and $\bar {G}$ is given by

$${\rm CRKL}(F:G)={-}1+k!+\Gamma \left(1+\frac{1}{k}\right) -1=\Gamma (k+1) +\Gamma \left(1+\frac{1}{k}\right) -2.$$

Let us now derive the measures $\mathcal {CD}_{{\rm KL}}(F:G)$ and $\mathcal {SD}_{{\rm KL}}(\bar {F}:\bar {G}),$ formulated by (34) or (37) for non-negative random variables, as is the case of the random variables $X$ and $Y$, above. It is easy to see that the integrals $\int _{0}^{+\infty }F(x)dx$ and $\int _{0}^{+\infty }G(x)dx$ do not converge, and therefore, $\mathcal {CD}_{{\rm KL}}(F:G)$ in (34) is not defined for this specific choice of $F$ and $G$. On the other hand, $\mathcal {SD}_{{\rm KL}}(\bar {F}:\bar {G})$ is derived in an explicit form by using (37) and it is given by

$$\mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G})=\int_{0}^{+\infty } \bar{F}(x)\ln \frac{\bar{F}(x)}{\bar{G}(x)}dx-(EX)\ln \frac{EX }{EY}=\int_{0}^{+\infty }\bar{F}(x)\ln \frac{\bar{F}(x)}{ \bar{G}(x)}dx+\ln (EY),$$

or

$$\mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G})={-}1+\Gamma (k+1) +\ln \Gamma \left(1+\frac{1}{k}\right).$$

The classic Kullback–Leibler divergence between the standard exponential distribution with density function $f(x)=e^{-x},x>0$ and the Weibull distribution with scale parameter equal to $1$ and density function $g(x)=kx^{k-1}e^{-x^{k}},x>0,k>0$, is defined by

$$\mathcal{D}_{0}(f:g)=\int_{0}^{+\infty }f(x)\ln \frac{f(x)}{g(x)}dx.$$

Simple algebraic manipulations lead to

\begin{align*} \int_{0}^{+\infty }f(x)\ln f(x)dx& ={-}\int_{0}^{+\infty }xf(x)dx={-}E(X)={-}1,\\ \int_{0}^{+\infty }f(x)\ln g(x)dx& =\int_{0}^{+\infty }e^{{-}x}(\ln k+(k-1)\ln x-x^{k}) dx\\ & =\ln k+(k-1)E_{f}(\ln X)-E_{f}(X^{k}). \end{align*}

Taking into account that $E_{f}(X^{k})=\Gamma (k+1)$ and $\int _{0}^{+\infty }e^{-x}(\ln x) dx=-\gamma \approx -0.57722,$ where $\gamma$ denotes the Euler–Mascheroni constant,

$$\int_{0}^{+\infty }f(x)\ln g(x)dx=\ln k-0.577\,22(k-1)-\Gamma (k+1),$$

and hence

$$\mathcal{D}_{0}(f:g)={-}1-\ln k+0.577\,22(k-1)+\Gamma (k+1).$$

Figure 1 shows the plot of $\mathcal {D}_{0}(f:g)$ (red-solid), ${\rm CRKL}(F:G)$ (brown-dots) and $\mathcal {SD}_{{\rm KL}}(\bar {F}:\bar {G})$ (blue-dash).

FIGURE 1. Plot of divergences $\mathcal {D}_{0}$ (red-solid), ${\rm CRKL}$ (brown-dots) and $\mathcal {SD}_{{\rm KL}}$ (blue-dash).

We observe from this figure that all the considered divergences attain their minimum value $0$ at $k=1$ because in this case the standard exponential model coincides with the Weibull model with scale parameter and shape parameter equal to one. For values of $k$ greater than $1$, all the measures almost coincide. The plots are also in harmony with inequality (35).
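The three closed-form expressions derived in Example 2 can be tabulated with a few lines of code, which may help in reproducing the qualitative behavior described for Figure 1. This is a minimal sketch that only evaluates the formulas above; the function names and the grid of values of $k$ are illustrative assumptions.

```python
import math

# Euler-Mascheroni constant; for X standard exponential, E(ln X) equals minus this value.
EULER_GAMMA = 0.5772156649015329


def kl_density(k):
    """Classic Kullback-Leibler divergence D_0(f:g) between Exp(1) and Weibull(1, k)."""
    return -1.0 - math.log(k) + EULER_GAMMA * (k - 1.0) + math.gamma(k + 1.0)


def crkl(k):
    """Cumulative residual KL information CRKL(F:G), from Eq. (26)."""
    return math.gamma(k + 1.0) + math.gamma(1.0 + 1.0 / k) - 2.0


def sd_kl(k):
    """Survival Kullback-Leibler type divergence SD_KL, from Eq. (37)."""
    return -1.0 + math.gamma(k + 1.0) + math.log(math.gamma(1.0 + 1.0 / k))


GRID = (0.5, 0.75, 1.0, 1.5, 2.0, 3.0)  # hypothetical values of the shape parameter k

for k in GRID:
    print(f"k={k:4.2f}  D0={kl_density(k):.4f}  CRKL={crkl(k):.4f}  SD_KL={sd_kl(k):.4f}")
```

The difference ${\rm CRKL}(F:G)-\mathcal {SD}_{{\rm KL}}(\bar {F}:\bar {G})$ equals $y-1-\ln y$ with $y=\Gamma (1+{1}/{k})$, which is non-negative for every $k>0$; this is the form that inequality (35) takes in this example.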

Mutual information is closely related to the Kullback–Leibler divergence, defined by (6), and is obtained from it. It has its origins, to the best of our knowledge, in a paper by Linfoot [Reference Linfoot54] and it has received great attention in the literature, as it is connected with an extensive body of work on statistical dependence. It has been used for the definition of measures of dependence, which have been broadly applied to develop tests of independence (cf., [Reference Blumentritt and Schmid17,Reference Cover and Thomas26,Reference Ebrahimi, Jalali and Soofi33,Reference Guha, Biswas and Ghosh38,Reference Micheas and Zografos57], among many others). Mutual information is, in essence, the Kullback–Leibler divergence $\mathcal {D}_{0},$ in (6), between the joint distribution of $d$ random variables and the distribution of these random variables subject to the assumption of their independence.

In this context, consider the $d$-dimensional Euclidean space $\mathbb {R}^{d}$ and denote by $\mathcal {B}^{d}$ the $\sigma$-algebra of Borel subsets of $\mathbb {R}^{d}$. For a probability measure $P_{X}$ on $(\mathbb {R}^{d},\mathcal {B}^{d})$ and a $d$-dimensional random vector $X=(X_{1},\ldots,X_{d})$, let $F_{X}$ be the joint distribution function of $X$, defined by $F_{X}(x_{1},\ldots,x_{d})=P_{X}(X_{1}\leq x_{1},\ldots,X_{d}\leq x_{d})$. Let us now denote by $P_{X}^{0}$ the probability measure on $(\mathbb {R}^{d},\mathcal {B}^{d})$ under the assumption of independence of the components $X_{i}$ of the random vector $X=(X_{1},\ldots,X_{d})$, that is, $P_{X}^{0}$ is the product measure $P_{X}^{0}=P_{X_{1}}\times \cdots \times P_{X_{d}}$, where $P_{X_{i}}$ are probability measures on $(\mathbb {R},\mathcal {B})$ and $F_{X_{i}}(x_{i})=P_{X_{i}}(X_{i}\leq x_{i})$, $x_{i}\in \mathbb {R}$, is the marginal distribution function of $X_{i}$, $i=1,\ldots,d$. In this setting, the joint distribution function of $X=(X_{1},\ldots,X_{d})$, under the assumption of independence, is defined by $F_{X}^{0}(x_{1},\ldots,x_{d})=\prod _{i=1}^{d}F_{X_{i}}(x_{i})$, for $(x_{1},\ldots,x_{d})\in \mathbb {R}^{d}$. If $f_{X}$ and $f_{X}^{0}$ are the respective joint densities of $X=(X_{1},\ldots,X_{d})$, then the classic mutual information is defined by

(38)\begin{equation} \mathcal{MI}(X)=\mathcal{D}_{0}(f_{X}:f_{X}^{0})=\int_{ \mathbb{R}^{d}}f_{X}(x)\ln \frac{f_{X}(x)}{f_{X}^{0}(x)}dx=\int_{ \mathbb{R}^{d}}f_{X}(x)\ln \frac{f_{X}(x)}{f_{X_{1}}(x_{1})\ldots f_{X_{d}}(x_{d})}dx. \end{equation}

The measure (38) satisfies the non-negativity and identity of indiscernibles property, $\mathcal {MI}(X)\geq 0$, with equality if and only if $f_{X}(x)=\prod _{i=1}^{d}f_{X_{i}}(x_{i})$, that is, if and only if $X_{1},\ldots,X_{d}$ are independent. Hence, the above measure is well suited to quantify the degree of stochastic dependence between the components of $X=(X_{1},\ldots,X_{d})$ and to serve, therefore, as a measure of stochastic dependence. An empirical version of (38) can also be used as a test statistic in testing independence of the components of $X=(X_{1},\ldots,X_{d})$.

Mutual information can be defined in terms of cumulative and survival functions by using $\mathcal {CD}_{{\rm KL}}$ and $\mathcal {SD}_{{\rm KL}}$ of (34). Then, the cumulative mutual information and the survival mutual information are defined by,

(39)\begin{equation} \begin{aligned} & \mathcal{CMI}(X)=\int_{ \mathbb{R}^{d}}F_{X}(x)\ln \left(\frac{F_{X}(x)}{F_{X}^{0}(x)}\right) dx-\left(\int_{ \mathbb{R}^{d}}F_{X}(x)dx\right) \ln \left(\int_{ \mathbb{R}^{d}}F_{X}(x)dx\,\left/\int_{ \mathbb{R}^{d}}F_{X}^{0}(x)dx\right.\right), \\ & \mathcal{SMI}(X)=\int_{ \mathbb{R}^{d}}\bar{F}_{X}(x)\ln \left(\frac{\bar{F}_{X}(x)}{\bar{F}_{X}^{0}(x)}\right) dx-\left(\int_{ \mathbb{R}^{d}}\bar{F}_{X}(x)dx\right) \ln \left(\int_{ \mathbb{R}^{d}}\bar{F}_{X}(x)dx\,\left/\int_{ \mathbb{R}^{d}}\bar{F}_{X}^{0}(x)dx\right.\right), \end{aligned} \end{equation}

where $F_{X_{i}}$ is the marginal distribution function of $X_{i}$, $i=1,\ldots,d$, while $F_{X}^{0}(x)=\prod _{i=1}^{d}F_{X_{i}}(x_{i})$ and $\bar {F}_{X}^{0}(x)=\prod _{i=1}^{d}[1-F_{X_{i}}(x_{i})]$ denote the cumulative and the respective survival function under the assumption of independence of the components of $X=(X_{1},\ldots,X_{d})$. It is immediate to see that the cumulative and survival mutual information $\mathcal {CMI}(X)$ and $\mathcal {SMI}(X)$, of (39), are non-negative and they attain their minimum value $0$ if and only if $F_{X}(x)=F_{X}^{0}(x)=\prod _{i=1}^{d}F_{X_{i}}(x_{i})$ and $\bar {F}_{X}(x)=\bar {F}_{X}^{0}(x)=\prod _{i=1}^{d}[1-F_{X_{i}}(x_{i})]$, respectively. Hence, $\mathcal {CMI}(X)$ and $\mathcal {SMI}(X)$ express how close $F_{X}(x)$ is to $\prod _{i=1}^{d}F_{X_{i}}(x_{i})$ and how close $\bar {F}_{X}(x)$ is to $\prod _{i=1}^{d}[1-F_{X_{i}}(x_{i})]=\prod _{i=1}^{d}\bar {F}_{X_{i}}(x_{i})$, respectively. Moreover, since the equality $F_{X}(x)=F_{X}^{0}(x)=\prod _{i=1}^{d}F_{X_{i}}(x_{i})$ is equivalent to independence of $X_{1},\ldots,X_{d}$, the cumulative mutual information $\mathcal {CMI}(X)$ can also be used as a measure of dependence and its empirical version can also serve as an index to develop tests of independence. The same is true for the measure $\mathcal {SMI}(X)$ in the bivariate case, $d=2$. Cumulative and survival mutual information $\mathcal {CMI}(X)$ and $\mathcal {SMI}(X)$ can be generalized, by using (32) and (33), to produce Csiszár's type cumulative and survival mutual $\phi$-divergences, in a way similar to that of Micheas and Zografos [Reference Micheas and Zografos57], who extended the classic mutual information (38) to the classic Csiszár's $\phi$-divergence between the joint distribution and the similar one under the assumption of independence.

The next example presents the measures $\mathcal {CMI}(X)$ and $\mathcal {SMI}(X)$ on the basis of a well-known bivariate distribution, the Farlie-Gumbel-Morgenstern (FGM) bivariate distribution (cf. [Reference Balakrishnan and Lai11], and references therein).

Example 3. Consider the FGM bivariate distribution of a random vector $(X_{1},X_{2}),$ with the joint cumulative distribution function,

$$F_{X_{1},X_{2}}(x_{1},x_{2})=x_{1}x_{2}[1+\theta (1-x_{1})(1-x_{2})],\quad 0< x_{1},x_{2}<1,\ -1\leq \theta \leq 1,$$

and the joint probability density function,

$$f_{X_{1},X_{2}}(x_{1},x_{2})=1+\theta (1-2x_{1})(1-2x_{2}),\quad 0< x_{1},x_{2}<1,\ -1\leq \theta \leq 1.$$

The marginal distributions are uniform, $X_{1}\sim U(0,1)$ and $X_{2}\sim U(0,1)$, and the correlation coefficient is $\rho =\rho (X_{1},X_{2}) ={\theta }/{3},$ which clearly ranges from $-\frac {1}{ 3}$ to $\frac {1}{3}$. For the FGM family of bivariate distributions, it can be easily seen that the last term of the right-hand side of (39) is obtained in an analytic form and it is given by,

$$\left(\int_{0}^{1}\int _{0}^{1}F_{X_{1},X_{2}}(x_{1},x_{2})dx_{1}dx_{2}\right) \ln \frac{ \int_{0}^{1}\int _{0}^{1}F_{X_{1},X_{2}}(x_{1},x_{2})dx_{1}dx_{2}}{ \int_{0}^{1}\int _{0}^{1}F_{X_{1}}(x_{1})F_{X_{2}}(x_{2})dx_{1}dx_{2}}=\left(\frac{1 }{4}+\frac{\theta }{36}\right) \ln \left(1+\frac{\theta }{9}\right).$$

The first term of the right-hand side of (39), $\int _{0}^{1} \int _{0}^{1}F_{X_{1},X_{2}}\ln ({F_{X_{1},X_{2}}}/{F_{X_{1}}F_{X_{2}}})dx_{1}dx_{2}$, and the classic mutual information (38) can be numerically evaluated for several values of the dependence parameter $\theta$, $-1\leq \theta \leq 1$. Moreover, based on Nelsen [Reference Nelsen59] p. 32 or Joe [Reference Joe42] pp. 27–28,

$$\bar{F}_{X_{1},X_{2}}(x_{1},x_{2})=1-F_{X_{1}}(x_{1})-F_{X_{2}}(x_{2})+F_{X_{1},X_{2}}(x_{1},x_{2}),$$

and therefore, the survival function of the FGM family of bivariate distributions is given by,

$$\bar{F}_{X_{1},X_{2}}(x_{1},x_{2})=(1-x_{1}-x_{2}+x_{1}x_{2})(1+\theta x_{1}x_{2}),\quad 0< x_{1},x_{2}<1,\ -1\leq \theta \leq 1,$$

while the survival function $\bar {F}_{X_{1},X_{2}}^{0}(x_{1},x_{2})$, under the assumption of independence of $X_{1},$ $X_{2}$ is $\bar {F}_{X_{1},X_{2}}^{0}(x_{1},x_{2})=(1-x_{1})(1-x_{2})$. The table presents the values of the correlation coefficient $\rho (X_{1},X_{2})$, the mutual information $\mathcal {MI}(X_{1},X_{2})$, the cumulative and the survival mutual information $\mathcal {CMI}(X_{1},X_{2})$ and $\mathcal {SMI}(X_{1},X_{2})$, for several values of the dependence parameter $\theta$.

Observe that all the measures decrease and approach their minimum value as the dependence parameter $\theta$ approaches independence $(\theta =0)$. The correlation coefficient $\rho$ and the mutual information $\mathcal {MI}$ are symmetric, a property which is not shared by the cumulative mutual information $\mathcal {CMI}$ and the survival mutual information $\mathcal {SMI}$. The correlation coefficient captures negative dependence but the other informational measures do not discriminate between positive and negative dependence. Finally, it is interesting to note that the quantity $\int _{0}^{1}\int _{0}^{1}F_{X_{1},X_{2}}\ln ({F_{X_{1},X_{2}}}/{F_{X_{1}}F_{X_{2}}})dx_{1}dx_{2}$ can take positive or negative values; for instance, it is equal to $-0.00672$ for $\theta =-0.25$ and to $0.01471$ for $\theta =0.5$. The same is also true for the quantity $\int _{0}^{1}\int _{0}^{1}\bar {F}_{X_{1},X_{2}}\ln ({\bar {F}_{X_{1},X_{2}}}/{\bar {F}_{X_{1}} \bar {F}_{X_{2}}})dx_{1}dx_{2}$, which is equal to $-0.01866$ for $\theta =-0.75$ and to $0.01471$ for $\theta =0.5$. This point justifies the definition of the cumulative and survival mutual information by (39), which ensures non-negativity of the measures.
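The quantities of Example 3 can be approximated with a few lines of Python. The sketch below is a rough illustration under stated assumptions: the function name `fgm_measures`, the midpoint rule and the grid size are choices made here, and the only analytic inputs are the FGM formulas of the example together with the closed form $(\frac {1}{4}+\frac {\theta }{36})\ln (1+\frac {\theta }{9})$ of the last term of (39), which applies to both the cumulative and the survival version because $\int _{0}^{1}\int _{0}^{1}\bar {F}_{X_{1},X_{2}}dx_{1}dx_{2}=\frac {1}{4}+\frac {\theta }{36}$ as well.

```python
import math

def fgm_measures(theta, n=300):
    """Midpoint-rule approximations on an n x n grid of the unit square.

    Returns (rho, MI, cumulative integral, CMI, survival integral, SMI)."""
    h = 1.0 / n
    mi = cum = surv = 0.0
    for i in range(n):
        x1 = (i + 0.5) * h
        for j in range(n):
            x2 = (j + 0.5) * h
            dens = 1.0 + theta * (1.0 - 2.0 * x1) * (1.0 - 2.0 * x2)
            if dens > 0.0:
                mi += dens * math.log(dens)          # marginal densities are 1
            r = 1.0 + theta * (1.0 - x1) * (1.0 - x2)  # F / (F_1 F_2)
            cum += x1 * x2 * r * math.log(r)
            s = 1.0 + theta * x1 * x2                  # Fbar / (Fbar_1 Fbar_2)
            surv += (1.0 - x1) * (1.0 - x2) * s * math.log(s)
    mi, cum, surv = mi * h * h, cum * h * h, surv * h * h
    last = (0.25 + theta / 36.0) * math.log(1.0 + theta / 9.0)  # last term of (39)
    return theta / 3.0, mi, cum, cum - last, surv, surv - last

for theta in (-0.75, -0.25, 0.0, 0.5):
    print(theta, fgm_measures(theta))
```

For $\theta =0.5$ and $\theta =-0.25$ the third returned value should be close to the figures $0.01471$ and $-0.00672$ quoted above, while the fourth and sixth values (the measures themselves) remain non-negative.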

4.2. Cressie and Read type cumulative and survival divergences

Let us now consider the convex function $\phi (u)=\phi _{\lambda }(u)= {(u^{\lambda +1}-u-\lambda (u-1))}/{\lambda (\lambda +1)}$, $\lambda \neq 0,-1$, $u>0$, for which Csiszár's $\phi$-divergence, defined by (8), reduces to the power divergence of Cressie and Read [Reference Cressie and Read27] and Read and Cressie [Reference Read and Cressie74]. A direct application of (32) and (33) for this specific choice of the convex function $\phi$ leads to the Cressie and Read type cumulative and survival divergences, which are defined as follows:

(40)\begin{align} \mathcal{CD}_{\lambda }(F:G) & =\frac{1}{\lambda (\lambda +1)}\left(\int_{ \mathbb{R}^{d}}G(x)\left(\frac{F(x)}{G(x)}\right)^{\lambda +1}dx-\left(\int_{ \mathbb{R}^{d}}G(x)dx\right) \right.\nonumber\\ & \quad \left.\times \left(\int_{ \mathbb{R}^{d}}F(x)dx\,\left/\int_{ \mathbb{R}^{d}}G(x)dx\right)^{\lambda +1}\right.\right), \end{align}

and

\begin{align*} \mathcal{SD}_{\lambda }(\bar{F}:\bar{G})& =\frac{1}{\lambda (\lambda +1)}\left(\int_{ \mathbb{R}^{d}}\bar{G}(x)\left(\frac{\bar{F}(x)}{\bar{G}(x)}\right)^{\lambda +1}dx-\left(\int_{ \mathbb{R}^{d}}\bar{G}(x)dx\right)\right.\\ & \quad \left. \times \left(\int_{ \mathbb{R}^{d}}\bar{F}(x)dx\,\left/\int_{ \mathbb{R}^{d}}\bar{G}(x)dx\right)^{\lambda +1}\right.\right), \end{align*}

for $\lambda \in \mathbb {R},\lambda \neq 0,-1$. The last measures can be formulated in terms of expected values if we concentrate on non-negative random variables $X$ and $Y$ with respective survival functions $\bar {F}$ and $\bar {G}$. In this frame, $\mathcal {SD}_{\lambda }(\bar {F},\bar {G})$ is simplified as follows,

$$\mathcal{SD}_{\lambda }(\bar{F}:\bar{G})=\frac{1}{\lambda (\lambda +1)}\left(\int_{0}^{+\infty }\bar{G}(x)\left(\frac{\bar{F}(x)}{\bar{G}(x)}\right)^{\lambda +1}dx-E(Y)\left(\frac{E(X)}{E(Y)} \right)^{\lambda +1}\right),\quad \lambda \in \mathbb{R},\ \lambda \neq 0,-1.$$

The previous propositions supply $\mathcal {CD}_{\lambda }(F,G)$ and $\mathcal {SD}_{\lambda }(\bar {F},\bar {G})$ with non-negativity and the identity of indiscernibles property. Cressie and Read's [Reference Cressie and Read27] type cumulative and survival divergences (40) are not defined for $\lambda =-1$ and $\lambda =0$. When the power $\lambda$ approaches these values, then $\mathcal {CD}_{\lambda }(F,G)$ and $\mathcal {SD}_{\lambda }(\bar {F},\bar {G})$ are reduced to the respective Kullback–Leibler divergences, in the limiting sense that follows and can be easily proved,

\begin{align*} & \lim_{\lambda \rightarrow 0}\mathcal{CD}_{\lambda }(F :G)=\mathcal{CD}_{{\rm KL}}(F:G)\quad \text{and}\quad \lim_{\lambda \rightarrow 0}\mathcal{SD}_{\lambda }(\bar{F}:\bar{G})=\mathcal{SD}_{{\rm KL}}(\bar{F}:\bar{G}),\\ & \lim_{\lambda \rightarrow -1}\mathcal{CD}_{\lambda }(F :G)=\mathcal{CD}_{{\rm KL}}(G:F)\quad \text{and}\quad \lim_{\lambda \rightarrow -1}\mathcal{SD}_{\lambda }(\bar{F}:\bar{G})=\mathcal{SD}_{{\rm KL}}(\bar{G}:\bar{F}). \end{align*}

Example 4. Let us consider again the random variables $X$ and $Y$ of the previous example with distribution functions and survival functions $F(x)=1-e^{-x},$ $\bar {F}(x)=e^{-x}$ and $G(x)=1-e^{-x^{k}}$, $\bar {G} (x)=e^{-x^{k}},$ $x>0,k>0$. It is easy to see that $\mathcal {CD}_{\lambda }(F:G)$ is not obtained in an explicit form for this specific choice of $F$ and $G$. With respect to $\mathcal {SD}_{\lambda }(\bar {F}:\bar {G})$, elementary algebraic manipulations entail that

$$\int_{0}^{+\infty }\bar{G}(x)\left(\frac{\bar{F}(x)}{ \bar{G}(x)}\right)^{\lambda +1}dx=\int_{0}^{+\infty }e^{{-}x^{k}}\left(\frac{e^{{-}x}}{e^{{-}x^{k}}}\right)^{\lambda +1}dx=\int_{0}^{+\infty }e^{-(\lambda +1)x+\lambda x^{k}}dx,$$

and the last integral can be numerically evaluated for specific values of the power $\lambda$ and the shape parameter $k$. Taking into account that $E(X)=1$ and $E(Y)=\Gamma (1+(1/k))$, the Cressie and Read type survival divergence is given by,

$$\mathcal{SD}_{\lambda }(\bar{F}:\bar{G})=\frac{1}{\lambda (\lambda +1)}\left(\int_{0}^{+\infty }e^{-(\lambda +1)x+\lambda x^{k}}dx-\Gamma^{-\lambda }\left(1+\frac{1}{k}\right) \right),\quad\lambda \in \mathbb{R},\ \lambda \neq 0,-1,\ k>0.$$
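The numerical evaluation mentioned in Example 4 might be sketched as follows. The function names, the midpoint rule and the truncation point are assumptions of this illustration. Note also that the exponent $-(\lambda +1)x+\lambda x^{k}$ tends to $-\infty$ only for some pairs $(\lambda,k)$, for instance $\lambda >0$ with $k<1$ or $-1<\lambda <0$ with $k>1$, so the pairs used below are chosen with this in mind.

```python
import math

def sd_lambda(lam, k, upper=60.0, n=120_000):
    """Cressie-Read type survival divergence between exp(-x) and exp(-x**k).

    Assumes (lam, k) is such that the integral converges; see the lead-in."""
    h = upper / n
    integral = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        integral += math.exp(-(lam + 1.0) * x + lam * x ** k)
    integral *= h
    mean_y = math.gamma(1.0 + 1.0 / k)          # E(Y) = Gamma(1 + 1/k), E(X) = 1
    return (integral - mean_y ** (-lam)) / (lam * (lam + 1.0))

def sd_kl(k):
    """Closed-form survival Kullback-Leibler divergence of the previous example."""
    return -1.0 + math.gamma(k + 1.0) + math.lgamma(1.0 + 1.0 / k)

print(sd_lambda(0.5, 0.5), sd_lambda(-0.5, 2.0))
print(sd_lambda(1e-3, 0.5), sd_kl(0.5))   # small lambda versus the Kullback-Leibler limit
```

The last line illustrates the limiting relation $\lim _{\lambda \rightarrow 0}\mathcal {SD}_{\lambda }(\bar {F}:\bar {G})=\mathcal {SD}_{{\rm KL}}(\bar {F}:\bar {G})$ stated above: for a small positive $\lambda$ the two printed values should be close.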

4.3. Density power divergence type cumulative and survival divergences

A straightforward application of $d_{a}$, given by (12), leads to the cumulative and survival counterparts of $d_{a}$, which are defined in the sequel. Let $F$ and $G$ denote the cumulative distribution functions of the random vectors $X$ and $Y$ and $\bar {F}$ and $\bar {G}$ denote the respective survival functions. Then, the cumulative and survival density power type divergences, are defined by,

(41)\begin{equation} \mathcal{C}d_{a}(F:G)=\int_{\mathbb{R}^{d}}\left \{ G(x)^{1+a}-\left(1+\frac{1}{a}\right) G(x)^{a}F(x)+\frac{1}{a} \text{ }F(x)^{1+a}\right \} dx,\quad a>0, \end{equation}

and

$$\mathcal{S}d_{a}(\bar{F}:\bar{G})=\int_{ \mathbb{R}^{d}}\left \{ \bar{G}(x)^{1+a}-\left(1+\frac{1}{a}\right) \bar{G} (x)^{a}\bar{F}(x)+\frac{1}{a}\text{ }\bar{F}(x)^{1+a}\right \} dx, \quad a>0.$$

The above divergences, $\mathcal {C}d_{a}(F:G)$ and $\mathcal {S}d_{a}(\bar {F}:\bar {G})$, are non-negative, for all $a>0$. They are equal to $0$ if and only if the underlying cumulative distributions $F$ and $G$, or the respective survival functions $\bar {F}$ and $\bar {G}$, coincide. The proof of this assertion is immediately obtained by following the proof of Theorem 9.1 of Basu et al. [Reference Basu, Shioya and Park14]. It is seen that the case $a=0$ is excluded from the definition of $\mathcal {C}d_{a}(F:G)$ and $\mathcal {S}d_{a}(\bar {F}:\bar {G})$ in (41). It can be easily shown that $\lim _{a\rightarrow 0}\mathcal {C}d_{a}(F:G)={\rm CKL}(F:G)$ and $\lim _{a\rightarrow 0}\mathcal {S}d_{a}(\bar {F}:\bar {G})={\rm CRKL}(F:G)$, where the limiting measures ${\rm CKL}(F:G)$ and ${\rm CRKL}(F:G)$ have been defined by (23) and (24), respectively.
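To make the survival density power type divergence concrete, the following sketch evaluates $\mathcal {S}d_{a}(\bar {F}:\bar {G})$ numerically for the survival functions $\bar {F}(x)=e^{-x}$ and $\bar {G}(x)=e^{-x^{k}}$ used in the earlier examples. The function name, the midpoint rule and the truncation point are assumptions of this illustration; heavier tails (for instance $k<1$) would call for a larger `upper`.

```python
import math

def survival_dpd(a, k, upper=60.0, n=120_000):
    """S d_a(Fbar:Gbar) for Fbar = exp(-x) and Gbar = exp(-x**k), with a > 0."""
    h = upper / n
    total = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        fbar = math.exp(-x)
        gbar = math.exp(-x ** k)
        # integrand of the survival counterpart of (41)
        total += (gbar ** (1.0 + a)
                  - (1.0 + 1.0 / a) * gbar ** a * fbar
                  + fbar ** (1.0 + a) / a)
    return total * h

print(survival_dpd(0.5, 2.0))   # positive: the two survival functions differ
print(survival_dpd(0.5, 1.0))   # zero up to rounding: the two survival functions coincide
```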

5. Fisher's type cumulative and survival information

Fisher's information measure, defined by (13), is a key expression which is connected with important results in mathematical statistics. It is related to the Kullback–Leibler divergence, in a parametric framework, and this relation is formulated in (16). A natural question is raised at this point: how should a Fisher's type measure be defined in terms of a distribution function or in terms of a survival function? We will try to give an answer to this question motivated by the limiting connection of classic Csiszár's $\phi$-divergence and Fisher's information, formulated by (16) and (17). This is also based on similar derivations in Section 3 of Park et al. [Reference Park, Rao and Shin66] and the recent work by Kharazmi and Balakrishnan [Reference Kharazmi and Balakrishnan43]. To formulate the definition, consider the $d$-dimensional Euclidean space $\mathbb {R}^{d}$ and denote by $\mathcal {B}^{d}$ the $\sigma$-algebra of Borel subsets of $\mathbb {R}^{d}$. Consider a parametric family of probability measures $P_{\theta }$ on $(\mathbb {R}^{d},\mathcal {B}^{d})$, depending on an unknown parameter $\theta$, belonging to the parameter space $\Theta \subseteq \mathbb {R}$. For a $d$-dimensional random vector $X=(X_{1},\ldots,X_{d})$, let $F_{\theta }$ denote the joint distribution function of $X$, with $F_{\theta }(x_{1},\ldots,x_{d})=P_{\theta }(X_{1}\leq x_{1},\ldots,X_{d}\leq x_{d})$ and let $\bar {F}_{\theta }$ denote the joint survival function of $X$, with $\bar {F}_{\theta }(x_{1},\ldots,x_{d})=P_{\theta }(X_{1}>x_{1},\ldots,X_{d}>x_{d})$, for $(x_{1},\ldots,x_{d})\in \mathbb {R}^{d},\theta \in \Theta \subseteq \mathbb {R}$.

Motivated by the limiting behavior, cf. (17), between the classic Fisher information and Csiszár's $\phi$-divergence, the next proposition investigates an analogous behavior of the cumulative and survival Csiszár's type $\phi$-divergences, defined by (32) and (33).

Proposition 3. Let $F_{\theta }(x)$, $x\in \mathbb {R}^{d}$ and $\theta \in \Theta \subseteq \mathbb {R}$, be a parametric family of joint distribution functions. Let also $\bar {F}_{\theta }(x)$ be the corresponding survival function. Then, the cumulative and survival Csiszár's type $\phi$-divergences, defined by (32) and (33), are characterized by the following limiting behavior,

\begin{align*} \lim_{\delta \rightarrow 0}\frac{1}{\delta^{2}}\mathcal{CD}_{\phi }(F_{\theta +\delta },F_{\theta })& =\frac{\phi^{\prime \prime }(1)}{2}\left \{ \int_{ \mathbb{R}^{d}}F_{\theta }(x)\left(\frac{d}{d\theta }\ln F_{\theta }(x)\right)^{2}dx\right.\\ & \quad \left.-\left(\int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right)^{2}\right\},\\ \lim_{\delta \rightarrow 0}\frac{1}{\delta^{2}}\mathcal{SD}_{\phi }(\bar{F}_{\theta +\delta },\bar{F}_{\theta })& =\frac{\phi^{\prime \prime }(1)}{2}\left \{ \int_{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)\left(\frac{d}{d\theta }\ln \bar{F}_{\theta }(x)\right)^{2}dx\right.\\ & \quad \left.-\left(\int_{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int _{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)dx\right)^{2}\right \}, \end{align*}

for $\theta \in \Theta \subseteq \mathbb {R}$ and subject to the additional assumption $\phi ^{\prime }(1)=0$, for $\phi \in \Phi$, defined by (9).

Proof. We will only sketch the proof for $\mathcal {CD}_{\phi }$ because the other one follows in a completely similar manner. Following Pardo [Reference Pardo65] p. 411, for $w(\theta +\delta,\theta )=\int F_{\theta }\phi (F_{\theta +\delta }/F_{\theta })dx$, a second-order Taylor expansion of $w(\theta ^{\ast },\theta )$ around $\theta ^{\ast }=\theta$ at $\theta ^{\ast }=\theta +\delta$ gives

(42)\begin{align} w(\theta +\delta,\theta) & =w(\theta,\theta)+(\theta +\delta -\theta) \frac{d}{d\theta^{{\ast} }}w(\theta^{{\ast} },\theta)|_{\theta^{{\ast} }=\theta } \nonumber\\ & \quad +\frac{1}{2}(\theta +\delta -\theta)^{2}\frac{d^{2}}{d(\theta^{{\ast} })^{2}}w(\theta^{{\ast} },\theta)|_{\theta^{{\ast} }=\theta }+O(\delta^{3}), \end{align}

where $w(\theta,\theta )=0$ in view of $\phi (1)=0$ and

(43)\begin{equation} \frac{d}{d\theta^{{\ast} }}w(\theta^{{\ast} },\theta)|_{\theta^{{\ast} }=\theta }=\left. \int \phi^{\prime }\left(\frac{F_{\theta^{{\ast} }}(x)}{ F_{\theta }(x)}\right) \frac{d}{d\theta^{{\ast} }}F_{\theta^{{\ast} }}(x)dx\right \vert_{\theta^{{\ast} }=\theta }=\phi^{\prime }(1)\int \frac{ d}{d\theta }F_{\theta }(x)dx=0, \end{equation}

taking into account that $\phi ^{\prime }(1)=0$. On the other hand,

\begin{align*} \frac{d^{2}}{d(\theta^{{\ast} })^{2}}w(\theta^{{\ast} },\theta)|_{\theta^{{\ast} }=\theta } & =\left. \int \phi^{\prime \prime }\left(\frac{ F_{\theta^{{\ast} }}(x)}{F_{\theta }(x)}\right) \frac{1}{F_{\theta }(x)} \left(\frac{d}{d\theta^{{\ast} }}F_{\theta^{{\ast} }}(x)\right)^{2}dx\right \vert_{\theta^{{\ast} }=\theta } \\ & \quad +\left. \int \phi^{\prime }\left(\frac{F_{\theta^{{\ast} }}(x)}{ F_{\theta }(x)}\right) \frac{d^{2}}{d(\theta^{{\ast} })^{2}}F_{\theta^{{\ast} }}(x)dx\right \vert_{\theta^{{\ast} }=\theta }, \end{align*}

and taking into account that $\phi ^{\prime }(1)=0,$

(44)\begin{equation} \frac{d^{2}}{d(\theta^{{\ast} })^{2}}w(\theta^{{\ast} },\theta)|_{\theta^{{\ast} }=\theta }=\int \phi^{\prime \prime }\left(1\right) \frac{1}{ F_{\theta }(x)}\left(\frac{d}{d\theta }F_{\theta }(x)\right)^{2}dx=\phi^{\prime \prime }\left(1\right) \int F_{\theta }(x)\left(\frac{d}{d\theta }\ln F_{\theta }(x)\right)^{2}dx. \end{equation}

Based on (42), (43) and (44),

(45)\begin{equation} w(\theta +\delta,\theta)=\int_{ \mathbb{R}^{d}}F_{\theta }(x)\phi \left(\frac{F_{\theta +\delta }(x)}{F_{\theta }(x)} \right) dx=\frac{1}{2}\delta^{2}\phi^{\prime \prime }\left(1\right) \int_{\mathbb{R}^{d}}F_{\theta }(x)\left(\frac{d}{d\theta }\ln F_{\theta }(x)\right)^{2}dx+O(\delta^{3}). \end{equation}

Following exactly the same argument for the function $w^{\ast }(\theta +\delta,\theta )=\phi \left (\int F_{\theta +\delta }dx\text { }/\text { } \int F_{\theta }dx\right )$, we obtain

$$w^{{\ast} }(\theta +\delta,\theta)=\phi \left(\frac{\int F_{\theta +\delta }(x)dx}{\int F_{\theta }(x)dx}\right) =\frac{1}{2}\delta^{2}\phi^{\prime \prime }(1) \left(\frac{d}{d\theta }\ln \int F_{\theta }(x)dx\right)^{2}+O(\delta^{3})$$

and then

(46)\begin{equation} \left(\int F_{\theta }(x)dx\right) \phi \left(\frac{\int F_{\theta +\delta }(x)dx}{\int F_{\theta }(x)dx}\right) =\frac{1}{2}\delta^{2}\phi^{\prime \prime }(1) \left(\int F_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int F_{\theta }(x)dx\right)^{2}+O(\delta^{3}). \end{equation}

Based on (32), (45) and (46),

\begin{align*} \mathcal{CD}_{\phi }(F_{\theta +\delta },F_{\theta }) & =\frac{\phi^{\prime \prime }(1) }{2}\delta^{2}\left \{ \int_{ \mathbb{R}^{d}}F_{\theta }(x)\left(\frac{d}{d\theta }\ln F_{\theta }(x)\right)^{2}dx-\left(\int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right)^{2}\right \} \\ & \quad +O(\delta^{3}), \end{align*}

which leads to the desired result.
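The survival part of Proposition 3 can be illustrated numerically in a simple case. The sketch below assumes, purely for illustration, the exponential family $\bar {F}_{\theta }(x)=e^{-x/\theta }$ with mean $\theta >0$ and the Kullback–Leibler choice $\phi (u)=u\ln u-u+1$, for which $\phi ^{\prime }(1)=0$ and $\phi ^{\prime \prime }(1)=1$. The function names are hypothetical, and the survival divergence between two exponentials is written in closed form from $\int _{0}^{+\infty }\bar {F}(x)\ln ({\bar {F}(x)}/{\bar {G}(x)})dx-(EX)\ln ({EX}/{EY})$.

```python
import math

def sd_kl_exponentials(theta_star, theta):
    """SD_KL between the survival functions exp(-x/theta_star) and exp(-x/theta)."""
    u = (theta_star - theta) / theta
    # int Fbar* ln(Fbar*/Fbar) dx = theta_star * u, and (EX*) ln(EX*/EX) = theta_star * ln(1 + u)
    return theta_star * (u - math.log1p(u))

def fisher_type_bracket(theta, upper_factor=60.0, n=100_000):
    """Bracket on the right-hand side of the survival limit, via the midpoint rule."""
    upper = upper_factor * theta
    h = upper / n
    integral = 0.0
    for i in range(n):
        x = (i + 0.5) * h
        score = x / theta ** 2                  # d/dtheta ln Fbar_theta(x)
        integral += math.exp(-x / theta) * score ** 2
    integral *= h
    mean = theta                                # int Fbar_theta dx = E_theta(X) = theta
    return integral - mean * (1.0 / theta) ** 2 # (d/dtheta ln theta)**2 = 1/theta**2

theta, delta = 2.0, 1e-3
ratio = sd_kl_exponentials(theta + delta, theta) / delta ** 2
print(ratio, 0.5 * fisher_type_bracket(theta))  # the two values should nearly agree
```

In this family the bracket equals $1/\theta$, so for small $\delta$ the scaled divergence is expected to be close to $1/(2\theta )$.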

Based on the previous Proposition and in complete analogy with (17), which connects the classic Fisher information with Csiszár's $\phi$-divergence, we state the definition of the Fisher's type cumulative and survival information.

Definition 3. For a parametric family of joint distribution functions $F_{\theta }(x)$ with corresponding survival function $\bar {F}_{\theta }(x)$, $x\in \mathbb {R}^{d}$ and $\theta \in \Theta \subseteq \mathbb {R}$, the Fisher's type cumulative and survival measures of information are defined by

(47)\begin{align} \mathcal{CI}_{F}^{Fi}(\theta)=\int_{ \mathbb{R}^{d}}F_{\theta }(x)\left(\frac{d}{d\theta }\ln F_{\theta }(x)\right)^{2}dx-\left(\int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int_{ \mathbb{R}^{d}}F_{\theta }(x)dx\right)^{2}, \end{align}
(48)\begin{align} \mathcal{SI}_{F}^{Fi}(\theta)=\int_{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)\left(\frac{d}{d\theta }\ln \bar{F}_{\theta }(x)\right)^{2}dx-\left(\int_{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)dx\right) \left(\frac{d}{d\theta }\ln \int _{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)dx\right)^{2}, \end{align}

for $\theta \in \Theta \subseteq \mathbb {R}$.

Remark 1.

  1. (a) Observe that the Fisher's type cumulative and survival measures defined above are not direct analogues of the classic one defined by means of probability density functions, such as the measure $\mathcal {I}_{f}^{Fi}(\theta )$, defined by (13). This was expected because the cumulative and survival $\phi$-divergences, (32) and (33), which are used to define the Fisher's type measures of the above definition, are not direct analogues of the classic Csiszár's $\phi$-divergence (8), for the reasons provided in the previous subsections. More precisely, the analogous expressions of classic divergences, which are obtained by replacing densities with cumulative and survival functions, may lead to negative quantities, as was shown in the counterexample of Section 3.

  2. (b) Fisher's type survival measure $\mathcal {SI}_{F}^{Fi}(\theta )$ has an alternative expression, in terms of expected values, if we restrict ourselves to the univariate case $d=1$. Indeed, if we focus again on a non-negative random variable $X$ with survival function $\bar {F}_{\theta }$, then $\mathcal {SI}_{F}^{Fi}(\theta )$ of (48) is formulated as follows:

    (49)\begin{equation} \mathcal{SI}_{F}^{Fi}(\theta)=\int_{0}^{\infty }\bar{F}_{\theta }(x)\left(\frac{d}{d\theta }\ln \bar{F}_{\theta }(x)\right)^{2}dx-\left(E_{\theta }(X)\right) \left(\frac{d}{d\theta }\ln E_{\theta }(X)\right)^{2},\quad \theta \in \Theta \subseteq \mathbb{R}. \end{equation}
  3. (c) Fisher's type cumulative and survival measures of (47) and (48) can be extended to the multiparameter case $\theta \in \Theta \subseteq \mathbb {R}^{m}$. In this case, the extensions of $\mathcal {CI}_{F}^{Fi}(\theta )$ and $\mathcal {SI}_{F}^{Fi}(\theta )$ will be $m\times m$ symmetric matrices, but their exposition is outside the scope of this paper.

  4. (d) The Fisher's type measures defined above should be non-negative, and indeed they are. The proof of non-negativity of $\mathcal {CI}_{F}^{Fi}$ and $\mathcal {SI }_{F}^{Fi}$, in (47) and (48), is immediately obtained in view of the last proposition. Indeed, $\mathcal {CD}_{\phi }(F_{\theta +\delta },F_{\theta })$ and $\mathcal {SD}_{\phi }(\bar {F}_{\theta +\delta },\bar {F}_{\theta })$ are non-negative, while $\phi ^{\prime \prime }(1) \geq 0$ because $\phi$ is a convex function. Therefore, $\mathcal {CI}_{F}^{Fi}$ and $\mathcal {SI}_{F}^{Fi}$ are non-negative as the limits of non-negative functions.

The Fisher's type cumulative and survival measures of the previous definition have an alternative representation which is formulated in the next proposition. This representation is the analogue of the representation (14) of the classic Fisher information measure.

Proposition 4. For a parametric family of joint distribution functions $F_{\theta }(x)$ and survival functions $\bar {F}_{\theta }(x)$, $x\in \mathbb {R}^{d}$, $\theta \in \Theta \subseteq \mathbb {R}$, and under the assumption that the integral and the derivative signs can be interchanged,

(50)\begin{align} \mathcal{CI}_{F}^{Fi}(\theta)& ={-}\int_{ \mathbb{R}^{d}}F_{\theta }(x)\left(\frac{d^{2}}{d\theta^{2}}\ln F_{\theta }(x)\right) dx+\frac{d^{2}}{d\theta^{2}}i(\theta)-i(\theta)\left(\frac{d }{d\theta }\ln i(\theta)\right)^{2}, \end{align}
(51)\begin{align} \mathcal{SI}_{F}^{Fi}(\theta)& ={-}\int_{ \mathbb{R}^{d}}\bar{F}_{\theta }(x)\left(\frac{d^{2}}{d\theta^{2}}\ln \bar{F}_{\theta }(x)\right) dx+\frac{d^{2}}{d\theta^{2}}\bar{i}(\theta)- \bar{i}(\theta)\left(\frac{d}{d\theta }\ln \bar{i}(\theta)\right)^{2}, \end{align}

with $i(\theta )=\int _{ \mathbb {R}^{d}}F_{\theta }(x)dx$ and $\bar {i}(\theta )=\int _{ \mathbb {R} ^{d}}\bar {F}_{\theta }(x)dx,$ for $\theta \in \Theta \subseteq \mathbb {R}$. Moreover, for a non-negative random variable $X$ with survival function $\bar {F}_{\theta }$, $\bar {i}(\theta )=E_{\theta }(X)$ and

(52)\begin{equation} \mathcal{SI}_{F}^{Fi}(\theta)={-}\int_{0}^{\infty }\bar{F}_{\theta }(x)\left(\frac{d^{2}}{d\theta^{2}}\ln \bar{F}_{\theta }(x)\right) dx+\frac{d^{2}}{d\theta^{2}}E_{\theta }(X)-E_{\theta }(X)\left(\frac{d}{d\theta }\ln E_{\theta }(X)\right)^{2}. \end{equation}

Proof. The proof is obtained by standard algebraic manipulations, similar to those in Kharazmi and Balakrishnan [Reference Kharazmi and Balakrishnan43] p. 6307.

A representation of Fisher's type cumulative and survival measures analogous to (15), in the case of a location parameter $\theta$, is formulated in the next proposition.
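
As a quick illustration of Proposition 4 (an example chosen here for its simplicity, not one taken from the cited works), consider the exponential survival function $\bar {F}_{\theta }(x)=e^{-\theta x}$, $x>0$, with rate $\theta >0$, so that $E_{\theta }(X)=1/\theta$, $({d}/{d\theta })\ln \bar {F}_{\theta }(x)=-x$ and $({d^{2}}/{d\theta ^{2}})\ln \bar {F}_{\theta }(x)=0$. The first-derivative form (49) and the second-derivative form (52) can then be compared directly:

\begin{align*} \text{by (49):}\quad \mathcal{SI}_{F}^{Fi}(\theta)& =\int_{0}^{\infty }x^{2}e^{-\theta x}dx-\frac{1}{\theta }\left({-}\frac{1}{\theta }\right)^{2}=\frac{2}{\theta^{3}}-\frac{1}{\theta^{3}}=\frac{1}{\theta^{3}},\\ \text{by (52):}\quad \mathcal{SI}_{F}^{Fi}(\theta)& =0+\frac{d^{2}}{d\theta^{2}}\left(\frac{1}{\theta }\right) -\frac{1}{\theta }\left({-}\frac{1}{\theta }\right)^{2}=\frac{2}{\theta^{3}}-\frac{1}{\theta^{3}}=\frac{1}{\theta^{3}}. \end{align*}

Both expressions give $1/\theta ^{3}>0$, in line with the non-negativity noted in Remark 1(d).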

Proposition 5. Let $X$ be a random variable with distribution function $F_{\theta }(x)$ and survival function $\bar {F}_{\theta }(x)=1-F_{\theta }(x)$, $x\in \mathbb {R}$, $\theta \in \Theta \subseteq \mathbb {R}$. Suppose, moreover, that the parameter $\theta$ in the considered models is a location parameter. Then, under the assumption that the integral and the derivative signs can be interchanged,

(53)\begin{align} \mathcal{CI}_{F}^{Fi}(X)& =\mathcal{CI}_{F}^{Fi}(F)=\int_{ \mathbb{R} }F(x)\left(\frac{d}{dx}\ln F(x)\right)^{2}dx-i^{{-}1}(F), \end{align}
(54)\begin{align} \mathcal{SI}_{F}^{Fi}(X)& =\mathcal{SI}_{F}^{Fi}(\bar{F})=\int_{ \mathbb{R} }\bar{F}(x)\left(\frac{d}{dx}\ln \bar{F}(x)\right)^{2}dx- \bar{i}^{{-}1}(\bar{F}), \end{align}

where $F$ is a distribution function such that $F_{\theta }(x)=F(x-\theta )$ and $\bar {F}_{\theta }(x)=\bar {F}(x-\theta )=1-F(x-\theta )$, $x\in \mathbb {R}$, $\theta \in \Theta \subseteq \mathbb {R}$, with $i(F)=\int _{ \mathbb {R} }F(x)dx$ and $\bar {i}(\bar {F})=\int _{ \mathbb {R} }\bar {F}(x)dx$.

Proof. Taking into account that $\theta$ is a loc