Book contents
- Frontmatter
- Contents
- List of contributors
- Preface
- Frequently used notations and symbols
- 1 Algebraic and geometric methods in statistics
- Part I Contingency tables
- Part II Designed experiments
- Part III Information geometry
- 14 Introduction to non-parametric estimation
- 15 The Banach manifold of quantum states
- 16 On quantum information manifolds
- 17 Axiomatic geometries for text documents
- 18 Exponential manifold by reproducing kernel Hilbert spaces
- 19 Geometry of extended exponential models
- 20 Quantum statistics and measures of quantum information
- Part IV Information geometry and algebraic statistics
- Part V On-line supplements
17 - Axiomatic geometries for text documents
from Part III - Information geometry
Published online by Cambridge University Press: 27 May 2010
Abstract
High-dimensional structured data such as text and images is often poorly understood and misrepresented in statistical modelling. Typical approaches to modelling such data involve, either explicitly or implicitly, arbitrary geometric assumptions. In this chapter, we consider statistical modelling of non-Euclidean data whose geometry is obtained by embedding the data in a statistical manifold. The resulting models perform better than their Euclidean counterparts on real-world data and draw an interesting connection between Čencov and Campbell's axiomatic characterisation of the Fisher information and the recently proposed diffusion kernels and square root embedding.
Introduction
Geometry is ubiquitous in statistical modelling. During the last half century a geometrical theory of statistical inference has been constructed by Rao, Efron, Amari, and others. This theory, commonly referred to as information geometry, describes many aspects of statistical modelling through Riemannian geometric notions such as distance, curvature and connections (Amari and Nagaoka 2000). Information geometry has mostly been concerned with geometric interpretations of asymptotic inference. Focusing on the geometry of parametric statistical families ρ = {ρ_θ : θ ∈ Θ}, it has had relatively little influence on the geometrical analysis of data. In particular, it has largely ignored the role of the geometry of the data space X in statistical inference and algorithmic data analysis.
On the other hand, the recent growth in computing resources and data availability has led to widespread analysis and modelling of structured data such as text and images. Such data does not naturally lie in ℝ^n, and the Euclidean distance and its corresponding geometry do not describe it well.
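To illustrate how a non-Euclidean geometry can be attached to text, the following minimal sketch treats each document as a normalised term-frequency vector on the probability simplex and compares the Euclidean distance with the geodesic distance induced by the Fisher information metric. It relies on the standard closed form for the multinomial simplex, d(p, q) = 2 arccos(Σ_i √(p_i q_i)), which follows from the square root map p ↦ 2√p onto a sphere of radius 2. The function names and the toy counts are hypothetical and are not taken from the chapter.

```python
import math


def to_simplex(counts):
    """Normalise non-negative term counts to a point on the probability simplex."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must have a positive sum")
    return [c / total for c in counts]


def fisher_distance(p, q):
    """Geodesic distance on the simplex under the Fisher information metric.

    The square root map p -> 2*sqrt(p) sends the simplex to the positive
    orthant of a sphere of radius 2, where geodesics are great-circle arcs.
    """
    # Sum of sqrt(p_i * q_i), i.e. the cosine of half the arc length.
    affinity = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to [0, 1] to guard against floating-point round-off.
    affinity = min(1.0, max(0.0, affinity))
    return 2.0 * math.acos(affinity)


def euclidean_distance(p, q):
    """Ordinary Euclidean distance, shown only for comparison."""
    return math.sqrt(sum((pi - qi) ** 2 for pi, qi in zip(p, q)))


if __name__ == "__main__":
    # Hypothetical term counts over a four-word vocabulary.
    doc_a = to_simplex([8, 1, 1, 0])
    doc_b = to_simplex([1, 1, 8, 0])
    print("Fisher geodesic:", fisher_distance(doc_a, doc_b))
    print("Euclidean:      ", euclidean_distance(doc_a, doc_b))
```

Under this distance, two documents with no terms in common are always exactly π apart, whereas their Euclidean distance depends on how concentrated each term distribution is.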
- Type: Chapter
- Information: Algebraic and Geometric Methods in Statistics, pp. 277-290. Publisher: Cambridge University Press. Print publication year: 2009