where the constant
cannot be replaced by
. In addition to being interesting and important in their own right, inequalities such as these have applications in additive combinatorics. We show that for
to be extremal for this inequality, we must have
Our central technique for deriving this result is local perturbation of
to increase the value of the autocorrelation, while leaving
unchanged. These perturbation methods can be extended to examine a more general notion of autocorrelation. Let
matrix with real entries and columns
$1\leq i\leq n$
be a constant. For a broad class of matrices
, we prove necessary conditions for
to extremise autocorrelation inequalities of the form
This paper provides a toolbox for the credibility analysis of frequency risks, with allowance for the seniority of claims and of risk exposure. We use Poisson models with dynamic and second-order stationary random effects that ensure nonnegative credibilities per period. We specify classes of autocovariance functions that are compatible with positive random effects and that entail nonnegative credibilities regardless of the risk exposure. Random effects with nonnegative generalized partial autocorrelations are shown to imply nonnegative credibilities. This holds for ARFIMA(0, d, 0) models. The AR(p) time series that ensure nonnegative credibilities are specified from their precision matrices. The compatibility of these semiparametric models with log-Gaussian random effects is verified. Gaussian sequences with ARFIMA(0, d, 0) specifications, which are then exponentiated entrywise, provide positive random effects that also imply nonnegative credibilities. Dynamic random effects applied to Poisson distributions are retained as products of two uncorrelated and positive components: the first is time-invariant, whereas the autocovariance function of the second vanishes at infinity and ensures nonnegative credibilities. The limit credibility is related to the three levels for the length of the memory in the random effects. The limit credibility is less than one in the short memory case, and a formula is provided.
We investigated the efficiency of the autoregressive repeatability model (AR) for genetic evaluation of longitudinal reproductive traits in Portuguese Holstein cattle and compared the results with those from the conventional repeatability model (REP). The data set comprised records taken during the first four calving orders, corresponding to a total of 416, 766, 872 and 766 thousand records for interval between calving to first service, days open, calving interval and daughter pregnancy rate, respectively. Both models included fixed (month and age classes associated with each calving order) and random (herd-year-season, animal and permanent environmental) effects. For the AR model, a first-order autoregressive (co)variance structure was fitted for the herd-year-season and permanent environmental effects. The AR model outperformed the REP model, with a lower Akaike Information Criterion, a lower Mean Square Error and Akaike Weights close to unity. Rank correlations between estimated breeding values (EBV) from the AR and REP models ranged from 0.95 to 0.97 for all studied reproductive traits when all bulls were considered. When considering only the top-100 selected bulls, the rank correlation ranged from 0.72 to 0.88. These results indicate that the re-ranking observed at the top level will provide more opportunities for selecting the best bulls. The EBV reliabilities provided by the AR model were larger for all traits, but the magnitudes of the annual genetic progress were similar between the two models. Overall, the proposed AR model was suitable for genetic evaluations of longitudinal reproductive traits in dairy cattle, outperforming the REP model.
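As an illustration of what a first-order autoregressive (co)variance structure looks like, the sketch below builds the textbook AR(1) matrix in which the covariance between records i and j is variance × rho^|i − j|. The function name `ar1_covariance`, the unit variance and the value rho = 0.5 are hypothetical choices for illustration, not values or parameterisations taken from the study.

```python
def ar1_covariance(n_records, variance, rho):
    """Return the n_records x n_records AR(1) covariance matrix.

    Entry (i, j) equals variance * rho ** |i - j|, so the covariance
    decays geometrically as records get further apart in time.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie in (-1, 1)")
    return [
        [variance * rho ** abs(i - j) for j in range(n_records)]
        for i in range(n_records)
    ]


# Four successive records (e.g. four calving orders), illustrative values only.
cov = ar1_covariance(4, variance=1.0, rho=0.5)
for row in cov:
    print(row)
```

Under this structure adjacent records are more strongly correlated than records several steps apart, in contrast to a structure where every pair of records shares the same covariance.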
While the spatial weights matrix
is at the core of spatial regression models, there is a scarcity of techniques for validating a given specification of
. I approach this problem from a measurement error perspective. When
is inflated by a constant, a predictable form of endogeneity occurs that is not problematic in other regression contexts. I use this insight to construct a theoretically appealing test and control for the validity of
that is tractable in panel data, which I call the K test. I demonstrate the utility of the test using Monte Carlo simulations.
Large frugivores provide critical seed dispersal services for many plant species and their extirpation from forested ecosystems can cause compositional shifts in regenerating plant cohorts. Yet we still understand poorly whether large seed-dispersers have complementary or redundant roles for forest regeneration. Here, to assess the functional complementarity of large-bodied frugivores in forest regeneration, we quantified the effects of varying abundance of hornbills, primates and the forest elephant on the density, species richness and the mean weighted seed length of animal-dispersed tree species among seedlings in five sites in a forest–savanna mosaic in D. R. Congo, while accounting for percentage forest cover and the local presence of fruiting trees. We found that the abundance of primates was positively associated with species richness of seedlings, while percentage forest cover was negatively associated ($R^2 = 0.19$). The abundance of hornbills, the presence of elephants and percentage forest cover were positively associated with mean seed length of the regenerating cohort ($R^2 = 0.13$). Spatially explicit analysis indicated that some additional processes have an important influence on these response indices. Primates would seem to have a preponderant role for maintaining relatively high species richness, while hornbills and elephants would seem to be predominantly responsible for the recruitment of large-seeded trees. Our results could indicate that these taxa of frugivores play complementary functional roles for forest regeneration. This suggests that the extirpation of one or more of these dispersers would likely not be functionally compensated for by the remaining taxa, hence possibly cascading into compositional shifts.
Texas German is a new world language variety that shows some evidence of koiné development but also presents with substantial variation at many levels of structure. I present a case study on the variant pronunciation of sibilants in Texas German consonant clusters. This feature is fairly frequent and found throughout the regions of German settlement in Central Texas. After a discussion of the presence of this feature in the donor dialects, I investigate the factors that correlate with variation in the modern language. From an analysis of local and global spatial autocorrelation, I argue that variation is not significantly associated with particular geographic regions and is compatible with stable and homogenous variation. This provides insight into our understanding of new dialect emergence and the mechanisms by which dialect features are leveled over multiple generations.
Review of correlation and simple linear regression. Introduction to lagged (cross-) correlation for identifying recurrent and periodic features in common between pairs of time-series, statistical evidence of possible causal relationships. Introduction to (lagged) autocorrelation for identifying recurrent and periodic features in time-series. Use of correlation and simple linear regression for statistical comparison of time-series to reference datasets, with focus on periodic (sinusoidal) reference datasets. Interpretation of statistical effect-size and significance (p-value).
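A minimal sketch of lagged autocorrelation for detecting a periodic feature, assuming the standard sample estimator (lagged covariance about the series mean, normalised by the lag-0 sum of squares). The function name `autocorrelation` and the sinusoidal test series with a 20-sample period are hypothetical choices for illustration.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of sequence x at a non-negative integer lag."""
    n = len(x)
    if not 0 <= lag < n:
        raise ValueError("lag must satisfy 0 <= lag < len(x)")
    mean = sum(x) / n
    # Sum of squared deviations (lag-0 term) used for normalisation.
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("autocorrelation is undefined for a constant series")
    num = sum((x[t] - mean) * (x[t + lag] - mean) for t in range(n - lag))
    return num / denom


# Sinusoid with a period of 20 samples, 10 full cycles.
series = [math.sin(2 * math.pi * t / 20) for t in range(200)]

r_full = autocorrelation(series, 20)  # one full period: strongly positive
r_half = autocorrelation(series, 10)  # half a period: strongly negative
print(r_full, r_half)
```

Peaks in the autocorrelation at non-zero lags point to the period of a recurrent feature; here the positive peak sits at the 20-sample period and the trough at half of it.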
Overview of key identifying features of noise as can typically occur in geoscience time-series. Categorisation according to noise colour; white, red and blue noise. Consideration of autocorrelation and autoregression, power spectral density and power-law. Worked red-noise example to illustrate.
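To make the link between autoregression and noise colour concrete, the following sketch generates white noise and red noise from the same first-order autoregressive recursion x[t] = phi · x[t−1] + e[t], then compares their lag-1 autocorrelations. The helper names, the coefficient phi = 0.8, the series length and the seed are all hypothetical; this is one common way to simulate red noise, not necessarily the worked example referred to above.

```python
import random


def ar1_series(n, phi, seed=0):
    """Generate n samples of x[t] = phi * x[t-1] + e[t] with e[t] ~ N(0, 1)."""
    if not -1.0 < phi < 1.0:
        raise ValueError("phi must lie in (-1, 1) for a stationary series")
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0)]
    for _ in range(n - 1):
        x.append(phi * x[-1] + rng.gauss(0.0, 1.0))
    return x


def lag1_autocorrelation(x):
    """Sample autocorrelation of x at lag 1."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    return num / denom


white = ar1_series(5000, phi=0.0)  # no memory: white noise
red = ar1_series(5000, phi=0.8)    # positive memory: red noise

r_white = lag1_autocorrelation(white)
r_red = lag1_autocorrelation(red)
print(r_white, r_red)
```

For a stationary AR(1) process the theoretical autocorrelation at lag k is phi^k, so the estimate for the red series should sit near 0.8 while the white series stays near zero.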
Hemorrhagic fever with renal syndrome (HFRS) caused by hantaviruses is a serious public health problem in China, accounting for 90% of HFRS cases reported globally. In this study, we applied a geographical information system (GIS), spatial autocorrelation analyses and a seasonal autoregressive-integrated moving average (SARIMA) model to describe and predict the HFRS epidemic, with the objective of monitoring and forecasting HFRS in mainland China. Chinese HFRS data from 2004 to 2016 were obtained from the National Infectious Diseases Reporting System (NIDRS) database and the Chinese Centre for Disease Control and Prevention (CDC). GIS maps were produced to detect the spatial distribution of HFRS cases. Moran's $I$ was used in the global spatial autocorrelation analysis to identify the overall spatiotemporal pattern of HFRS outbreaks, while the local Moran's $I_i$ was used to identify ‘hotspot’ regions of HFRS at province level. The best-fitting SARIMA model, selected by the Akaike information criterion and the Ljung–Box test, was developed to forecast HFRS incidence in 2016. During 2004–2015, a total of 165 710 HFRS cases were reported, with the average annual incidence at province level ranging from 0 to 13.05 per 100 000 persons. Global Moran's $I$ analysis showed that the HFRS outbreaks presented a spatially clustered distribution, with the degree of clustering gradually decreasing from 2004 to 2009; the distribution then became random, and the degree of clustering reached its lowest point in 2012. Local Moran's $I_i$ identified four provinces in northeast China that formed a ‘high–high’ cluster as a traditional epidemic centre, and Shaanxi has been another HFRS ‘hotspot’ region since 2011. The monthly incidence of HFRS decreased sharply from 2004 to 2009 in mainland China, then increased markedly from 2010 to 2012, and decreased again from 2013, with obvious seasonal fluctuations. The SARIMA $(0,1,3)\times(1,0,1)_{12}$ model was the best-fitting forecasting model for the HFRS dataset in mainland China. The spatiotemporal distribution of HFRS in mainland China varied in recent years; together with the SARIMA forecasting model, this study provided several potential decision-support tools for the control and risk-management plan of HFRS in China.
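For readers unfamiliar with the global statistic used above, here is a minimal sketch of Moran's I, assuming the usual definition I = (n / W) · Σᵢⱼ wᵢⱼ (xᵢ − x̄)(xⱼ − x̄) / Σᵢ (xᵢ − x̄)², where W is the sum of all weights. The function name `morans_i`, the four-region chain of neighbours and the toy values are hypothetical and unrelated to the HFRS data.

```python
def morans_i(values, weights):
    """Global Moran's I for values observed on regions with a weights matrix.

    weights[i][j] is the spatial weight between regions i and j
    (here 1 for neighbours, 0 otherwise, with a zero diagonal).
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    w_total = sum(sum(row) for row in weights)
    denom = sum(d * d for d in dev)
    if w_total == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / w_total) * (num / denom)


# Four regions in a row; each region neighbours the ones next to it.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

i_trend = morans_i([1, 2, 3, 4], chain)  # similar neighbours: positive I
i_alt = morans_i([1, 4, 1, 4], chain)    # dissimilar neighbours: negative I
print(i_trend, i_alt)
```

Positive values indicate that neighbouring regions tend to have similar values (clustering), while negative values indicate that neighbours tend to differ.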
We need reliable data on the spatial distribution of parasites in order to achieve an inventory of global parasite biodiversity and establish robust conservation initiatives based on regional disease risk. This requires an integrated and spatially consistent effort toward the discovery of new parasite species. Using a large and representative dataset on the geographical coordinates where 4943 helminth species were first discovered, we first test whether the geographical distribution of parasite species reports is spatially congruent across helminth higher taxa; i.e. whether areas, where many trematodes are found, are also areas where many nematodes or cestodes have been discovered. Second, we test whether the global geographical distribution of new helminth species reports has changed significantly over time, i.e. across the last few decades. After accounting for spatial autocorrelation in the data, we find no strong statistical support for either of the patterns we investigated. Overall, our results indicate that helminth species discoveries are both spatially incongruent among higher taxa of helminths, and inconsistent over time. These findings suggest that the global parasite discovery effort is inefficient, spatially biased and subject to idiosyncrasies. Coordinated biodiscovery programmes, involving research teams with expertise in multiple taxonomic groups, seem the best approach to remedy these issues.
The goal of this study was to analyse the spatial pattern of tuberculosis (TB) mortality using different approaches, namely mortality rates (MR), spatial relative risks (RR) and Bayesian rates (Global and Local), and their association with the human development index (HDI), both Global and in its three dimensions: education, longevity and income. An ecological study was conducted in Curitiba, Brazil, based on data from the Mortality Information System (2008–2014). Spatial scan statistics were used to compute RR and identify high-risk clusters. The Bivariate Local Indicator of Spatial Association was used to assess associations. MR ranged between 0 and 25.24/100,000 with a mean (standard deviation) of 1.07 (2.66). Corresponding values for spatial RR were 0–27.46, 1.2 (2.99) and for Bayesian rates (Global and Local) were 0.49–1.66, 0.90 (0.19) and 0–6.59, 0.98 (0.80). High-risk clusters were identified for all variables, except for HDI-income and Global Bayesian rate. Significant negative spatial relations were found between MR and income; between RR and HDI global, longevity and income; and between Bayesian rates and all variables. Some areas presented different patterns: low social development/low risk and high risk/high development. These results demonstrate that social development variables should be considered in mortality due to TB.
Instruments based on realizations of the endogenous variable in other units—for instance, regional or global weighted averages—are commonly used in political science. Such spatial instruments have proved attractive: they are convenient to obtain, typically have power, and are plausibly exogenous. We argue that the assumptions underlying spatial instruments remain poorly understood and challenge whether spatial instruments can satisfy the conditions required for valid instruments. First, when cross-unit dependence exists in the endogenous predictor, other cross-unit relationships—spillovers and interdependence—likely exist as well and risk violations of the exclusion restriction. Second, spatial instruments produce simultaneity in the first-stage equation, as left-hand side outcomes are included as right-hand side predictors. Because the instrument and the endogenous variable are simultaneously determined, the exclusion restriction is, necessarily and by construction, violated. Taken together, these concerns lead us to conclude that spatial instruments are rarely, if ever, valid.
Introduction: Understanding the spatial distribution of opioid abuse at the local level may facilitate community intervention strategies. The purpose of this analysis was to apply spatial analytical methods to determine clustering of opioid-related emergency medical services (EMS) responses in the City of Calgary. Methods: Using opioid-related EMS responses in the City of Calgary from January 1 to October 31, 2017, we estimated the dissemination area (DA)-specific spatial random effects by incorporating the spatial autocorrelation using an intrinsic Gaussian conditional autoregressive model and generalized linear mixed models (GLMM). Global spatial autocorrelation was evaluated by Moran's I index. Both Getis-Ord Gi and the LISA function in GeoDa were used to estimate the local spatial autocorrelation. Two models were applied: 1) Poisson regression with DA-specific non-spatial random effects; 2) Poisson regression with DA-specific G-side spatial random effects. A pseudolikelihood approach was used for model comparison. Two types of cluster analysis were used to identify the spatial clustering. Results: There were 1488 opioid-related EMS responses available for analysis. Of the responses, 74% of the individuals were males. The median age was 33 years (IQR: 26-42 years), with 65% of individuals between 20 and 39 years, and 27% between 40 and 64 years. In 62% of EMS responses, poisoning/overdose was the chief complaint. The global Moran's I index implied the presence of global spatial autocorrelation. Comparing the two models suggested that the spatial model provided a better fit for the adjusted opioid-related EMS response rate. Calgary Center and East were identified as hot spots by both types of cluster analysis. Conclusion: Spatial modeling offers better predictive ability for assessing potential high-risk areas and identifying locations for community intervention strategies. The clusters identified in Calgary's Center and East may have implications for future response strategies.
In this paper, we assess whether or not organic agriculture has a positive impact on local economies. We first identify organic agriculture hotspots (clusters of counties with positively correlated high numbers of organic operations) using spatial statistics. Then, we estimate a treatment effects model that classifies a county's membership in an organic hotspot as an endogenous treatment variable. By modeling what a hotspot county's economic indicators would have been had the county not been part of a hotspot, this model captures the effect of being in a hotspot on a county's economic indicators. We perform the same analysis for general agricultural farm hotspots to confirm that the benefits associated with organic production hotspots are, in fact, due to the organic component. Our results show that organic hotspot membership leads to a lower county-level poverty rate and a higher median household income. A similar result is not found when investigating the impact of general agriculture hotspots. On the other hand, our result is robust to alternative hotspot definitions based on type of organic operations and to alternative methods of estimating average treatment effects on the treated. These results provide strong motivation for considering hotspots of organic handling operations, which refers to middlemen such as processors, wholesalers and brokers, and hotspots of organic production to be local economic development tools, and may be of interest to policymakers whose objective is to promote rural development. Our results may incentivize policymakers to specifically focus on organic development, rather than the more general development of agriculture, as a means to promote economic growth in rural areas, and may further point them in the direction of not only encouraging the presence of organic operations, but of fostering the development of clusters or hotspots of these operations.
A classical approach in precision agriculture consists in validating within-field zones, defined from high spatial resolution observations, by agronomic information (AI). Zone validation generally involves a two-step process. First, AI are obtained on a regular grid or following a target sampling strategy inside the field. Then, a statistical test, most often an ANOVA, is used to determine if the management zones created with the high spatial resolution auxiliary data explain differences in the AI values. Unfortunately, in precision agriculture, many of the works using such an approach omit a necessary condition for the implementation of the aforementioned ANOVA test, i.e. the observations need to be independent from each other. This condition is seldom satisfied since AI are often spatially auto-correlated. In order to highlight this problem, simulated datasets with different and known AI spatial autocorrelation were used. Results show that as AI become more and more spatially auto-correlated, ANOVA tests almost always conclude that the management zones obtained with auxiliary data are significant whatever the zoning, i.e. even a completely random one. To overcome this problem, the paper introduces two methods directly inspired by published works in the field of ecology. Two cases were considered: the first one applies when a large AI dataset (n>40) is available and the other one applies for a small AI dataset (n<40). Both methods are implemented on a real precision viticulture example.
Advances in agricultural machinery, information and sensor technology have led to an increasing amount of data that is available spatially both pre-season and within season. The case is compelling for the spatialisation of existing, non-spatial (field-scale) crop models that can accommodate this ‘big data’ and lead to more precise predictions of yield and quality and improved field management. This study explores conceptual spatial models based on potato crop models that simulate crop physical and physiological processes and predict yields and graded yields at a field scale. By exploring the possible spatial scales and model application approaches that consider spatial variation, an optimal and more effective solution is expected. Issues concerning model quality and uncertainty are also discussed.
Essentially all biological processes are highly dependent on the nanoscale architecture of the cellular components where these processes take place. Statistical measures, such as the autocorrelation function (ACF) of the three-dimensional (3D) mass–density distribution, are widely used to characterize cellular nanostructure. However, conventional methods of reconstruction of the deterministic 3D mass–density distribution, from which these statistical measures can be calculated, have been inadequate for thick biological structures, such as whole cells, due to the conflict between the need for nanoscale resolution and its inverse relationship with thickness after conventional tomographic reconstruction. To tackle the problem, we have developed a robust method to calculate the ACF of the 3D mass–density distribution without tomography. Assuming the biological mass distribution is isotropic, our method allows for accurate statistical characterization of the 3D mass–density distribution by ACF with two data sets: a single projection image by scanning transmission electron microscopy and a thickness map by atomic force microscopy. Here we present validation of the ACF reconstruction algorithm, as well as its application to calculate the statistics of the 3D distribution of mass–density in a region containing the nucleus of an entire mammalian cell. This method may provide important insights into architectural changes that accompany cellular processes.
Temporal shaping of time series is the activity of deriving a time series model with a prescribed marginal distribution and some sample path characteristics. Starting with an empirical sample path, one often computes from it an empirical histogram (a step-function density) and empirical autocorrelation function. The corresponding cumulative distribution function is piecewise linear, and so is the inverse distribution function. The so-called inversion method uses the latter to generate the corresponding distribution from a uniform random variable on [0,1), histograms being a special case. This paper shows how to manipulate the inverse histogram and an underlying marginally uniform process, so as to “shape” the model sample paths in an attempt to match the qualitative nature of the empirical sample paths, while maintaining a guaranteed match of the empirical marginal distribution. It proposes a new approach to temporal shaping of time series and identifies a number of operations on a piecewise-linear inverse histogram function, which leave the marginal distribution invariant. For cyclical processes with a prescribed marginal distribution and a prescribed cycle profile, one can also use these transformations to generate sample paths which “conform” to the profile. This approach also improves the ability to approximate the empirical autocorrelation function.
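The inversion method described above can be sketched as follows: build the piecewise-linear cumulative distribution function from a histogram, then map a uniform variate on [0, 1) through its inverse. The function name `make_inverse_cdf` and the two-bin toy histogram are hypothetical, and the sketch covers only the basic inversion step, not the shaping operations the paper proposes.

```python
from bisect import bisect_right
import random


def make_inverse_cdf(edges, counts):
    """Return the piecewise-linear inverse CDF of a histogram.

    edges  -- increasing bin edges, length len(counts) + 1
    counts -- non-negative bin counts (at least one must be positive)
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("edges must have exactly one more entry than counts")
    total = sum(counts)
    if total <= 0:
        raise ValueError("histogram must contain at least one observation")

    # CDF evaluated at each bin edge; linear in between (step-function density).
    cdf = [0.0]
    running = 0
    for c in counts:
        running += c
        cdf.append(running / total)
    cdf[-1] = 1.0  # guard against floating-point round-off

    def inverse(u):
        if not 0.0 <= u < 1.0:
            raise ValueError("u must lie in [0, 1)")
        # Locate the bin whose CDF interval contains u (empty bins are skipped).
        i = bisect_right(cdf, u) - 1
        frac = (u - cdf[i]) / (cdf[i + 1] - cdf[i])
        return edges[i] + frac * (edges[i + 1] - edges[i])

    return inverse


# Toy histogram: half the mass on [0, 1), half on [1, 3).
inverse = make_inverse_cdf([0.0, 1.0, 3.0], [1, 1])

rng = random.Random(0)
samples = [inverse(rng.random()) for _ in range(1000)]
print(inverse(0.25), inverse(0.75))
```

Because the marginal distribution depends only on the uniform variates being uniform, any transformation of the underlying process that preserves uniform marginals leaves the histogram match intact, which is the property the shaping operations rely on.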
We construct random fields with Pólya-type autocorrelation function and dampened Pólya cross-correlation function. The marginal distribution of the random fields may be taken as any infinitely divisible distribution with finite variance, and the random fields are fully characterized in terms of their joint characteristic function. This makes available a new class of non-Gaussian random fields with flexible correlation structure for use in modeling and estimation.
Lifecourse trajectories of clinical or anthropological attributes are useful for identifying how our early-life experiences influence later-life morbidity and mortality. Researchers often use growth mixture models (GMMs) to estimate such phenomena. It is common to place constraints on the random part of the GMM to improve parsimony or to aid convergence, but this can lead to an autoregressive structure that distorts the nature of the mixtures and subsequent model interpretation. This is especially true if changes in the outcome within individuals are gradual compared with the magnitude of differences between individuals. This is not widely appreciated, nor is its impact well understood. Using repeat measures of body mass index (BMI) for 1528 US adolescents, we estimated GMMs that required variance–covariance constraints to attain convergence. We contrasted constrained models with and without an autocorrelation structure to assess the impact this had on the ideal number of latent classes, their size and composition. We also contrasted model options using simulations. When the GMM variance–covariance structure was constrained, a within-class autocorrelation structure emerged. When not modelled explicitly, this led to poorer model fit and models that differed substantially in the ideal number of latent classes, as well as class size and composition. Failure to carefully consider the random structure of data within a GMM framework may lead to erroneous model inferences, especially for outcomes with greater within-person than between-person homogeneity, such as BMI. It is crucial to reflect on the underlying data generation processes when building such models.