Approximate Bayesian inference for a spatial point process model exhibiting regularity and random aggregation

Scandinavian Journal of Statistics

Published On 2022/3

In this article, we propose a doubly stochastic spatial point process model with both aggregation and repulsion. This model combines the ideas behind Strauss processes and log Gaussian Cox processes. The likelihood for this model is not expressible in closed form but it is easy to simulate realizations under the model. We therefore explain how to use approximate Bayesian computation (ABC) to carry out statistical inference for this model. We suggest a method for model validation based on posterior predictions and global envelopes. We illustrate the ABC procedure and model validation approach using both simulated point patterns and a real data example.
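The abstract's central idea is to replace an intractable likelihood with repeated simulation. The basic ABC rejection algorithm shows this idea in its simplest form. The sketch below is only a toy version under stated assumptions:

- The simulator is a homogeneous Poisson process on the unit square, not the paper's Strauss/log Gaussian Cox model.
- The prior on the intensity is uniform.
- The point count is the single summary statistic.

All of these choices, and the function names, are illustrative assumptions.

```python
import numpy as np

def simulate_poisson_pattern(intensity, rng):
    """Toy simulator (assumption): homogeneous Poisson point pattern on the unit square."""
    n = rng.poisson(intensity)  # point count ~ Poisson(intensity * area), with area = 1
    return rng.uniform(0.0, 1.0, size=(n, 2))

def summary(points):
    """Hypothetical summary statistic: the number of points."""
    return len(points)

def abc_rejection(observed, n_draws, tolerance, rng, prior_low=20.0, prior_high=300.0):
    """ABC rejection: keep prior draws whose simulated summary lies within
    `tolerance` of the observed summary."""
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = rng.uniform(prior_low, prior_high)             # draw from a uniform prior
        s_sim = summary(simulate_poisson_pattern(theta, rng))  # simulate under theta
        if abs(s_sim - s_obs) <= tolerance:                    # compare summaries
            accepted.append(theta)
    return np.array(accepted)

rng = np.random.default_rng(1)
observed = simulate_poisson_pattern(100.0, rng)  # stands in for a real point pattern
posterior = abc_rejection(observed, n_draws=20_000, tolerance=5, rng=rng)
print(len(observed), len(posterior), posterior.mean())
```

For a model with both aggregation and repulsion, the summaries would need to be sensitive to both features. The toy simulator would then be replaced by a sampler for the actual model.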

Volume

49

Issue

1

Page

185-210

Authors

Alan E. Gelfand

Duke University

Position

Professor of Statistical Science

H-Index(all)

86

H-Index(since 2020)

47

I-10 Index(all)

0

I-10 Index(since 2020)

0

Citation(all)

0

Citation(since 2020)

0

Cited By

0

Research Interests

spatial statistics

environmental processes

hierarchical models

Jesper Møller

Aalborg Universitet

Position

Professor in Statistics

H-Index(all)

46

H-Index(since 2020)

23

I-10 Index(all)

0

I-10 Index(since 2020)

0

Citation(all)

0

Citation(since 2020)

0

Cited By

0

Research Interests

Mathematical Statistics

Probability Theory

Other articles from the authors

Alan E. Gelfand

Duke University

TEST

Bayesian joint quantile autoregression

Quantile regression continues to increase in usage, providing a useful alternative to customary mean regression. Primary implementation takes the form of so-called multiple quantile regression, creating a separate regression for each quantile of interest. However, recently, advances have been made in joint quantile regression, supplying a quantile function which avoids crossing of the regression across quantiles. Here, we turn to quantile autoregression (QAR), offering a fully Bayesian version. We extend the initial quantile regression work of Koenker and Xiao (J Am Stat Assoc 101(475):980–990, 2006. https://doi.org/10.1198/016214506000000672) in the spirit of Tokdar and Kadane (Bayesian Anal 7(1):51–72, 2012. https://doi.org/10.1214/12-BA702). We offer a directly interpretable parametric model specification for QAR. Further, we offer a pth-order QAR(p) version, a multivariate QAR(1) version, and a spatial …
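Classical quantile regression estimates each conditional quantile by minimizing the check (pinball) loss $\rho_\tau(u) = u(\tau - 1\{u < 0\})$. This includes the Koenker and Xiao QAR work cited above. The paper's Bayesian joint specification is different and is not reproduced here.

The numpy sketch below only shows the underlying fact: minimizing the average check loss over a grid recovers the empirical $\tau$-quantile. The function names are hypothetical.

```python
import numpy as np

def check_loss(u, tau):
    """Check (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    u = np.asarray(u, dtype=float)
    return u * (tau - (u < 0))

def quantile_by_check_loss(y, tau, grid):
    """Grid value minimising the average check loss of y - q; approximates the tau-quantile of y."""
    losses = [check_loss(y - q, tau).mean() for q in grid]
    return grid[int(np.argmin(losses))]

rng = np.random.default_rng(0)
y = rng.normal(size=2000)
grid = np.linspace(-3.0, 3.0, 601)
q90 = quantile_by_check_loss(y, 0.9, grid)
print(q90, np.quantile(y, 0.9))
```

In a QAR(1) model, the conditional quantile is linear in $y_{t-1}$. The same loss would then be minimized over intercept and slope separately for each $\tau$. Fitting each $\tau$ separately is what allows fitted quantiles to cross; joint approaches are designed to avoid that.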

Alan E. Gelfand

Duke University

arXiv preprint arXiv:2403.00080

Spatio-temporal modeling for record-breaking temperature events in Spain

Record-breaking temperature events are now very frequently in the news, viewed as evidence of climate change. With this as motivation, we undertake the first substantial spatial modeling investigation of temperature record-breaking across years for any given day within the year. We work with a dataset consisting of over sixty years (1960-2021) of daily maximum temperatures across peninsular Spain. Formal statistical analysis of record-breaking events is an area that has received attention primarily within the probability community, dominated by results for the stationary record-breaking setting with some additional work addressing trends. Such effort is inadequate for analyzing actual record-breaking data. Effective analysis requires rich modeling of the indicator events which define record-breaking sequences. Resulting from novel and detailed exploratory data analysis, we propose hierarchical conditional models for the indicator events. After suitable model selection, we discover explicit trend behavior, necessary autoregression, significance of distance to the coast, useful interactions, helpful spatial random effects, and very strong daily random effects. Illustratively, the model estimates that global warming trends have increased the number of records expected in the past decade almost two-fold, 1.93 (1.89,1.98), but also estimates highly differentiated climate warming rates in space and by season.
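The indicator events that define a record-breaking sequence are simple to compute. The value for year $t$ is a record if it exceeds every earlier value for the same calendar day.

Under the stationary IID setting with continuous values, $P(I_t = 1) = 1/t$. The expected number of records in $n$ years is therefore the harmonic number $H_n$, a natural baseline against which trend effects can be compared. A minimal sketch with hypothetical data and function names:

```python
def record_indicators(series):
    """I_t = 1 if the t-th value exceeds all previous values (the first value is always a record)."""
    indicators = []
    current_max = float("-inf")
    for x in series:
        is_record = x > current_max
        indicators.append(int(is_record))
        if is_record:
            current_max = x
    return indicators

def expected_records_iid(n):
    """Stationary IID baseline: P(I_t = 1) = 1/t, so E[number of records] = H_n."""
    return sum(1.0 / t for t in range(1, n + 1))

temps = [31.2, 30.8, 32.5, 32.1, 33.0, 29.9]  # hypothetical yearly maxima for one calendar day
print(record_indicators(temps))              # [1, 0, 1, 0, 1, 0]
print(expected_records_iid(62))              # IID baseline for 62 years (1960-2021)
```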

Alan E. Gelfand

Duke University

Methods in Ecology and Evolution

Generative spatial generalized dissimilarity mixed modelling (spGDMM): An enhanced approach to modelling beta diversity

Turnover, or change in the composition of species over space and time, is one of the primary ways to define beta diversity. Inferring what factors impact beta diversity is not only important for understanding biodiversity processes but also for conservation planning. At present, a popular approach to understanding the drivers of compositional turnover is through generalized dissimilarity modelling (GDM). We argue that the current GDM approach suffers several limitations and provide an alternative modelling approach that remedies these issues. We propose using generative spatial random effects models implemented in a Bayesian framework. We offer hierarchical specifications to yield full regression and spatial predictive inference, both with associated full uncertainties. The approach is illustrated by examining dissimilarity in three datasets: tree survey data from Panama's Barro Colorado Island (BCI), plant …

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2404.09525

Coupling results and Markovian structures for number representations of continuous random variables

A general setting for nested subdivisions of a bounded real set into intervals defining the digits of a random variable $X$ with a probability density function $f$ is considered. Under the weak condition that $f$ is almost everywhere lower semi-continuous, a coupling between $X$ and a non-negative integer-valued random variable $N$ is established. Under this coupling, the first $N$ digits can be interpreted as the "sufficient digits", since the distribution of the remaining digits conditioned on the first $N$ digits does not depend on $f$. Adding a condition about a Markovian structure of the lengths of the intervals in the nested subdivisions, the remaining digits (given the sufficient digits) become a Markov chain of a certain order $s$. If $s = 0$, the remaining digits are IID with a known distribution. When $s > 0$ and the Markov chain is uniformly geometrically ergodic, a coupling is established with a random time so that the chain after that time is stationary and follows a simple known distribution. The results are related to several examples of number representations generated by a dynamical system, including base-$q$ expansions, generalized Lüroth series, $\beta$-expansions, and continued fraction representations. The importance of the results and some suggestions and open problems for future research are discussed.

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2404.08387

The asymptotic distribution of the scaled remainder for pseudo golden ratio expansions of a continuous random variable

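Let $X = \sum_{k=1}^{\infty} X_k \beta^{-k}$ be the base-$\beta$ expansion of a continuous random variable $X$ on the unit interval, where $\beta$ is the positive solution to $\beta^m = \beta^{m-1} + \cdots + \beta + 1$ for an integer $m \ge 2$ (i.e., $\beta$ is a generalization of the golden mean, for which $m = 2$). We study the asymptotic distribution and convergence rate of the scaled remainder $\sum_{k=1}^{\infty} X_{n+k} \beta^{-k}$ when $n$ tends to infinity.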
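The generalized golden mean in this abstract can be computed numerically. The sketch below assumes the defining equation is $\beta^m = \beta^{m-1} + \cdots + \beta + 1$ with integer $m \ge 2$, where $m = 2$ gives the golden ratio. The function name is hypothetical.

The sketch uses plain bisection on $[1, 2]$, where the polynomial $\beta^m - (\beta^{m-1} + \cdots + 1)$ changes sign:

```python
def pseudo_golden_ratio(m, tol=1e-12):
    """Positive root of beta^m = beta^(m-1) + ... + beta + 1 (assumed definition, m >= 2), by bisection."""
    def f(b):
        return b**m - sum(b**j for j in range(m))
    lo, hi = 1.0, 2.0  # f(1) = 1 - m < 0 and f(2) = 2^m - (2^m - 1) = 1 > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

for m in (2, 3, 4):
    print(m, pseudo_golden_ratio(m))
```

The roots increase towards 2 as $m$ grows. This is why all such $\beta$ lie strictly between the golden ratio and 2.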

Jesper Møller

Aalborg Universitet

Methodology and Computing in Applied Probability

How many digits are needed?

Let $X_1, X_2, \ldots$ be the digits in the base-$q$ expansion of a random variable $X$ defined on $[0, 1)$, where $q \ge 2$ is an integer. For $n = 1, 2, \ldots$, we study the probability distribution of the (scaled) remainder $T^n(X) = \sum_{k=n+1}^{\infty} X_k q^{n-k}$. If $X$ has an absolutely continuous CDF, then the distribution of $T^n(X)$ converges in the total variation metric to the Lebesgue measure $\mu$ on the unit interval. Under weak smoothness conditions, we establish two results. First, there is a coupling between $X$ and a non-negative integer-valued random variable $N$ such that $T^N(X)$ follows $\mu$ and is independent of $(X_1, \ldots, X_N)$. Second, $T^n(X)$ and its PDF converge exponentially fast. We discuss how many digits are needed and show examples of our results.
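For a number $x \in [0, 1)$, the $k$-th base-$q$ digit is $\lfloor q^k x \rfloor \bmod q$. The scaled remainder after $n$ digits is the fractional part of $q^n x$, that is, the tail digits read as a new number in $[0, 1)$.

A minimal sketch using exact rational arithmetic (`fractions.Fraction`), so that no floating-point error enters. The helper names are hypothetical.

```python
import math
from fractions import Fraction

def base_q_digits(x, q, n):
    """First n base-q digits of x in [0, 1): the k-th digit is floor(q^k x) mod q."""
    return [math.floor(x * q**k) % q for k in range(1, n + 1)]

def scaled_remainder(x, q, n):
    """Scaled remainder after n digits: q^n x - floor(q^n x), a number in [0, 1)."""
    y = x * q**n
    return y - math.floor(y)

x = Fraction(3, 8)
print(base_q_digits(x, 2, 4))                          # [0, 1, 1, 0]
print([scaled_remainder(x, 2, n) for n in range(4)])
```

The identity $x = \sum_{k \le n} d_k q^{-k} + q^{-n} \cdot (\text{scaled remainder})$ holds exactly. This identity is what lets the remainder be studied separately from the leading digits.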

Alan E. Gelfand

Duke University

arXiv preprint arXiv:2404.12583

Analyzing whale calling through Hawkes process modeling

Sound is assumed to be the primary modality of communication among marine mammal species. Analyzing acoustic recordings helps to understand the function of the acoustic signals as well as the possible impact of anthropogenic noise on acoustic behavior. Motivated by a dataset from a network of hydrophones in Cape Cod Bay, Massachusetts, utilizing automatically detected calls in recordings, we study the communication process of the endangered North Atlantic right whale. For right whales an "up-call" is known as a contact call, and ensuing counter-calling between individuals is presumed to facilitate group cohesion. We present novel spatiotemporal excitement modeling consisting of a background process and a counter-call process. The background process intensity incorporates the influences of diel patterns and ambient noise on occurrence. The counter-call intensity captures potential excitement, that calling elicits calling behavior. Call incidence is found to be clustered in space and time; a call seems to excite more calls nearer to it in time and space. We find evidence that whales make more calls during twilight hours, respond to other whales nearby, and are likely to remain quiet in the presence of increased ambient noise.
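The background-plus-excitation structure can be illustrated with the simplest self-exciting (Hawkes) process. The sketch below is a purely temporal Hawkes process with a constant background rate and an exponential excitation kernel.

The paper's model is spatiotemporal, with diel and ambient-noise effects in the background intensity; none of that is reproduced here. Parameter and function names are hypothetical.

```python
import math

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a temporal Hawkes process with exponential kernel:
    lambda(t) = mu + sum over past events t_i < t of alpha * beta * exp(-beta * (t - t_i))."""
    return mu + sum(alpha * beta * math.exp(-beta * (t - ti)) for ti in events if ti < t)

def hawkes_loglik(events, T, mu, alpha, beta):
    """Log-likelihood on [0, T]: sum of log-intensities at the events minus the integrated
    intensity; each event adds alpha * (1 - exp(-beta * (T - t_i))) to that integral."""
    log_terms = sum(math.log(hawkes_intensity(ti, events, mu, alpha, beta)) for ti in events)
    compensator = mu * T + sum(alpha * (1.0 - math.exp(-beta * (T - ti))) for ti in events)
    return log_terms - compensator

calls = [0.4, 0.5, 0.55, 2.1, 2.2]  # hypothetical call times (hours)
print(hawkes_intensity(0.6, calls, mu=1.0, alpha=0.5, beta=4.0))
print(hawkes_loglik(calls, T=3.0, mu=1.0, alpha=0.5, beta=4.0))
```

Here `alpha` is the expected number of calls directly triggered by each call (it must be below 1 for stationarity). `beta` controls how quickly the excitation decays.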

Alan E. Gelfand

Duke University

Journal of Agricultural, Biological and Environmental Statistics

Modeling community dynamics through environmental effects, species interactions and movement

Understanding how communities respond to environmental change is frustrated by the fact that both species interactions and movement affect biodiversity in unseen ways. To evaluate the contributions of species interactions on community growth, dynamic models that can capture nonlinear responses to the environment and the redistribution of species across a spatial range are required. We develop a time-series framework that models the effects of environment–species interactions as well as species–species interactions on population growth within a community. Novel aspects of our model include allowing for species redistribution across a spatial region, and addressing the issue of zero inflation. We adopt a hierarchical Bayesian approach, enabling probabilistic uncertainty quantification in the model parameters. To evaluate the impacts of interactions and movement on population growth, we apply our model …

Alan E. Gelfand

Duke University

Global Change Biology

Mechanistic modeling of climate effects on redistribution and population growth in a community of fish species

Understanding community responses to climate is critical for anticipating the future impacts of global change. However, despite increased research efforts in this field, models that explicitly include important biological mechanisms are lacking. Quantifying the potential impacts of climate change on species is complicated by the fact that the effects of climate variation may manifest at several points in the biological process. To this end, we extend a dynamic mechanistic model that combines population dynamics, such as species interactions, with species redistribution by allowing climate to affect both processes. We examine their relative contributions in an application to the changing biomass of a community of eight species in the Gulf of Maine using over 30 years of fisheries data from the Northeast Fishery Science Center. Our model suggests that the mechanisms driving biomass trends vary across space, time, and …

Alan E. Gelfand

Duke University

Journal of Agricultural, Biological and Environmental Statistics

Zero-inflated Beta distribution regression modeling

A frequent challenge encountered with ecological data is how to interpret, analyze, or model data having a high proportion of zeros. Much attention has been given to zero-inflated count data, whereas models for non-negative continuous data with an abundance of 0s are much fewer. We consider zero-inflated data on the unit interval and provide modeling to capture two types of 0s in the context of a Beta regression model. We model 0s due to missing by chance through left-censoring of a latent regression and 0s due to unsuitability using an independent Bernoulli specification. We extend the model by introducing spatial random effects. We specify models hierarchically, employing latent variables, and fit them within a Bayesian framework. Our motivating dataset consists of percent cover abundance of two plant families at a collection of sites in the Cape Floristic Region of South Africa. We find that environmental …

Alan E. Gelfand

Duke University

arXiv preprint arXiv:2310.08397

Assessing Marine Mammal Abundance: A Novel Data Fusion

Marine mammals are increasingly vulnerable to human disturbance and climate change. Their diving behavior leads to limited visual access during data collection, making studying the abundance and distribution of marine mammals challenging. In theory, using data from more than one observation modality should lead to better informed predictions of abundance and distribution. With focus on North Atlantic right whales, we consider the fusion of two data sources to inform about their abundance and distribution. The first source is aerial distance sampling which provides the spatial locations of whales detected in the region. The second source is passive acoustic monitoring (PAM), returning calls received at hydrophones placed on the ocean floor. Due to limited time on the surface and detection limitations arising from sampling effort, aerial distance sampling only provides a partial realization of locations. With PAM, we never observe numbers or locations of individuals. To address these challenges, we develop a novel thinned point pattern data fusion. Our approach leads to improved inference regarding abundance and distribution of North Atlantic right whales throughout Cape Cod Bay, Massachusetts in the US. We demonstrate performance gains of our approach compared to that from a single source through both simulation and real data.

2023/10/12

Alan E. Gelfand

Duke University

Package ‘hSDM’

Description: User-friendly and fast set of functions for estimating parameters of hierarchical Bayesian species distribution models (Latimer et al. 2006, doi:10.1890/04-0609). Such models allow interpreting the observations (occurrence and abundance of a species) as a result of several hierarchical processes. These include ecological processes (habitat suitability, spatial dependence and anthropogenic disturbance) and observation processes (species detectability). Hierarchical species distribution models are essential for accurately characterizing the environmental response of species, predicting their probability of occurrence, and assessing uncertainty in the model results.

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2312.09652

The asymptotic distribution of the remainder in a certain base-β expansion

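Let $X = \sum_{k=1}^{\infty} X_k \beta^{-k}$ be the base-$\beta$ expansion of a continuous random variable $X$ on the unit interval, where $\beta = (1 + \sqrt{5})/2$ is the golden ratio. We study the asymptotic distribution and convergence rate of the scaled remainder $\sum_{k=1}^{\infty} X_{n+k} \beta^{-k}$ when $n$ tends to infinity.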

2023/12/15
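For the golden ratio $\varphi = (1 + \sqrt{5})/2$, the greedy (Rényi) base-$\varphi$ expansion iterates the map $r \mapsto \varphi r - \lfloor \varphi r \rfloor$. Its digits are 0 or 1 and never contain two consecutive 1s. The value of $r$ after $n$ steps is the scaled remainder.

Below is a floating-point sketch. The function names are hypothetical, and it assumes that the greedy expansion is the one studied in the abstract above.

```python
import math

PHI = (1 + math.sqrt(5)) / 2  # golden ratio

def beta_expansion_digits(x, beta, n):
    """Greedy base-beta digits of x in [0, 1): d_k = floor(beta * r_{k-1}),
    r_k = beta * r_{k-1} - d_k, with r_0 = x. Returns the digits and r_n (the scaled remainder)."""
    digits, r = [], x
    for _ in range(n):
        y = beta * r
        d = math.floor(y)
        digits.append(d)
        r = y - d
    return digits, r

digits, r = beta_expansion_digits(0.7, PHI, 25)
print(digits, r)
```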

Alan E. Gelfand

Duke University

Environmental and Ecological Statistics

Joint multivariate and functional modeling for plant traits and reflectances

The investigation of leaf-level traits in response to varying environmental conditions has immense importance for understanding plant ecology. Remote sensing technology enables measurement of the reflectance of plants to make inferences about underlying traits along environmental gradients. While much focus has been placed on understanding how reflectance and traits are related at the leaf-level, the challenge of modelling the dependence of this relationship while accounting for environmental gradients has limited this line of inquiry. Here, we take up the problem of jointly modeling traits and reflectance given environment. Our objective is to assess not only response to environmental regressors but also dependence between trait levels and the reflectance spectrum in the context of this regression. We jointly model the response vector of traits with reflectance, which is a function of wavelength. To conduct this …

Jesper Møller

Aalborg Universitet

Proceedings of the London Mathematical Society

Realizability and tameness of fusion systems

A saturated fusion system over a finite $p$-group $S$ is a category whose objects are the subgroups of $S$ and whose morphisms are injective homomorphisms between the subgroups satisfying certain axioms. A fusion system over $S$ is realized by a finite group $G$ if $S$ is a Sylow $p$-subgroup of $G$ and morphisms in the category are those induced by conjugation in $G$. One recurrent question in this subject is to find criteria as to whether a given saturated fusion system is realizable or not. One main result in this paper is that a saturated fusion system is realizable if all of its components (in the sense of Aschbacher) are realizable. Another result is that all realizable fusion systems are tame: a finer condition on realizable fusion systems that involves describing automorphisms of a fusion system in terms of those of some group that realizes it. Stated in this way, these results depend on the …

Jesper Møller

Aalborg Universitet

ACM Transactions on Spatial Algorithms and Systems

Stochastic Routing with Arrival Windows

Arriving at a destination within a specific time window is important in many transportation settings. For example, trucks may be penalized for early or late arrivals at compact terminals, and early and late arrivals at general practitioners, dentists, and so on, are also discouraged, in part due to COVID. We propose foundations for routing with arrival-window constraints. In a setting where the travel time of a road segment is modeled by a probability distribution, we define two problems where the aim is to find a route from a source to a destination that optimizes or yields a high probability of arriving within a time window while departing as late as possible. In this setting, a core challenge is to enable comparison between paths that may potentially be part of a result path with the goal of determining whether a path is uninteresting and can be disregarded given the existence of another path. We show that existing solutions …

2023/11/21

Alan E. Gelfand

Duke University

International Journal of Climatology

Assessing space and time changes in daily maximum temperature in the Ebro basin (Spain) using model‐based statistical tools

There is continuing interest in the investigation of change in temperature over space and time. For this analysis, we offer statistical tools to illuminate changes temporally, at desired temporal resolution, and spatially, using data generated from suitable space–time models. The proposed tools can be used with the output from any suitable model fitted to any set of spatially referenced time series data. The tools to assess space and time changes include spatial surfaces of probabilities and spatial extents for events defined by exceeding a threshold. The spatial surfaces capture the spatial variation in the probability or risk of an exceedance event, while the spatial extents capture the expected proportion of incidence of an event for a region of interest. This approach is used to analyse the changes in daily maximum temperature in an inland Mediterranean region (NE of Spain) in the period 1956–2015. The area is very …

2023/12/30
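Both tools can be computed from posterior predictive draws on a grid of locations covering the region of interest. Below is a minimal numpy sketch. It assumes the draws are stored as an array of shape (number of draws, number of locations); the function names and the toy values are hypothetical.

```python
import numpy as np

def exceedance_surface(draws, threshold):
    """Per-location Monte Carlo estimate of P(Y(s) > threshold); draws has shape (n_draws, n_locations)."""
    return (np.asarray(draws) > threshold).mean(axis=0)

def spatial_extent_draws(draws, threshold):
    """Per-draw proportion of locations exceeding the threshold. Its mean estimates the
    expected spatial extent, and its spread gives an uncertainty band."""
    return (np.asarray(draws) > threshold).mean(axis=1)

draws = np.array([[30.0, 36.0, 41.0],
                  [34.0, 37.0, 33.0]])  # hypothetical: 2 draws at 3 grid cells (deg C)
print(exceedance_surface(draws, 35.0))
print(spatial_extent_draws(draws, 35.0).mean())
```

By linearity, the mean extent equals the average of the exceedance surface over locations. Keeping the per-draw proportions is what allows an interval for the extent to be reported.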

Jesper Møller

Aalborg Universitet

Spatial Statistics

Fitting the grain orientation distribution of a polycrystalline material conditioned on a Laguerre tessellation

The description of distributions related to grain microstructure helps physicists to understand the processes in materials and their properties. This paper presents a general statistical methodology for the analysis of crystallographic orientations of grains in a 3D Laguerre tessellation dataset which represents the microstructure of a polycrystalline material. We introduce complex stochastic models which may substitute expensive laboratory experiments: conditional on the Laguerre tessellation, we suggest interaction models for the distribution of cubic crystal lattice orientations, where the interaction is between pairs of orientations for neighbouring grains in the tessellation. We discuss parameter estimation and model comparison methods based on maximum pseudolikelihood as well as graphical procedures for model checking using simulations. Our methodology is applied for analysing a dataset representing a nickel …

Jesper Møller

Aalborg Universitet

Methodology and Computing in Applied Probability

Singular distribution functions for random variables with stationary digits

Let $F$ be the cumulative distribution function (CDF) of the base-$q$ expansion $\sum_{n=1}^{\infty} X_n q^{-n}$, where $q \ge 2$ is an integer and $\{X_n\}_{n \ge 1}$ is a stationary stochastic process with state space $\{0, 1, \ldots, q-1\}$. In a previous paper we characterized the absolutely continuous and the discrete components of $F$. In this paper we study special cases of models, including stationary Markov chains of any order and stationary renewal point processes, where we establish a law of pure types: $F$ is then either a uniform or a singular CDF on [0, 1]. Moreover, we study mixtures of such models. In most cases expressions and plots of $F$ are given.
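The simplest stationary case is IID digits with probabilities $p_0, \ldots, p_{q-1}$. Here $F(x)$ can be evaluated from the digits $x_1, x_2, \ldots$ of $x$ via
$$P(Y < x) = \sum_k P(X_1 = x_1, \ldots, X_{k-1} = x_{k-1})\,P(X_k < x_k).$$
With uniform digit probabilities this gives $F(x) = x$. With any other IID law, $F$ is singular, which matches the uniform-or-singular dichotomy stated above.

The sketch below is a hypothetical helper truncated after a fixed number of digits. It is not the paper's general method for Markov or renewal digits.

```python
def iid_digit_cdf(x, p, n_digits=60):
    """Approximate F(x) = P(Y <= x) for Y = sum_k X_k q^{-k} with IID digits X_k ~ p on {0,...,q-1},
    summing P(first k-1 digits match x) * P(k-th digit < x_k) over the first n_digits digits of x."""
    q = len(p)
    below = [sum(p[:j]) for j in range(q)]  # below[j] = P(X_k < j)
    total, prefix_prob, r = 0.0, 1.0, x
    for _ in range(n_digits):
        r *= q
        d = int(r)  # next base-q digit of x
        r -= d
        total += prefix_prob * below[d]
        prefix_prob *= p[d]
        if prefix_prob == 0.0:
            break
    return total

print(iid_digit_cdf(0.375, [0.5, 0.5]))  # uniform digits: F(x) = x
print(iid_digit_cdf(0.75, [0.3, 0.7]))   # non-uniform digits: singular F
```

With non-uniform digits, the continuous case ($F$ has no jumps) makes $P(Y < x)$ and $P(Y \le x)$ agree. That is why the strict inequality is enough.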

Alan E. Gelfand

Duke University

The Annals of Applied Statistics

Time-discretization approximation enriches continuous-time discrete-space models for animal movement

Code used to analyze data and generate tables and figures is provided for reproducibility in the supplementary file ctds_methods.zip.

Other articles from the Scandinavian Journal of Statistics

Pamela A. Shaw

University of Pennsylvania

Scandinavian Journal of Statistics

Testing the missing at random assumption in generalized linear models in the presence of instrumental variables

Practical problems with missing data are common, and many methods have been developed concerning the validity and/or efficiency of statistical procedures. A central, longstanding focus has been the mechanism governing data missingness, and correctly deciding the appropriate mechanism is crucial for conducting proper practical investigations. In this paper, we present a new hypothesis testing approach for deciding between the conventional notions of missing at random and missing not at random in generalized linear models in the presence of instrumental variables. The foundational idea is to develop appropriate discrepancy measures between estimators whose properties significantly differ only when missing at random does not hold. We show that our testing approach achieves an objective data‐oriented choice between missing at random or not. We demonstrate the …

Siuli Mukhopadhyay

Indian Institute of Technology Bombay

Scandinavian Journal of Statistics

G‐optimal grid designs for kriging models

This work is focused on finding G‐optimal designs theoretically for kriging models with two‐dimensional inputs and separable exponential covariance structures. For design comparison, the notion of evenness of two‐dimensional grid designs is developed. The mathematical relationship between the design and the supremum of the mean squared prediction error (SMSPE) function is studied and then optimal designs are explored for both prospective and retrospective design scenarios. In the case of prospective designs, the new design is developed before the experiment is conducted and the regularly spaced grid is shown to be the G‐optimal design. Retrospective designs are constructed by adding or deleting points from an already existing design. Deterministic algorithms are developed to find the best possible retrospective designs (those that minimize the SMSPE). It is found that a more evenly spread design …
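The quantity being minimized can be computed directly for a candidate design. The sketch below rests on these assumptions (the paper may use a different kriging variant):

- Simple kriging with a known zero mean and unit variance.
- The separable exponential correlation $\exp(-\theta_1|h_1| - \theta_2|h_2|)$.

It evaluates the MSPE $\sigma^2(1 - c_0^\top C^{-1} c_0)$ on a fine grid. The maximum over that grid approximates the SMSPE of a regular $4 \times 4$ grid design. Function names and $\theta$ values are hypothetical.

```python
import numpy as np

def sep_exp_cov(a, b, theta):
    """Separable exponential correlation exp(-theta1*|a1-b1| - theta2*|a2-b2|) for all row pairs."""
    diff = np.abs(a[:, None, :] - b[None, :, :])  # shape (len(a), len(b), 2)
    return np.exp(-(diff * theta).sum(axis=-1))

def simple_kriging_mspe(design, targets, theta, sigma2=1.0):
    """MSPE of the simple-kriging predictor (known zero mean): sigma2 * (1 - c0^T C^{-1} c0)."""
    C = sep_exp_cov(design, design, theta)
    c0 = sep_exp_cov(design, targets, theta)  # shape (n_design, n_targets)
    w = np.linalg.solve(C, c0)
    return sigma2 * (1.0 - np.einsum("ij,ij->j", c0, w))

g = np.linspace(0.0, 1.0, 4)
design = np.array([(u, v) for u in g for v in g])  # regular 4 x 4 grid design
t = np.linspace(0.0, 1.0, 41)
targets = np.array([(u, v) for u in t for v in t])
mspe = simple_kriging_mspe(design, targets, theta=np.array([1.0, 2.0]))
print(mspe.max())  # grid approximation to the SMSPE
```

Comparing `mspe.max()` across candidate designs of equal size gives a crude numerical check of the G-optimality criterion. Adding or deleting rows of `design` mimics the retrospective setting.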

Mika Meitz

Helsingin yliopisto

Scandinavian Journal of Statistics

Statistical inference for generative adversarial networks and other minimax problems

This paper studies generative adversarial networks (GANs) from the perspective of statistical inference. A GAN is a popular machine learning method in which the parameters of two neural networks, a generator and a discriminator, are estimated to solve a particular minimax problem. This minimax problem typically has a multitude of solutions, and the focus of this paper is the statistical properties of these solutions. We address two key statistical issues for the generator and discriminator network parameters: consistent estimation and confidence sets. We first show that the set of solutions to the sample GAN problem is a (Hausdorff) consistent estimator of the set of solutions to the corresponding population GAN problem. We then devise a computationally intensive procedure to form confidence sets and show that these sets contain the population GAN solutions with the desired coverage probability. Small numerical …

Sarah Friedrich

Georg-August-Universität Göttingen

Scandinavian Journal of Statistics

Asymptotic properties of resampling‐based processes for the average treatment effect in observational studies with competing risks

In observational studies with time‐to‐event outcomes, the g‐formula can be used to estimate a treatment effect in the presence of confounding factors. However, the asymptotic distribution of the corresponding stochastic process is complicated and thus not suitable for deriving confidence intervals or time‐simultaneous confidence bands for the average treatment effect. A common remedy is resampling‐based approximation, with Efron's nonparametric bootstrap being the standard tool in practice. We investigate the large sample properties of three different resampling approaches and prove their asymptotic validity in a setting with time‐to‐event data subject to competing risks. The usage of these approaches is demonstrated by an analysis of the effect of physical activity on the risk of knee replacement among patients with advanced knee osteoarthritis.

Cheng Yong Tang (汤琤咏)

Temple University

Scandinavian Journal of Statistics

Testing the missing at random assumption in generalized linear models in the presence of instrumental variables

Florencia Leonardi

Universidade de São Paulo

Scandinavian Journal of Statistics

Structure recovery for partially observed discrete Markov random fields on graphs under not necessarily positive distributions

We propose a penalized conditional likelihood criterion to estimate the basic neighborhood of each node in a discrete Markov random field that can be partially observed. We prove the convergence of the estimator in the case of a finite or countable infinite set of nodes. The estimated neighborhoods can be combined to estimate the underlying graph. In the finite case, the graph can be recovered with probability one. In contrast, we can recover any finite subgraph with probability one in the countable infinite case by allowing the candidate neighborhoods to grow as a function $o(\log n)$, with $n$ the sample size. Our method requires minimal assumptions on the probability distribution, and contrary to other approaches in the literature, the usual positivity condition is not needed. We evaluate the estimator's performance on simulated data and apply the methodology to a real dataset of stock …

Weixing Song

Kansas State University

Scandinavian Journal of Statistics

Extrapolation estimation for nonparametric regression with measurement error

For nonparametric regression models with covariates contaminated by normal measurement errors, this paper proposes an extrapolation algorithm to estimate the regression functions. By applying the conditional expectation directly to the kernel‐weighted least squares of the deviations between the local linear approximation and the observed responses, the proposed algorithm successfully bypasses the simulation step in the classical simulation extrapolation, thus significantly reducing the computational time. It is noted that the proposed method also provides an exact form of the extrapolation function, although the extrapolation estimate generally cannot be obtained by simply setting the extrapolation variable to negative one in the fitted extrapolation function, if the bandwidth is less than the SD of the measurement error. Large sample properties of the proposed estimation procedure are discussed, as …

Reinaldo Boris Arellano Valle

Pontificia Universidad Católica de Chile

Scandinavian Journal of Statistics

Corrigendum to “Shannon Entropy and Mutual Information for Multivariate Skew‐Elliptical Distributions” published in Scandinavian Journal of Statistics (2013), vol. 40, pp. 42–62

We thank Florian Stijven, Ariel Alonso Abad, and Gökçe Deliorman for pointing out typos in the signs of the formula for the mutual information index for the Student's t case (p. 47, last formula of Section 2.4). The correct formula is:

$$I_{\mathbf{XY}}^{T_{n+m}}(\boldsymbol{\Omega}, \nu) = I_{\mathbf{XY}}^{N_{n+m}}(\boldsymbol{\Omega}) + \log\left[\frac{\Gamma(\nu/2)\,\Gamma\{(\nu+n+m)/2\}}{\Gamma\{(\nu+n)/2\}\,\Gamma\{(\nu+m)/2\}}\right] + \frac{\nu+m}{2}\,\psi\!\left(\frac{\nu+m}{2}\right) + \frac{\nu+n}{2}\,\psi\!\left(\frac{\nu+n}{2}\right) - \frac{\nu+n+m}{2}\,\psi\!\left(\frac{\nu+n+m}{2}\right) - \frac{\nu}{2}\,\psi\!\left(\frac{\nu}{2}\right).$$
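The non-Gaussian part of the corrected Student-t formula depends only on $\nu$, $n$ and $m$. It can therefore be evaluated on its own and added to the Gaussian mutual information.

Below is a standard-library sketch with hypothetical function names. The digamma function $\psi$ is implemented via its recurrence $\psi(x) = \psi(x+1) - 1/x$ and its asymptotic series, because Python's `math` module does not provide it.

```python
import math

def digamma(x):
    """Digamma for x > 0: shift up with psi(x) = psi(x + 1) - 1/x, then use the asymptotic
    series ln x - 1/(2x) - 1/(12x^2) + 1/(120x^4) - 1/(252x^6) (error about 1e-8 or less)."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv, inv2 = 1.0 / x, 1.0 / (x * x)
    return result + math.log(x) - 0.5 * inv - inv2 * (1 / 12 - inv2 * (1 / 120 - inv2 / 252))

def t_mutual_information_excess(nu, n, m):
    """Terms added to the Gaussian mutual information I^{N_{n+m}}(Omega) to obtain the
    Student-t value I^{T_{n+m}}(Omega, nu), following the corrigendum's formula."""
    log_ratio = (math.lgamma(nu / 2) + math.lgamma((nu + n + m) / 2)
                 - math.lgamma((nu + n) / 2) - math.lgamma((nu + m) / 2))
    return (log_ratio
            + (nu + m) / 2 * digamma((nu + m) / 2)
            + (nu + n) / 2 * digamma((nu + n) / 2)
            - (nu + n + m) / 2 * digamma((nu + n + m) / 2)
            - nu / 2 * digamma(nu / 2))

for nu in (1, 5, 30, 1000):
    print(nu, t_mutual_information_excess(nu, 1, 1))
```

The excess shrinks towards zero as $\nu$ grows, consistent with the Student-t distribution approaching the Gaussian. It is positive for finite $\nu$: even with a block-diagonal $\boldsymbol{\Omega}$, the two t-distributed blocks are dependent.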

Takumi Saegusa

University of Maryland, Baltimore

Scandinavian Journal of Statistics

Confidence bands for survival curves from outcome‐dependent stratified samples

We consider the construction of confidence bands for survival curves under outcome‐dependent stratified sampling. A main challenge of this design is that data are a biased dependent sample due to stratification and sampling without replacement. Most literature on regression approximates this design by Bernoulli sampling, but variance is generally overestimated. Even with this approximation, the limiting distribution of the inverse probability weighted Kaplan–Meier estimator involves a general Gaussian process, and hence quantiles of its supremum are not analytically available. In this paper, we provide a rigorous asymptotic theory for the weighted Kaplan–Meier estimator accounting for dependence in the sample. We propose a novel hybrid method to both simulate and bootstrap parts of the limiting process to compute confidence bands with asymptotically correct coverage probability. Simulation study …

Jonathan Keith

Monash University

Scandinavian Journal of Statistics

G‐optimal grid designs for kriging models

This work focuses on deriving G‐optimal designs theoretically for kriging models with two‐dimensional inputs and separable exponential covariance structures. For design comparison, the notion of evenness of two‐dimensional grid designs is developed. The mathematical relationship between the design and the supremum of the mean squared prediction error (SMSPE) function is studied, and optimal designs are then explored for both prospective and retrospective design scenarios. In the prospective case, the design is developed before the experiment is conducted, and the regularly spaced grid is shown to be G‐optimal. Retrospective designs are constructed by adding points to or deleting points from an existing design. Deterministic algorithms are developed to find the best possible retrospective designs (those that minimize the SMSPE). It is found that a more evenly spread design …
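To make the SMSPE criterion concrete, the following sketch computes the prediction error of simple kriging (known zero mean, no nugget) under a separable exponential covariance $\sigma^2\exp(-\theta_1|h_1|-\theta_2|h_2|)$ on $[0,1]^2$, and approximates the supremum by a maximum over a finite evaluation grid. The simple-kriging assumption, the unit square, and all parameter values are illustrative choices that may differ from the paper's setting; the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def cov(a: Point, b: Point, sigma2: float, theta: Tuple[float, float]) -> float:
    """Separable exponential covariance sigma2 * exp(-theta1|dx| - theta2|dy|)."""
    return sigma2 * math.exp(-theta[0] * abs(a[0] - b[0]) - theta[1] * abs(a[1] - b[1]))


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def mspe(x: Point, design: Sequence[Point], sigma2: float,
         theta: Tuple[float, float]) -> float:
    """Simple-kriging MSPE at x: sigma2 - k(x)^T K^{-1} k(x)."""
    K = [[cov(a, b, sigma2, theta) for b in design] for a in design]
    k = [cov(x, a, sigma2, theta) for a in design]
    w = solve(K, k)
    return sigma2 - sum(ki * wi for ki, wi in zip(k, w))


def smspe(design: Sequence[Point], sigma2: float, theta: Tuple[float, float],
          resolution: int = 21) -> float:
    """Approximate the supremum of the MSPE over [0,1]^2 by a max over a grid."""
    grid = [i / (resolution - 1) for i in range(resolution)]
    return max(mspe((u, v), design, sigma2, theta) for u in grid for v in grid)
```

Comparing `smspe` for a regular $3\times 3$ grid with that of nine points crowded into one corner illustrates why evenly spread designs are favoured: the crowded design leaves the far corner almost unpredictable, so its SMSPE approaches $\sigma^2$.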

Martin Bladt

Université de Lausanne

Scandinavian Journal of Statistics

Estimating absorption time distributions of general Markov jump processes

The estimation of absorption time distributions of Markov jump processes is an important task in various branches of statistics and applied probability. While the time‐homogeneous case is classic, the time‐inhomogeneous case has recently received increased attention due to its added flexibility and advances in computational power. However, existing work assumes commuting sub‐intensity matrices, which in various cases limits the parsimony of the resulting representation. This paper develops the theory required to solve the general case through maximum likelihood estimation, in particular using the expectation‐maximization algorithm. A reduction to a piecewise constant intensity matrix function is proposed in order to provide succinct representations, in which a parametric linear model binds the intensities together. Practical aspects are discussed and illustrated through the estimation of notoriously …
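The absorption time $\tau$ of a Markov jump process with initial distribution $\boldsymbol{\alpha}$ over the transient states satisfies $P(\tau>t)=\boldsymbol{\alpha}^\top \mathbf{P}(0,t)\mathbf{1}$, where $\mathbf{P}(0,t)$ is the transient block of the transition matrix. When the sub-intensity matrix is piecewise constant, $\mathbf{P}(0,t)$ is the time-ordered product of matrix exponentials $\exp(\mathbf{T}_k\Delta_k)$, and no commutativity is required. The sketch below only evaluates this survival function for given, hypothetical parameters; it does not implement the paper's EM estimation or its parametric linear model linking the intensities, and the matrix exponential is a plain scaling-and-squaring Taylor approximation.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def expm(A: Matrix, terms: int = 20) -> Matrix:
    """Matrix exponential by scaling and squaring with a truncated Taylor series."""
    n = len(A)
    norm = max(sum(abs(x) for x in row) for row in A)
    # Choose s so that the scaled matrix has norm at most 1/2.
    s = max(0, int(math.ceil(math.log2(norm))) + 1) if norm > 0 else 0
    scaled = [[x / 2 ** s for x in row] for row in A]
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms + 1):
        term = [[x / k for x in row] for row in matmul(term, scaled)]
        result = [[r + t for r, t in zip(rr, tr)] for rr, tr in zip(result, term)]
    for _ in range(s):
        result = matmul(result, result)
    return result


def absorption_survival(t: float, alpha: Sequence[float], breakpoints: Sequence[float],
                        subintensities: Sequence[Matrix]) -> float:
    """P(tau > t) when the sub-intensity matrix equals subintensities[k] on
    [breakpoints[k], breakpoints[k+1]); the last piece extends to infinity.
    breakpoints[0] is assumed to be 0."""
    row = [list(alpha)]  # 1 x p row vector
    for k, T in enumerate(subintensities):
        start = breakpoints[k]
        end = breakpoints[k + 1] if k + 1 < len(breakpoints) else math.inf
        if t <= start:
            break
        dt = min(t, end) - start
        row = matmul(row, expm([[x * dt for x in r] for r in T]))
    return sum(row[0])
```

A single transient state with rate $\lambda$ reduces to the exponential survival $e^{-\lambda t}$, and the two-state chain `[[-1, 1], [0, -1]]` started in state 1 gives the Erlang survival $e^{-t}(1+t)$, which makes for convenient checks.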

Nadja Klein

Humboldt-Universität zu Berlin

Scandinavian Journal of Statistics

Flexible specification testing in quantile regression models

We propose three novel consistent specification tests for quantile regression models that generalize existing tests in three ways. First, we allow the covariate effects to be quantile‐dependent and nonlinear. Second, we allow the conditional quantile functions to be parameterized by appropriate basis functions rather than parametrically. We are thereby able to test for general functional forms, while retaining linear effects as special cases. In both cases, the induced class of conditional distribution functions is tested with a Cramér–von Mises type test statistic, for which we derive the theoretical limit distribution and propose a bootstrap method. Third, a modified test statistic is derived to increase the power of the tests. We highlight the merits of our tests in a detailed Monte Carlo study and two real data examples. Our first application to conditional income distributions in Germany indicates that there are not only still significant …
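For reference, the classical one-sample Cramér–von Mises statistic for a fully specified distribution function $F$ is $W^2 = \frac{1}{12n}+\sum_{i=1}^n\{F(x_{(i)})-\frac{2i-1}{2n}\}^2$. The tests above apply a statistic of this type to conditional distributions implied by fitted quantile regressions, with a bootstrap for critical values; the sketch below covers only the classical unconditional case and is not the authors' statistic.

```python
from typing import Callable, Sequence


def cramer_von_mises(sample: Sequence[float], cdf: Callable[[float], float]) -> float:
    """Classical one-sample statistic W^2 = 1/(12n) + sum_i (F(x_(i)) - (2i-1)/(2n))^2."""
    xs = sorted(sample)
    n = len(xs)
    return 1.0 / (12 * n) + sum((cdf(x) - (2 * i - 1) / (2 * n)) ** 2
                                for i, x in enumerate(xs, start=1))
```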

David Rossell

Universidad Pompeu Fabra

Scandinavian Journal of Statistics

Partial correlation graphical lasso

Standard likelihood penalties to learn Gaussian graphical models are based on regularizing the off‐diagonal entries of the precision matrix. Such methods, and their Bayesian counterparts, are not invariant to scalar multiplication of the variables, unless one standardizes the observed data to unit sample variances. We show that such standardization can have a strong effect on inference and introduce a new family of penalties based on partial correlations. We show that the latter, as well as the maximum likelihood, $L_0$ and logarithmic penalties, are scale invariant. We illustrate the use of one such penalty, the partial correlation graphical LASSO, which sets an $L_1$ penalty on partial correlations. The associated optimization problem is no longer convex, but is conditionally convex. We show via simulated examples and in two real datasets that, besides being scale invariant, there can be important …
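The scale-invariance point can be checked directly. If $\mathbf{X}$ has precision matrix $\boldsymbol{\Omega}$, then $\mathbf{D}\mathbf{X}$ with $\mathbf{D}=\mathrm{diag}(s_1,\dots,s_p)$, $s_j>0$, has precision $\mathbf{D}^{-1}\boldsymbol{\Omega}\mathbf{D}^{-1}$, so an $L_1$ penalty on the off-diagonal precision entries changes with the units of measurement, whereas the partial correlations $\rho_{ij}=-\omega_{ij}/\sqrt{\omega_{ii}\omega_{jj}}$ do not. The sketch below, with hypothetical helper names and an arbitrarily chosen example matrix, illustrates only this invariance, not the penalized estimation itself.

```python
import math
from typing import List

Matrix = List[List[float]]


def partial_correlations(precision: Matrix) -> Matrix:
    """rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), with ones on the diagonal."""
    p = len(precision)
    return [[1.0 if i == j else
             -precision[i][j] / math.sqrt(precision[i][i] * precision[j][j])
             for j in range(p)] for i in range(p)]


def rescale_precision(precision: Matrix, scales: List[float]) -> Matrix:
    """Precision of (s_1 X_1, ..., s_p X_p): D^{-1} Omega D^{-1}, assuming all s_j > 0."""
    p = len(precision)
    return [[precision[i][j] / (scales[i] * scales[j]) for j in range(p)] for i in range(p)]


def offdiag_l1(M: Matrix) -> float:
    """L1 norm of the off-diagonal entries (what an L1 penalty would act on)."""
    return sum(abs(M[i][j]) for i in range(len(M)) for j in range(len(M)) if i != j)


omega = [[2.0, -0.5, 0.0], [-0.5, 1.0, 0.3], [0.0, 0.3, 1.5]]
scaled = rescale_precision(omega, [10.0, 0.1, 3.0])
print(offdiag_l1(omega), offdiag_l1(scaled))  # penalty on precision entries changes
print(offdiag_l1(partial_correlations(omega)),
      offdiag_l1(partial_correlations(scaled)))  # penalty on partial correlations does not
```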

Amanda Fernández-Fontelo

Humboldt-Universität zu Berlin

Scandinavian Journal of Statistics

Some mechanisms leading to underdispersion: Old and new proposals

In statistical modeling, it is important to know the mechanisms that cause underdispersion. Several mechanisms that lead to underdispersed count distributions are revisited from new perspectives, and new ones are introduced. These include procedures based on the number of arrivals in arrival processes, such as renewal and pure birth processes, and steady‐state distributions of birth‐death processes, such as queues with state‐dependent service rates. Weighted Poisson and other well‐known underdispersed distributions are also related to birth‐death processes. Classical and variable binomial thinning mechanisms are also viewed as important procedures for generating underdispersed distributions, and they can also generate bivariate count distributions with negative correlation. Some example applications are shown, one of which is related to biodosimetry.
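Binomial thinning $p\circ X=\sum_{k=1}^{X}B_k$, with $B_k$ independent Bernoulli($p$) variables independent of $X$, has mean $p\mu$ and variance $p^2\sigma^2+p(1-p)\mu$, so its dispersion index is $1+p(D-1)$, where $D=\sigma^2/\mu$ is the index of $X$. An underdispersed input ($D<1$) therefore stays underdispersed, and thinning a constant count gives a binomial distribution. The sketch below computes the exact thinned pmf for a finite-support input; it illustrates only the classical fixed-$p$ mechanism, and the function names are hypothetical.

```python
from math import comb
from typing import Dict


def thin(pmf: Dict[int, float], p: float) -> Dict[int, float]:
    """pmf of p∘X = B_1 + ... + B_X with B_k iid Bernoulli(p), for finite-support X."""
    out: Dict[int, float] = {}
    for x, px in pmf.items():
        for y in range(x + 1):
            out[y] = out.get(y, 0.0) + px * comb(x, y) * p ** y * (1 - p) ** (x - y)
    return out


def dispersion_index(pmf: Dict[int, float]) -> float:
    """Variance-to-mean ratio; values below 1 indicate underdispersion."""
    mean = sum(k * q for k, q in pmf.items())
    var = sum((k - mean) ** 2 * q for k, q in pmf.items())
    return var / mean
```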

Michael G. Levin

University of Pennsylvania

Scandinavian Journal of Statistics

A nested semiparametric method for case‐control study with missingness

We propose a nested semiparametric model to analyze a case‐control study where genuine case status is missing for some individuals. The concept of a noncase is introduced to allow for the imputation of the missing genuine cases. The odds ratio parameter of the genuine cases compared to controls is of interest. The imputation procedure predicts the probability of being a genuine case compared to a noncase semiparametrically in a dimension reduction fashion. This procedure is flexible, and vastly generalizes the existing methods. We establish the root‐$n$ asymptotic normality of the odds ratio parameter estimator. Our method yields stable odds ratio parameter estimation owing to the application of an efficient semiparametric sufficient dimension reduction estimator. We conduct finite sample numerical simulations to illustrate the performance of our approach, and apply it to a dilated cardiomyopathy study.

Eni Musta

Universiteit van Amsterdam

Scandinavian Journal of Statistics

A two‐step estimation procedure for semiparametric mixture cure models

In survival analysis, cure models have been developed to account for the presence of cured subjects that will never experience the event of interest. Mixture cure models with a parametric model for the incidence and a semiparametric model for the survival of the susceptibles are particularly common in practice. Because of the latent cure status, maximum likelihood estimation is performed via the iterative EM algorithm. Here, we focus on the cure probabilities and propose a two‐step procedure to improve upon the maximum likelihood estimator when the sample size is not large. The new method is based on presmoothing by first constructing a nonparametric estimator and then projecting it on the desired parametric class. We investigate the theoretical properties of the resulting estimator and show through an extensive simulation study for the logistic‐Cox model that it outperforms the existing method. Practical use of …
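In a common parameterization of the logistic-Cox mixture cure model, the population survival is $S(t\mid\mathbf{x},\mathbf{z})=1-\pi(\mathbf{x})+\pi(\mathbf{x})S_u(t\mid\mathbf{z})$, with incidence $\pi(\mathbf{x})=\mathrm{expit}(\boldsymbol{\gamma}^\top\mathbf{x})$ (the probability of being susceptible) and proportional-hazards latency $S_u(t\mid\mathbf{z})=S_0(t)^{\exp(\boldsymbol{\beta}^\top\mathbf{z})}$. The sketch only evaluates this model for given parameters and a given baseline survival; it does not implement the EM algorithm or the proposed presmoothing two-step estimator. Names and parameter values are illustrative.

```python
import math
from typing import Callable, Sequence


def expit(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def population_survival(t: float, x: Sequence[float], z: Sequence[float],
                        gamma: Sequence[float], beta: Sequence[float],
                        baseline_survival: Callable[[float], float]) -> float:
    """S(t | x, z) = 1 - pi(x) + pi(x) * S_0(t) ** exp(beta' z), pi(x) = expit(gamma' x)."""
    pi = expit(sum(g * xi for g, xi in zip(gamma, x)))  # probability of being susceptible
    s_u = baseline_survival(t) ** math.exp(sum(b * zi for b, zi in zip(beta, z)))
    return 1.0 - pi + pi * s_u
```

As $t\to\infty$ the curve levels off at the cure probability $1-\pi(\mathbf{x})$, the quantity on which the two-step procedure focuses.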

Marc G Genton

King Abdullah University of Science and Technology

Scandinavian Journal of Statistics

Corrigendum to “Shannon Entropy and Mutual Information for Multivariate Skew‐Elliptical Distributions” published in Scandinavian Journal of Statistics (2013), vol. 40, pp. 42–62

Dongsheng Tu

Queen's University

Scandinavian Journal of Statistics

Consistent covariances estimation for stratum imbalances under minimization method for covariate‐adaptive randomization

Pocock and Simon's minimization method is a popular approach for covariate‐adaptive randomization in clinical trials. Valid statistical inference with data collected under the minimization method requires knowledge of the limiting covariance matrix of within‐stratum imbalances, whose existence was only recently established. In this work, we propose a bootstrap‐based estimator for this limit and establish its consistency, in particular by Le Cam's third lemma. As an application, we consider in simulation studies adjustments by the proposed estimator to existing robust tests for treatment effects with survival data. The simulations show that the adjusted tests achieve a size close to the nominal level, whereas under the minimization method, unlike under other designs, the robust tests without adjustment may suffer from asymptotic size inflation.
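A minimal sketch of one common variant of Pocock and Simon's minimization, assuming the range of arm counts as the per-factor imbalance measure, an unweighted sum over factors, and a biased-coin probability `p_best` (0.8 by default) of choosing the least-imbalanced arm. Other imbalance measures, factor weights, and assignment probabilities are equally standard; all names and defaults here are illustrative.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple

# counts[(factor_index, level)][arm] = number of patients already assigned
Counts = Dict[Tuple[int, str], List[int]]


def imbalance_if_assigned(counts: Counts, levels: Sequence[str], arm: int, n_arms: int) -> int:
    """Sum over factors of (max - min) arm counts if the new patient joins `arm`."""
    total = 0
    for f, level in enumerate(levels):
        row = list(counts.get((f, level), [0] * n_arms))
        row[arm] += 1
        total += max(row) - min(row)
    return total


def minimization_assign(counts: Counts, levels: Sequence[str], n_arms: int = 2,
                        p_best: float = 0.8, rng: Optional[random.Random] = None) -> int:
    """Assign a new patient with the given factor levels and update `counts` in place."""
    rng = rng or random.Random()
    scores = [imbalance_if_assigned(counts, levels, a, n_arms) for a in range(n_arms)]
    best = [a for a in range(n_arms) if scores[a] == min(scores)]
    if len(best) == n_arms:  # every arm equally good: pure randomization
        arm = rng.randrange(n_arms)
    elif rng.random() < p_best:  # biased coin towards the least-imbalanced arm(s)
        arm = rng.choice(best)
    else:
        arm = rng.choice([a for a in range(n_arms) if a not in best])
    for f, level in enumerate(levels):
        counts.setdefault((f, level), [0] * n_arms)[arm] += 1
    return arm
```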

Glen McGee

University of Waterloo

Scandinavian Journal of Statistics

Marginal additive models for population‐averaged inference in longitudinal and cluster‐correlated data

We propose a novel marginal additive model (MAM) for modeling cluster‐correlated data with nonlinear population‐averaged associations. The proposed MAM is a unified framework for estimation and uncertainty quantification of a marginal mean model, combined with inference for between‐cluster variability and cluster‐specific prediction. We propose a fitting algorithm that enables efficient computation of standard errors and corrects for estimation of penalty terms. We demonstrate the proposed methods in simulations and in application to (a) a longitudinal study of beaver foraging behavior and (b) a spatial analysis of Loa loa infection in West Africa.

Samuel Muller

Macquarie University

Scandinavian Journal of Statistics

The effect of the working correlation on fitting models to longitudinal data

We present a detailed discussion of the theoretical properties of quadratic inference function estimators of the parameters in marginal linear regression models. We consider the effect of the choice of working correlation on fundamental questions including the existence of quadratic inference function estimators, their relationship with generalized estimating equations estimators, and the robustness and asymptotic relative efficiency of quadratic inference function and generalized estimating equations estimators. We show that the quadratic inference function estimators do not always exist and propose a way to handle this. We then show that they have unbounded influence functions and can be more or less asymptotically efficient than generalized estimating equations estimators. We also present empirical evidence to demonstrate these results. We conclude that the choice of working correlation can have surprisingly …
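In the quadratic inference function approach, the inverse working correlation is approximated by a linear combination of known basis matrices, for example $\mathbf{M}_0=\mathbf{I}$ and $\mathbf{M}_1$ with ones everywhere off the diagonal (exchangeable) or on the two first off-diagonals (AR(1)). The estimator minimizes $Q(\beta)=N\,\bar{\mathbf{g}}^\top\mathbf{C}^{-1}\bar{\mathbf{g}}$, where $\mathbf{g}_i$ stacks $\mathbf{x}_i^\top\mathbf{M}_k(\mathbf{y}_i-\mathbf{x}_i\beta)$ over $k$ and $\mathbf{C}$ is the average of $\mathbf{g}_i\mathbf{g}_i^\top$. The sketch below assumes a scalar $\beta$, equal cluster sizes, and unit marginal variances so that $\mathbf{C}$ is $2\times2$; these are simplifications for illustration, and the function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def basis_matrices(size: int, structure: str) -> List[Matrix]:
    """Basis matrices M0 = I and M1 for the chosen working correlation structure."""
    eye = [[float(i == j) for j in range(size)] for i in range(size)]
    if structure == "exchangeable":
        m1 = [[float(i != j) for j in range(size)] for i in range(size)]
    elif structure == "ar1":
        m1 = [[float(abs(i - j) == 1) for j in range(size)] for i in range(size)]
    else:
        raise ValueError("structure must be 'exchangeable' or 'ar1'")
    return [eye, m1]


def qif(beta: float, xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]],
        structure: str = "exchangeable") -> float:
    """Q(beta) for the marginal model y_it = beta * x_it + error (scalar beta)."""
    mats = basis_matrices(len(xs[0]), structure)
    gs = []
    for x, y in zip(xs, ys):
        r = [yt - beta * xt for xt, yt in zip(x, y)]
        # One entry per basis matrix: x_i' M_k r_i
        gs.append([sum(x[a] * M[a][b] * r[b] for a in range(len(x)) for b in range(len(x)))
                   for M in mats])
    n = len(gs)
    gbar = [sum(g[k] for g in gs) / n for k in range(2)]
    c = [[sum(g[k] * g[l] for g in gs) / n for l in range(2)] for k in range(2)]
    det = c[0][0] * c[1][1] - c[0][1] * c[1][0]
    if abs(det) < 1e-15:
        raise ValueError("C is (numerically) singular; Q is undefined at this beta")
    inv = [[c[1][1] / det, -c[0][1] / det], [-c[1][0] / det, c[0][0] / det]]
    return n * sum(gbar[k] * inv[k][l] * gbar[l] for k in range(2) for l in range(2))
```

Because $\mathbf{C}$ equals the sample covariance of the $\mathbf{g}_i$ plus $\bar{\mathbf{g}}\bar{\mathbf{g}}^\top$, $Q$ always lies between $0$ and $N$ whenever $\mathbf{C}$ is invertible.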