Globally intensity‐reweighted estimators for K‐ and pair correlation functions

Australian & New Zealand Journal of Statistics

Published On 2021/3

We introduce new estimators of the inhomogeneous K‐function and the pair correlation function of a spatial point process as well as the cross K‐function and the cross pair correlation function of a bivariate spatial point process under the assumption of second‐order intensity‐reweighted stationarity. These estimators rely on a ‘global’ normalisation factor which depends on an aggregation of the intensity function, while the existing estimators depend ‘locally’ on the intensity function at the individual observed points. The advantages of our new global estimators over the existing local estimators are demonstrated by theoretical considerations and a simulation study.
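The contrast between 'local' and 'global' normalisation can be illustrated with a minimal sketch. It rests on assumptions not stated in the abstract: a rectangular window W = [0, width] x [0, height], a known vectorised intensity function `rho`, a translation edge correction for the local estimator, and a global normaliser of the form γ(h) = ∫ ρ(u)ρ(u+h) du over those u with both u and u+h in W, approximated here by a midpoint Riemann sum. The function names and the grid size `m` are hypothetical, and this is not the authors' implementation.

```python
import numpy as np

def _close_pairs(points, r):
    """Return indices (i, j) of ordered pairs of distinct points at distance <= r,
    together with the difference vectors points[j] - points[i]."""
    n = len(points)
    diff = points[None, :, :] - points[:, None, :]   # diff[i, j] = points[j] - points[i]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    i, j = np.nonzero((dist <= r) & ~np.eye(n, dtype=bool))
    return i, j, diff[i, j]

def k_local(points, rho, r, width, height):
    """'Local' sketch: each pair is weighted by the intensity at its two points,
    1 / (rho(x) * rho(y) * |W ∩ W_{y-x}|)."""
    i, j, h = _close_pairs(points, r)
    lam = rho(points)                                 # intensity at every observed point
    overlap = (width - np.abs(h[:, 0])) * (height - np.abs(h[:, 1]))
    return float(np.sum(1.0 / (lam[i] * lam[j] * overlap)))

def gamma(h, rho, width, height, m=64):
    """Assumed global normaliser: midpoint-rule approximation of the integral of
    rho(u) * rho(u + h) over all u with u and u + h inside the window."""
    xs = (np.arange(m) + 0.5) * width / m
    ys = (np.arange(m) + 0.5) * height / m
    gx, gy = np.meshgrid(xs, ys)
    u = np.column_stack([gx.ravel(), gy.ravel()])
    v = u + h
    inside = (v[:, 0] >= 0) & (v[:, 0] <= width) & (v[:, 1] >= 0) & (v[:, 1] <= height)
    cell_area = (width / m) * (height / m)
    return float(np.sum(rho(u[inside]) * rho(v[inside])) * cell_area)

def k_global(points, rho, r, width, height, m=64):
    """'Global' sketch: each pair is weighted by 1 / gamma(y - x), which aggregates
    the intensity over the window instead of evaluating it at x and y."""
    _, _, h = _close_pairs(points, r)
    return float(sum(1.0 / gamma(hk, rho, width, height, m) for hk in h))
```

With a constant intensity the two normalisations coincide up to the grid error, since γ(h) then reduces to ρ² times the area of the overlap of W and its translate. They differ only when `rho` varies over the window. Recomputing γ for every pair is slow and is done here only for clarity.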

Volume

63

Issue

1

Page

93-118

Authors

Jesper Møller

Aalborg Universitet

Position

Professor in Statistics

H-Index(all)

46

H-Index(since 2020)

23

Research Interests

Mathematical Statistics

Probability Theory

Rasmus Waagepetersen

Aalborg Universitet

Position

Professor of Statistics

H-Index(all)

33

H-Index(since 2020)

20

Research Interests

Spatial Statistics

Thomas Shaw

University of Michigan

Position

PhD Candidate

H-Index(all)

5

H-Index(since 2020)

5

Research Interests

biophysics

Other Articles from authors

Rasmus Waagepetersen

Aalborg Universitet

Nature Communications

Nitrogen and Nod factor signaling determine Lotus japonicus root exudate composition and bacterial assembly

Symbiosis with soil-dwelling bacteria that fix atmospheric nitrogen allows legume plants to grow in nitrogen-depleted soil. Symbiosis impacts the assembly of root microbiota, but it is unknown how the interaction between the legume host and rhizobia impacts the remaining microbiota and whether it depends on nitrogen nutrition. Here, we use plant and bacterial mutants to address the role of Nod factor signaling on Lotus japonicus root microbiota assembly. We find that Nod factors are produced by symbionts to activate Nod factor signaling in the host and that this modulates the root exudate profile and the assembly of a symbiotic root microbiota. Lotus plants with different symbiotic abilities, grown in unfertilized or nitrate-supplemented soils, display three nitrogen-dependent nutritional states: starved, symbiotic, or inorganic. We find that root and rhizosphere microbiomes associated with these states differ in …

Rasmus Waagepetersen

Aalborg Universitet

arXiv preprint arXiv:2402.12548

Composite likelihood inference for space-time point processes

The dynamics of a rain forest is extremely complex involving births, deaths and growth of trees with complex interactions between trees, animals, climate, and environment. We consider the patterns of recruits (new trees) and dead trees between rain forest censuses. For a current census we specify regression models for the conditional intensity of recruits and the conditional probabilities of death given the current trees and spatial covariates. We estimate regression parameters using conditional composite likelihood functions that only involve the conditional first order properties of the data. When constructing assumption lean estimators of covariance matrices of parameter estimates we only need mild assumptions of decaying conditional correlations in space while assumptions regarding correlations over time are avoided by exploiting conditional centering of composite likelihood score functions. Time series of point patterns from rain forest censuses are quite short while each point pattern covers a fairly big spatial region. To obtain asymptotic results we therefore use a central limit theorem for the fixed timespan - increasing spatial domain asymptotic setting. This also allows us to handle the challenge of using stochastic covariates constructed from past point patterns. Conveniently, it suffices to impose weak dependence assumptions on the innovations of the space-time process. We investigate the proposed methodology by simulation studies and applications to rain forest data.

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2404.09525

Coupling results and Markovian structures for number representations of continuous random variables

A general setting for nested subdivisions of a bounded real set into intervals defining the digits of a random variable with a probability density function is considered. Under the weak condition that is almost everywhere lower semi-continuous, a coupling between and a non-negative integer-valued random variable is established so that have an interpretation as the “sufficient digits”, since the distribution of conditioned on does not depend on . Adding a condition about a Markovian structure of the lengths of the intervals in the nested subdivisions, becomes a Markov chain of a certain order . If then are IID with a known distribution. When and the Markov chain is uniformly geometric ergodic, a coupling is established between and a random time so that the chain after time is stationary and follows a simple known distribution. The results are related to several examples of number representations generated by a dynamical system, including base- expansions, generalized Lüroth series, -expansions, and continued fraction representations. The importance of the results and some suggestions and open problems for future research are discussed.

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2404.08387

The asymptotic distribution of the scaled remainder for pseudo golden ratio expansions of a continuous random variable

Let be the base- expansion of a continuous random variable on the unit interval where is the positive solution to for an integer (i.e., is a generalization of the golden mean for which ). We study the asymptotic distribution and convergence rate of the scaled remainder when tends to infinity.

Jesper Møller

Aalborg Universitet

Methodology and Computing in Applied Probability

How many digits are needed?

Let be the digits in the base-q expansion of a random variable X defined on [0, 1) where is an integer. For , we study the probability distribution of the (scaled) remainder : If X has an absolutely continuous CDF then converges in the total variation metric to the Lebesgue measure on the unit interval. Under weak smoothness conditions we establish first a coupling between X and a non-negative integer valued random variable N so that follows and is independent of , and second exponentially fast convergence of and its PDF . We discuss how many digits are needed and show examples of our results.

Thomas Shaw

University of Michigan

Biophysical Journal

Investigating allosteric regulation of neurotransmitter receptors by membrane domains

Activity at synapses is mediated by neurotransmitter receptors, which are embedded in the post-synaptic membrane. Like all ion channels, these receptors are potentially sensitive to the surrounding membrane because the energy required to change conformations includes a component dedicated to the interfacial energy with the local lipid bilayer. Cell membranes are compositionally heterogeneous, and support both nanoscale dynamic structure and larger stabilized ordered and disordered domains. Previous work from the Veatch lab has demonstrated that treatments with short n-alcohols modulate the stability of membrane domains in giant plasma membrane vesicles by shifting the miscibility transition temperature, Tc. Meanwhile, assembled electrophysiology data from literature indicates that similar treatments potentiate GABA A and glycine receptors by an amount that is well predicted by the corresponding …

Rasmus Waagepetersen

Aalborg Universitet

Journal of Physics: Conference Series

Climate data for moisture simulations: producing a Danish moisture reference year and comparison with previously used reference year locations

Buildings comprise complex systems and various materials in combination. Different materials have different expected service lives, and degradation processes start as soon as a building is put into operation. Degradation processes are often accelerated by the presence of moisture. To build robust and moisture-safe buildings, hygrothermal simulations are used to predict hygrothermal conditions in constructions, especially in areas of risk. Simulations are useful tools for predicting moisture accumulation, and results can further help predict the risk of moisture damage, i.e., frost damage or mould growth. The external climate can therefore have a significant impact on the service life of constructions. Due to the lack of a sufficient Danish climate reference year, including precipitation, hygrothermal simulations of Danish constructions are currently performed with either Danish climate data without precipitation …

Thomas Shaw

University of Michigan

Investigating Spatial Organization and Physicochemical Interactions in Biomembranes: Tools and Insights

The lipid membranes of cells are complex structural and functional landscapes. Beyond being a selective barrier separating the cell from its surroundings, the membrane serves also as a two-dimensional solvent dictating the thermodynamic environment in which membrane protein biochemistry takes place, and as a platform that facilitates and responds to the organization of membrane proteins into functional domains. Membranes including vesicles derived from eukaryotic plasma membranes also exhibit liquid-liquid phase coexistence. This dissertation aims to link the biochemical and organizational properties of membranes to their phase behavior. The membrane's role as a thermodynamic platform is addressed in a chapter on the availability of cholesterol, specifically its chemical potential (Chapter 3). This work consists of measurements of the chemical potential of cholesterol in a family of synthetic lipid membrane compositions. This chemical potential describes the availability of cholesterol, and is a primary determinant of the occupancy of protein binding sites for cholesterol. The synthetic membranes used in this study are similar to mammalian plasma membranes in phase behavior and cholesterol concentration. The measurements show a close connection between the role of cholesterol in phase separation of these membranes and its availability. This finding suggests that treatments that modify the phase behavior of the membrane, of which many are known, may act through their effect on the availability of cholesterol. In addition, this study provides a framework for how to approach other questions about the biochemistry of cholesterol …

Thomas Shaw

University of Michigan

bioRxiv

TorsinA is essential for the timing and localization of neuronal nuclear pore complex biogenesis

Nuclear pore complexes (NPCs) regulate information transfer between the nucleus and cytoplasm. NPC defects are linked to several neurological diseases, but the processes governing NPC biogenesis and spatial organization are poorly understood. Here, we identify a temporal window of strongly upregulated NPC biogenesis during neuronal maturation. We demonstrate that the AAA+ protein torsinA, whose loss of function causes the neurodevelopmental movement disorder DYT-TOR1A (DYT1) dystonia, coordinates NPC spatial organization during this period without impacting total NPC density. Using a new mouse line in which endogenous Nup107 is Halo-Tagged, we find that torsinA is essential for correct localization of NPC formation. In the absence of torsinA, the inner nuclear membrane buds excessively at sites of mislocalized, nascent NPCs, and NPC assembly completion is delayed. Our work implies …

Rasmus Waagepetersen

Aalborg Universitet

IEEE Transactions on Antennas and Propagation

Bayesian inference for stochastic multipath Radio Channel models

Stochastic radio channel models based on underlying point processes of multipath components (MPCs) have been studied intensively since the seminal papers of Turin and Saleh–Valenzuela (SV). Despite this, inference regarding parameters of these models has remained a major challenge. Current methods typically have a somewhat ad hoc flavor involving a multitude of steps requiring user specification of tuning parameters. In this article, we propose to instead adopt the principled framework of Bayesian inference to conduct inference for the SV model. The posterior distribution is not analytically tractable and we therefore compute approximations of the posterior using Markov chain Monte Carlo (MCMC) methods specific to point processes. To demonstrate the flexibility of our approach, we additionally propose a new multipath model and apply our inference method to it. The resulting inference methodology is …

Rasmus Waagepetersen

Aalborg Universitet

Variable selection for inhomogeneous spatio-temporal Poisson point processes

Spatio-temporal point pattern data are becoming prevalent in many scientific disciplines. We model the first-order intensity of spatio-temporal point pattern data, considering the intensity as a parametric log-linear function of spatial, temporal, and spatio-temporal covariates. Dealing with spatio-temporal covariates brings computational and methodological challenges compared to the purely spatial case. We extend regularisation methods to perform variable selection for spatial point processes to the spatio-temporal case to obtain parsimonious and more interpretable models. Using our proposed methodology, we conduct two simulation studies and examine an application to criminal activity in the Kennedy district of Bogota. In the application, we consider a spatio-temporal point pattern of crime locations and many spatial, temporal, and spatio-temporal covariates related to urban places, environmental factors, and further space-time factors. The intensity function of vehicle thefts is estimated, considering other crimes as covariate information. The proposed methodology offers a comprehensive approach for analysing spatio-temporal point pattern crime data, capturing complex relationships between covariates and crime occurrences over space and time.

Thomas Shaw

University of Michigan

Biophysical Journal

Chemical potential measurements constrain models of cholesterol-phosphatidylcholine interactions

Bilayer membranes composed of cholesterol and phospholipids exhibit diverse forms of nonideal mixing. In particular, many previous studies document macroscopic liquid-liquid phase separation as well as nanometer-scale heterogeneity in membranes of phosphatidylcholine (PC) lipids and cholesterol. Here, we present experimental measurements of cholesterol chemical potential (μc) in binary membranes containing dioleoyl PC (DOPC), 1-palmitoyl-2-oleoyl PC (POPC), or dipalmitoyl PC (DPPC), and in ternary membranes of DOPC and DPPC, referenced to crystalline cholesterol. μc is the thermodynamic quantity that dictates the availability of cholesterol to bind other factors, and notably must be equal between coexisting phases of a phase separated mixture. It is simply related to concentration under conditions of ideal mixing, but is far from ideal for the majority of lipid mixtures investigated here. Measurements of μ …

Rasmus Waagepetersen

Aalborg Universitet

arXiv preprint arXiv:2301.08942

A central limit theorem for a sequence of conditionally centered and -mixing random fields

A central limit theorem is established for a sum of random variables belonging to a sequence of random fields. The fields are assumed to have zero mean conditional on the past history and to satisfy certain conditional -mixing conditions in space or time. The limiting normal distribution is obtained for increasing spatial domain or increasing length of the sequence. The applicability of the theorem is demonstrated by examples regarding estimating functions for a space-time point process and a space-time Markov process.

Rasmus Waagepetersen

Aalborg Universitet

The Journal of experimental education

Differences in high-and low-performing students’ fraction learning in the fourth grade

The aim of this longitudinal study was to examine the differences between high- and low-performing students’ development of fraction proficiency over a school year and to investigate how those differences were related to instruction in fractions versus instruction in other mathematics topics. The data for this study were drawn from a group of (n = 398) students from 21 fourth-grade classes, from which we formed two subgroups: the 25% highest performing students (n = 99) and the 25% lowest performing students (n = 100), based on their Danish national test scores. The students’ fraction proficiency levels were studied over eight months at five measurement time points. A multiple linear regression analysis with random effects was used to model the test scores. The results showed that the high-performing students developed their fraction proficiency during most of the school year—both when instructed in …

Thomas Shaw

University of Michigan

Biophysical Journal

Investigating allosteric regulation by membrane domains of two pentameric ligand-gated ion channels

Ion channels are embedded in a compositionally heterogeneous membrane with nanoscale structure on the order of ∼10-100 nm. Experiments in giant plasma membrane vesicles suggest that one source of this heterogeneity is that the plasma membrane is poised near to but above a miscibility phase transition. Systems near a phase transition can have high susceptibility, meaning that physical properties are especially sensitive to external perturbations. Previous theoretical work in collaboration with Ben Machta has shown that small changes in the phase transition temperature, Tc, can have a strong influence on ion channel function when channels are coupled by boundary interactions to local changes in lipid composition. Meanwhile, data assembled from literature shows that n-alcohol treatments potentiate GABA A and glycine receptors by an amount that is well predicted by the treatment's corresponding shift …

Rasmus Waagepetersen

Aalborg Universitet

bioRxiv

Nitrogen source and Nod factor signaling map out the assemblies of Lotus japonicus root bacterial communities

Symbiosis with soil-dwelling bacteria that fix atmospheric nitrogen allows legume plants to grow in nitrogen-depleted soil. Symbiosis impacts the assembly of root microbiota, but it is not known how this process takes place and whether it is independent of nitrogen nutrition. We use plant and bacterial mutants to address the role of Nod factor signaling on Lotus japonicus root microbiota assembly. We found that Nod factors are produced by symbionts to activate Nod factor signaling in the host, and this modulates the assembly of a symbiotic root microbiota. Lotus plants grown in symbiosis-permissive or suppressive soils delineated three nitrogen-dependent nutritional states: starved, symbiotic, or inorganic. We found that root and rhizosphere microbiomes associated with these states differ in composition and connectivity, demonstrating that symbiosis and inorganic nitrogen impact the legume root microbiota differently. Finally, we demonstrated that selected bacterial genera delineating state-dependent microbiomes have a high level of accurate prediction.

Rasmus Waagepetersen

Aalborg Universitet

arXiv preprint arXiv:2309.12834

A functional central limit theorem for the K-function with an estimated intensity function

The K-function is arguably the most important functional summary statistic for spatial point processes. It is used extensively for goodness-of-fit testing and in connection with minimum contrast estimation for parametric spatial point process models. It is thus pertinent to understand the asymptotic properties of estimates of the K-function. In this paper we derive the functional asymptotic distribution for the K-function estimator. Contrary to previous papers on functional convergence we consider the case of an inhomogeneous intensity function. We moreover handle the fact that practical K-function estimators rely on plugging in an estimate of the intensity function. This removes two serious limitations of the existing literature.

Jesper Møller

Aalborg Universitet

arXiv preprint arXiv:2312.09652

The asymptotic distribution of the remainder in a certain base- expansion

Let be the base- expansion of a continuous random variable on the unit interval where is the golden ratio. We study the asymptotic distribution and convergence rate of the scaled remainder when tends to infinity.

2023/12/15

Jesper Møller

Aalborg Universitet

Proceedings of the London Mathematical Society

Realizability and tameness of fusion systems

A saturated fusion system over a finite p‐group S is a category whose objects are the subgroups of S and whose morphisms are injective homomorphisms between the subgroups satisfying certain axioms. A fusion system over S is realized by a finite group G if S is a Sylow p‐subgroup of G and morphisms in the category are those induced by conjugation in G. One recurrent question in this subject is to find criteria as to whether a given saturated fusion system is realizable or not. One main result in this paper is that a saturated fusion system is realizable if all of its components (in the sense of Aschbacher) are realizable. Another result is that all realizable fusion systems are tame: a finer condition on realizable fusion systems that involves describing automorphisms of a fusion system in terms of those of some group that realizes it. Stated in this way, these results depend on the …

Jesper Møller

Aalborg Universitet

ACM Transactions on Spatial Algorithms and Systems

Stochastic Routing with Arrival Windows

Arriving at a destination within a specific time window is important in many transportation settings. For example, trucks may be penalized for early or late arrivals at compact terminals, and early and late arrivals at general practitioners, dentists, and so on, are also discouraged, in part due to COVID. We propose foundations for routing with arrival-window constraints. In a setting where the travel time of a road segment is modeled by a probability distribution, we define two problems where the aim is to find a route from a source to a destination that optimizes or yields a high probability of arriving within a time window while departing as late as possible. In this setting, a core challenge is to enable comparison between paths that may potentially be part of a result path with the goal of determining whether a path is uninteresting and can be disregarded given the existence of another path. We show that existing solutions …

2023/11/21

Other articles from Australian & New Zealand Journal of Statistics journal

Wenying Yao

Deakin University

Australian & New Zealand Journal of Statistics

Identifying changes in the distribution of income from higher‐order moments with an application to Australia

Changes in the distribution of income over time are identified based on an adjusted two‐sample version of the Neyman smooth test by using subsampling methods to approximate the sampling distribution of the test statistic when samples are not independent of each other. A range of Monte Carlo experiments show that the approach corrects for size distortions arising from dependent samples as well as generating monotonic power functions. Applying the approach to studying the distribution of income in Australia over the business cycle and the Global Financial Crisis, the empirical results highlight the importance of higher‐order moments and demonstrate that business cycles are not all alike as the relative strengths of higher‐order moments vary over phases of the cycle.

Salvatore Ingrassia

Università degli Studi di Catania

Australian & New Zealand Journal of Statistics

Latent heterogeneity in COVID‐19 hospitalisations: a cluster‐weighted approach to analyse mortality

The COVID‐19 pandemic caused an unprecedented excess mortality. Since 2020, many studies have focussed on the characteristics of COVID‐19 patients who did not survive. From the statistical point of view, what seems to dominate is the large heterogeneity of the populations affected by COVID‐19 and the extreme difficulty in identifying subpopulations who died affected by a plurality of contemporary characteristics. In this paper, we propose an extremely flexible approach based on a cluster‐weighted model, which allows us to identify latent groups of patients sharing similar characteristics at the moment of hospitalisation as well as a similar mortality. We focus on one of the hardest hit areas in Italy and study the heterogeneity in the population of patients affected by COVID‐19 using administrative data on hospitalisations in the first wave of the pandemic. Results highlighted that a model‐based clustering …

David Borchers

University of St Andrews

Australian & New Zealand Journal of Statistics

Exact likelihoods for N‐mixture models with time‐to‐detection data

This paper is concerned with the formulation of N‐mixture models for estimating the abundance and probability of detection of a species from binary response, count and time‐to‐detection data. A modelling framework, which encompasses time‐to‐first‐detection within the context of detection/non‐detection and time‐to‐each‐detection and time‐to‐first‐detection within the context of count data, is introduced. Two observation processes which depend on whether or not double counting is assumed to occur are also considered. The main focus of the paper is on the derivation of explicit forms for the likelihoods associated with each of the proposed models. Closed‐form expressions for the likelihoods associated with time‐to‐detection data are new and are developed from the theory of order statistics. A key finding of the study is that, based on the assumption of no double counting, the likelihoods associated with …

Res Altwegg

University of Cape Town

Australian & New Zealand Journal of Statistics

Exact likelihoods for N‐mixture models with time‐to‐detection data

This paper is concerned with the formulation of N‐mixture models for estimating the abundance and probability of detection of a species from binary response, count and time‐to‐detection data. A modelling framework, which encompasses time‐to‐first‐detection within the context of detection/non‐detection and time‐to‐each‐detection and time‐to‐first‐detection within the context of count data, is introduced. Two observation processes which depend on whether or not double counting is assumed to occur are also considered. The main focus of the paper is on the derivation of explicit forms for the likelihoods associated with each of the proposed models. Closed‐form expressions for the likelihoods associated with time‐to‐detection data are new and are developed from the theory of order statistics. A key finding of the study is that, based on the assumption of no double counting, the likelihoods associated with …

Ashis Chakraborty

Indian Statistical Institute

Australian & New Zealand Journal of Statistics

Bayesian neural tree models for nonparametric regression

Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real‐life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical …

David Gunawan

University of Wollongong

Australian & New Zealand Journal of Statistics

Comparisons of distributions of Australian mental health scores

Bayesian non‐parametric estimates of Australian distributions of mental health scores are obtained to assess how the mental health status of the population has changed over time, and to compare the mental health status of female/male and Aboriginal/non‐Aboriginal population subgroups. First‐order and second‐order stochastic dominance are used to compare distributions, with results presented in terms of the posterior probability of dominance and the posterior probability of no dominance. If a criterion for dominance is satisfied, then, in terms of that criterion, the mental health status of the dominant population is superior to that of the dominated population. If neither distribution is dominant, then the mental health status of neither population is superior in the same sense. Our results suggest mental health has deteriorated in recent years, that males' mental health status is better than that of females, and that non …

Abba Krieger

University of Pennsylvania

Australian & New Zealand Journal of Statistics

The role of pairwise matching in experimental design for an incidence outcome

We consider the problem of evaluating designs for a two‐arm randomised experiment with an incidence (binary) outcome under a non‐parametric general response model. Our two main results are that the a priori pair‐matching design is (1) optimal, as measured by mean squared error, among all block designs, a class which includes complete randomisation, and (2) minimax, that is, it provides the lowest mean squared error under an adversarial response model. Theoretical results are supported by simulations and clinical trial data where we demonstrate the superior performance of pairwise matching designs under realistic conditions.

Duangkamon Chotikapanich

Monash University

Australian & New Zealand Journal of Statistics

Comparisons of distributions of Australian mental health scores

Bayesian non‐parametric estimates of Australian distributions of mental health scores are obtained to assess how the mental health status of the population has changed over time, and to compare the mental health status of female/male and Aboriginal/non‐Aboriginal population subgroups. First‐order and second‐order stochastic dominance are used to compare distributions, with results presented in terms of the posterior probability of dominance and the posterior probability of no dominance. If a criterion for dominance is satisfied, then, in terms of that criterion, the mental health status of the dominant population is superior to that of the dominated population. If neither distribution is dominant, then the mental health status of neither population is superior in the same sense. Our results suggest mental health has deteriorated in recent years, that males' mental health status is better than that of females, and that non …

Paul D. McNicholas

McMaster University

Australian & New Zealand Journal of Statistics

Visual assessment of matrix‐variate normality

In recent years, the analysis of three‐way data has become ever more prevalent in the literature. It is becoming increasingly common to analyse such data by means of matrix‐variate distributions, the most prevalent of which is the matrix‐variate normal distribution. Although many methods exist for assessing multivariate normality, there is a relative paucity of approaches for assessing matrix‐variate normality. Herein, a new visual method is proposed for assessing matrix‐variate normality by means of a distance–distance plot. In addition, a testing procedure is discussed to be used in tandem with the proposed visual method. The proposed approach is illustrated via simulated data as well as an application on analysing handwritten digits.

Zsuzsa Bakk

Universiteit Leiden

Australian & New Zealand Journal of Statistics

Embedding latent class regression and latent class distal outcome models into cluster‐weighted latent class analysis: a detailed simulation experiment

Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class‐specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation, that embeds both the LC regression and the distal outcome models, as is typically done in cluster‐weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed …

Yanrong Yang

Australian National University

Australian & New Zealand Journal of Statistics

Robust PCA for high‐dimensional data based on characteristic transformation

In this paper, we propose a novel robust principal component analysis (PCA) for high‐dimensional data in the presence of various heterogeneities, in particular strong tailing and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of the classical PCA. The suggested method has the distinct advantage of dealing with heavy‐tail‐distributed data, whose covariances may be non‐existent (positively infinite, for instance), in addition to the usual outliers. The proposed approach is also a case of kernel principal component analysis (KPCA) and employs the robust and non‐linear properties via a bounded and non‐linear kernel function. The merits of the new method are illustrated by some statistical properties, including the upper bound of the excess error and the behaviour of the large eigenvalues under a spiked covariance model. Additionally, using a variety of …

Jun M. Liu

Georgia Southern University

Australian & New Zealand Journal of Statistics

Short-term forecasting with a computationally efficient nonparametric transfer function model

In this paper a semi‐parametric approach is developed to model non‐linear relationships in time series data using polynomial splines. Polynomial splines require very little assumption about the functional form of the underlying relationship, so they are very flexible and can be used to model highly non‐linear relationships. Polynomial splines are also computationally very efficient. The serial correlation in the data is accounted for by modelling the noise as an autoregressive integrated moving average (ARIMA) process; by doing so, the efficiency in nonparametric estimation is improved and correct inferences can be obtained. The explicit structure of the ARIMA model allows the correlation information to be used to improve forecasting performance. An algorithm is developed to automatically select and estimate the polynomial spline model and the ARIMA model through backfitting. This method is applied to a real …

Richard Cook

University of Waterloo

Australian & New Zealand Journal of Statistics

Bayesian modelling of effects of prenatal alcohol exposure on child cognition based on data from multiple cohorts

High levels of prenatal alcohol exposure (PAE) result in significant cognitive deficits in children, but the exact nature of the dose‐response relationship is less well understood. To investigate this relationship, data were assembled from six longitudinal birth cohort studies examining the effects of PAE on cognitive outcomes from early school age through adolescence. Structural equation models (SEMs) are a natural approach to consider, because of the way they conceptualise multiple observed outcomes as relating to an underlying latent variable of interest, which can then be modelled as a function of exposure and other predictors of interest. However, conventional SEMs could not be fitted in this context because slightly different outcome measures were used in the six studies. In this paper we propose a multi‐group Bayesian SEM that maps the unobserved cognition variable to a broad range of observed outcomes …

Khue-Dung Dang

University of Technology

Australian & New Zealand Journal of Statistics

Bayesian modelling of effects of prenatal alcohol exposure on child cognition based on data from multiple cohorts

High levels of prenatal alcohol exposure (PAE) result in significant cognitive deficits in children, but the exact nature of the dose‐response relationship is less well understood. To investigate this relationship, data were assembled from six longitudinal birth cohort studies examining the effects of PAE on cognitive outcomes from early school age through adolescence. Structural equation models (SEMs) are a natural approach to consider, because of the way they conceptualise multiple observed outcomes as relating to an underlying latent variable of interest, which can then be modelled as a function of exposure and other predictors of interest. However, conventional SEMs could not be fitted in this context because slightly different outcome measures were used in the six studies. In this paper we propose a multi‐group Bayesian SEM that maps the unobserved cognition variable to a broad range of observed outcomes …

Brenton R Clarke

Murdoch University

Australian & New Zealand Journal of Statistics

Exact testing for heteroscedasticity in a two‐way layout in variety frost trials when incorporating a covariate

Two‐way layouts are common in grain industry research where it is often the case that there are one or more covariates. It is widely recognised that when estimating fixed effect parameters, one should also examine for possible extra error variance structure. An exact test for heteroscedasticity, when there is a covariate, is illustrated for a data set from frost trials in Western Australia. While the general algebra for the test is known from past literature, there are computational aspects of implementing the test for the two‐way layout when there are covariates. In this scenario the test is shown to have greater power than the industry standard and, because of its exact size, is preferable to the use of the restricted maximum likelihood ratio test (REMLRT) based on the approximate asymptotic distribution in this instance. Formulation of the exact test considered here involves creation of appropriate contrasts in the experimental design …

Antonio Punzo

Università degli Studi di Catania

Australian & New Zealand Journal of Statistics

Embedding latent class regression and latent class distal outcome models into cluster‐weighted latent class analysis: a detailed simulation experiment

Usually in latent class (LC) analysis, external predictors are taken to be cluster conditional probability predictors (LC models with external predictors), and/or score conditional probability predictors (LC regression models). In such cases, their distribution is not of interest. Class‐specific distribution is of interest in the distal outcome model, when the distribution of the external variables is assumed to depend on LC membership. In this paper, we consider a more general formulation, that embeds both the LC regression and the distal outcome models, as is typically done in cluster‐weighted modelling. This allows us to investigate (1) whether the distribution of the external variables differs across classes, (2) whether there are significant direct effects of the external variables on the indicators, by modelling jointly the relationship between the external and the latent variables. We show the advantages of the proposed …

Tanujit Chakraborty

Indian Statistical Institute

Australian & New Zealand Journal of Statistics

Bayesian neural tree models for nonparametric regression

Frequentist and Bayesian methods differ in many aspects but share some basic optimal properties. In real‐life prediction problems, situations exist in which a model based on one of the above paradigms is preferable depending on some subjective criteria. Nonparametric classification and regression techniques, such as decision trees and neural networks, have both frequentist (classification and regression trees (CARTs) and artificial neural networks) as well as Bayesian counterparts (Bayesian CART and Bayesian neural networks) to learning from data. In this paper, we present two hybrid models combining the Bayesian and frequentist versions of CART and neural networks, which we call the Bayesian neural tree (BNT) models. BNT models can simultaneously perform feature selection and prediction, are highly flexible, and generalise well in settings with limited training observations. We study the statistical …

Tugba Akkaya-Hocagil

Harvard University

Australian & New Zealand Journal of Statistics

Bayesian modelling of effects of prenatal alcohol exposure on child cognition based on data from multiple cohorts

High levels of prenatal alcohol exposure (PAE) result in significant cognitive deficits in children, but the exact nature of the dose‐response relationship is less well understood. To investigate this relationship, data were assembled from six longitudinal birth cohort studies examining the effects of PAE on cognitive outcomes from early school age through adolescence. Structural equation models (SEMs) are a natural approach to consider, because of the way they conceptualise multiple observed outcomes as relating to an underlying latent variable of interest, which can then be modelled as a function of exposure and other predictors of interest. However, conventional SEMs could not be fitted in this context because slightly different outcome measures were used in the six studies. In this paper we propose a multi‐group Bayesian SEM that maps the unobserved cognition variable to a broad range of observed outcomes …

David Azriel

Technion - Israel Institute of Technology

Australian & New Zealand Journal of Statistics

The role of pairwise matching in experimental design for an incidence outcome

We consider the problem of evaluating designs for a two‐arm randomised experiment with an incidence (binary) outcome under a non‐parametric general response model. Our two main results are that the a priori pair‐matching design is (1) optimal, as measured by mean squared error, among all block designs, a class that includes complete randomisation, and (2) minimax, that is, it provides the lowest mean squared error under an adversarial response model. Theoretical results are supported by simulations and clinical trial data, where we demonstrate the superior performance of pairwise matching designs under realistic conditions.

Linda Haines

University of Cape Town

Australian & New Zealand Journal of Statistics

Exact likelihoods for N‐mixture models with time‐to‐detection data

This paper is concerned with the formulation of N‐mixture models for estimating the abundance and probability of detection of a species from binary response, count and time‐to‐detection data. A modelling framework, which encompasses time‐to‐first‐detection within the context of detection/non‐detection and time‐to‐each‐detection and time‐to‐first‐detection within the context of count data, is introduced. Two observation processes which depend on whether or not double counting is assumed to occur are also considered. The main focus of the paper is on the derivation of explicit forms for the likelihoods associated with each of the proposed models. Closed‐form expressions for the likelihoods associated with time‐to‐detection data are new and are developed from the theory of order statistics. A key finding of the study is that, based on the assumption of no double counting, the likelihoods associated with …