Browsing by Author "Quintana, Fernando A."

Now showing 1 - 20 of 23

A model-based approach to Bayesian classification with applications to predicting pregnancy outcomes from longitudinal beta-hCG profiles
(OXFORD UNIV PRESS, 2007) De La Cruz Mesia, Rolando; Quintana, Fernando A.
This paper discusses Bayesian statistical methods for the classification of observations into two or more groups based on hierarchical models for nonlinear longitudinal profiles. Parameter estimation for a discriminant model that classifies individuals into distinct predefined groups or populations uses appropriate posterior simulation schemes. The methods are illustrated with data from a study involving 173 pregnant women. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from beta human chorionic gonadotropin data available at early stages of pregnancy.
A new family of slash-distributions with elliptical contours
(ELSEVIER SCIENCE BV, 2007) Gomez, Hector W.; Quintana, Fernando A.; Torres, Francisco J.
We introduce a new family of univariate and multivariate slash-distributions. Our construction is based on elliptical distributions. We define the new family by means of a stochastic representation as the scale mixture of an elliptically distributed random variable with respect to the power of a U(0, 1) random variable. The same idea is extended to the multivariate case. We study general properties of the resulting families, including their moments. We illustrate special cases of interest, such as Normal, Cauchy, Student-t, Type II Pearson and Kotz-type distributions. (c) 2007 Elsevier B.V. All rights reserved.
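The stochastic representation mentioned in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not give the exponent, so the sketch assumes the usual slash form X = Z / U^(1/q) with q > 0, and it takes a standard normal as the elliptical base variable Z. The function name `sample_slash` and its parameters are hypothetical.

```python
import random


def sample_slash(q, n, seed=0):
    """Draw n values of X = Z / U**(1/q).

    Assumptions (not taken from the abstract): Z is standard normal,
    U is uniform on (0, 1], and q > 0 is the power parameter.
    """
    rng = random.Random(seed)
    draws = []
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        # random() lies in [0, 1), so 1 - random() lies in (0, 1]
        # and the division below can never hit zero.
        u = 1.0 - rng.random()
        draws.append(z / u ** (1.0 / q))
    return draws


if __name__ == "__main__":
    xs = sample_slash(q=2.0, n=5)
    print(xs)
```

Because U is at most 1, each draw is at least as large in absolute value as the underlying normal draw, which is where the heavier tails come from.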
A predictive view of Bayesian clustering
(ELSEVIER, 2006) Quintana, Fernando A.
This work considers probability models for partitions of a set of n elements using a predictive approach, i.e., models that are specified in terms of the conditional probability of either joining an already existing cluster or forming a new one. The inherent structure can be motivated by resorting to hierarchical models of either parametric or nonparametric nature. Parametric examples include the product partition models (PPMs) and the model-based approach of Dasgupta and Raftery (J. Amer. Statist. Assoc. 93 (1998) 294), while nonparametric alternatives include the Dirichlet process, and more generally, the species sampling models (SSMs). Under exchangeability, PPMs and SSMs induce the same type of partition structure. The methods are discussed in the context of outlier detection in normal linear regression models and of (univariate) density estimation. (c) 2004 Elsevier B.V. All rights reserved.
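The predictive specification described in this abstract (join an existing cluster or open a new one) can be illustrated with the Dirichlet process case. The sketch below assumes the standard Pólya urn rule: item i+1 joins an existing cluster with probability proportional to that cluster's size and starts a new cluster with probability proportional to a mass parameter `alpha`. The function name and parameters are hypothetical, and the rule shown is one example of a predictive scheme, not the general PPM or SSM construction of the paper.

```python
import random


def sample_partition(n, alpha, seed=0):
    """Sample cluster labels for n items from a Polya urn predictive rule.

    Assumes alpha > 0. Labels are integers 0, 1, 2, ... assigned in
    order of cluster appearance.
    """
    rng = random.Random(seed)
    labels = []
    sizes = []  # sizes[k] = number of items currently in cluster k
    for _ in range(n):
        # Existing clusters are weighted by size; the final slot,
        # weighted by alpha, stands for "open a new cluster".
        weights = sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(sizes):
            sizes.append(1)
        else:
            sizes[k] += 1
        labels.append(k)
    return labels


if __name__ == "__main__":
    print(sample_partition(n=10, alpha=1.0))
```

Larger values of `alpha` make the "new cluster" branch more likely and so tend to produce partitions with more, smaller clusters.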
A semiparametric Bayesian model for repeatedly repeated binary outcomes
(WILEY-BLACKWELL, 2008) Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L.; Relling, Mary V.
We discuss the analysis of data from single-nucleotide polymorphism arrays comparing tumour and normal tissues. The data consist of sequences of indicators for loss of heterozygosity (LOH) and involve three nested levels of repetition: chromosomes for a given patient, regions within chromosomes and single-nucleotide polymorphisms nested within regions. We propose to analyse these data by using a semiparametric model for multilevel repeated binary data. At the top level of the hierarchy we assume a sampling model for the observed binary LOH sequences that arises from a partial exchangeability argument. This implies a mixture of Markov chains model. The mixture is defined with respect to the Markov transition probabilities. We assume a non-parametric prior for the random-mixing measure. The resulting model takes the form of a semiparametric random-effects model with the matrix of transition probabilities being the random effects. The model includes appropriate dependence assumptions for the two remaining levels of the hierarchy, i.e. for regions within chromosomes and for chromosomes within patient. We use the model to identify regions of increased LOH in a data set coming from a study of treatment-related leukaemia in children with an initial cancer diagnosis. The model successfully identifies the desired regions and performs well compared with other available alternatives.
Bayesian first order auto-regressive latent variable models for multiple binary sequences
(SAGE PUBLICATIONS LTD, 2011) Giardina, Federica; Guglielmi, Alessandra; Quintana, Fernando A.; Ruggeri, Fabrizio
Longitudinal clinical trials often collect long sequences of binary data monitoring a disease process over time. Our application is a medical study conducted in the US by the Veterans Administration Cooperative Urological Research Group to assess the effectiveness of a chemotherapy treatment (thiotepa) in preventing recurrence in subjects affected by bladder cancer. We propose a generalized linear model with latent auto-regressive structure for longitudinal binary data following a Bayesian approach. We discuss inference as well as sensitivity to prior choices for the bladder cancer data. We find that there is a significant treatment effect in the sense that treated patients have much smaller predicted recurrence probabilities than placebo patients.
Bayesian modeling using a class of bimodal skew-elliptical distributions
(ELSEVIER SCIENCE BV, 2009) Elal Olivero, David; Gomez, Hector W.; Quintana, Fernando A.
We consider Bayesian inference using an extension of the family of skew-elliptical distributions studied by Azzalini [1985. A class of distributions which includes the normal ones. Scand. J. Statist. Theory and Applications 12 (2), 171-178]. This new class is referred to as bimodal skew-elliptical (BSE) distributions. The elements of the BSE class can take quite different forms. In particular, they can adopt both uni- and bimodal shapes. The bimodal case behaves similarly to mixtures of two symmetric distributions and we compare inference under the BSE family with the specific case of mixtures of two normal distributions. We study the main properties of the general class and illustrate its applications to two problems involving density estimation and linear regression. (C) 2008 Elsevier B.V. All rights reserved.
Clustering and Prediction With Variable Dimension Covariates
(2022) Page, Garritt L.; Quintana, Fernando A.; Muller, Peter
In many applied fields incomplete covariate vectors are commonly encountered. It is well known that this can be problematic when making inference on model parameters, but its impact on prediction performance is less understood. We develop a method based on covariate dependent random partition models that seamlessly handles missing covariates while completely avoiding any type of imputation. The method we develop allows in-sample as well as out-of-sample predictions, even if the missing pattern in the new subjects' incomplete covariate vector was not seen in the training data. Any data type, including categorical or continuous covariates, is permitted. In simulation studies, the proposed method compares favorably. We illustrate the method in two application examples. Supplementary materials for this article are available online.
Dependent Modeling of Temporal Sequences of Random Partitions
(2022) Page, Garritt L.; Quintana, Fernando A.; Dahl, David B.
We consider modeling a dependent sequence of random partitions. It is well known in Bayesian nonparametrics that a random measure of discrete type induces a distribution over random partitions. The community has therefore assumed that the best approach to obtain a dependent sequence of random partitions is through modeling dependent random measures. We argue that this approach is problematic and show that the random partition model induced by dependent Bayesian nonparametric priors exhibits counter-intuitive dependence among partitions even though the dependence for the sequence of random probability measures is intuitive. Because of this, we suggest directly modeling the sequence of random partitions when clustering is of principal interest. To this end, we develop a class of dependent random partition models that explicitly models dependence in a sequence of partitions. We derive conditional and marginal properties of the joint partition model and devise computational strategies when employing the method in Bayesian modeling. In the case of temporal dependence, we demonstrate through simulation how the methodology produces partitions that evolve gently and naturally over time. We further illustrate the utility of the method by applying it to an environmental dataset that exhibits spatio-temporal dependence. Supplemental files for this article are available online.
DISCOVERING INTERACTIONS USING COVARIATE INFORMED RANDOM PARTITION MODELS
(2021) Page, Garritt L.; Quintana, Fernando A.; Rosner, Gary L.
Combination chemotherapy treatment regimens created for patients diagnosed with childhood acute lymphoblastic leukemia have had great success in improving cure rates. Unfortunately, patients prescribed these types of treatment regimens have displayed susceptibility to the onset of osteonecrosis. Some have suggested that this is due to pharmacokinetic interaction between two agents in the treatment regimen (asparaginase and dexamethasone) and other physiological variables. Determining which physiological variables to consider when searching for interactions in scenarios like these, minus a priori guidance, has proved to be a challenging problem, particularly if interactions influence the response distribution in ways beyond shifts in expectation or dispersion only. In this paper we propose an exploratory technique that is able to discover associations between covariates and responses in a general way. The procedure connects covariates to responses flexibly through dependent random partition distributions and then employs machine learning techniques to highlight potential associations found in each cluster. We provide a simulation study to show utility and apply the method to data produced from a study dedicated to learning which physiological predictors influence severity of osteonecrosis multiplicatively.
DPpackage: Bayesian Semi- and Nonparametric Modeling in R
(JOURNAL STATISTICAL SOFTWARE, 2011) Jara, Alejandro; Hanson, Timothy E.; Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L.
Data analysis sometimes requires the relaxation of parametric assumptions in order to gain modeling flexibility and robustness against mis-specification of the probability model. In the Bayesian context, this is accomplished by placing a prior distribution on a function space, such as the space of all probability distributions or the space of all regression functions. Unfortunately, posterior distributions ranging over function spaces are highly complex and hence sampling methods play a key role. This paper provides an introduction to a simple, yet comprehensive, set of programs for the implementation of some Bayesian nonparametric and semiparametric models in R, DPpackage. Currently, DPpackage includes models for marginal and conditional density estimation, receiver operating characteristic curve analysis, interval-censored data, binary regression data, item response data, longitudinal and clustered data using generalized linear mixed models, and regression data using generalized additive models. The package also contains functions to compute pseudo-Bayes factors for model comparison and for eliciting the precision parameter of the Dirichlet process prior, and a general purpose Metropolis sampling algorithm. To maximize computational efficiency, the actual sampling for each model is carried out using compiled C, C++ or Fortran code.
Flexible Univariate Continuous Distributions
(INT SOC BAYESIAN ANALYSIS, 2009) Quintana, Fernando A.; Steel, Mark F. J.; Ferreira, Jose T. A. S.
Based on a constructive representation, which distinguishes between a skewing mechanism P and an underlying symmetric distribution F, we introduce two flexible classes of distributions. They are generated by nonparametric modelling of either P or F. We examine properties of these distributions and consider how they can help us to identify which aspects of the data are badly captured by simple symmetric distributions. Within a Bayesian framework, we investigate useful prior settings and conduct inference through MCMC methods. On the basis of simulated and real data examples, we make recommendations for the use of our models in practice. Our models perform well in the context of density estimation using the multimodal galaxy data and for regression modelling with data on the body mass index of athletes.
Multivariate Bayesian discrimination for varietal authentication of Chilean red wine
(TAYLOR & FRANCIS LTD, 2011) Gutierrez, Luis; Quintana, Fernando A.; von Baer, Dietrich; Mardones, Claudia
The process through which food or beverages are verified as complying with their label descriptions is called food authentication. We propose to treat the authentication process as a classification problem. We consider multivariate observations and propose a multivariate Bayesian classifier that extends results from the univariate linear mixed model to the multivariate case. The model allows for correlation between wine samples from the same valley. We apply the proposed model to concentration measurements of nine chemical compounds named anthocyanins in 399 samples of Chilean red wines of the varieties Merlot, Carmenere and Cabernet Sauvignon, vintages 2001-2004. We find satisfactory results, with a misclassification error rate based on a leave-one-out cross-validation approach of about 4%. The multivariate extension can be generally applied to authentication of food and beverages, where it is common to have several dependent measurements per sample unit, and it would not be appropriate to treat these as independent univariate versions of a common model.
MULTIVARIATE BAYESIAN SEMIPARAMETRIC MODELS FOR AUTHENTICATION OF FOOD AND BEVERAGES
(INST MATHEMATICAL STATISTICS, 2011) Gutierrez, Luis; Quintana, Fernando A.
Food and beverage authentication is the process by which foods or beverages are verified as complying with their label descriptions, for example, verifying whether the denomination of origin of an olive oil bottle is correct or whether the variety of a certain bottle of wine matches its label description. The common way to deal with an authentication process is to measure a number of attributes on samples of food and then use these as input for a classification problem. Our motivation stems from data consisting of measurements of nine chemical compounds known as anthocyanins, obtained from samples of Chilean red wines of grape varieties Cabernet Sauvignon, Merlot and Carmenere. We consider a model-based approach to authentication through a semiparametric multivariate hierarchical linear mixed model for the mean responses, and covariance matrices that are specific to the classification categories. Specifically, we propose a model of the ANOVA-DDP type, which takes advantage of the fact that the available covariates are discrete in nature. The results suggest that the model performs well compared to other parametric alternatives. This is also corroborated by application to simulated data.
Nonparametric Bayesian Modeling and Estimation of Spatial Correlation Functions for Global Data
(INT SOC BAYESIAN ANALYSIS, 2021) Porcu, Emilio; Bissiri, Pier Giovanni; Tagle, Felipe; Soza, Ruben; Quintana, Fernando A.
We provide a nonparametric spectral approach to the modeling of correlation functions on spheres. The sequence of Schoenberg coefficients and their associated covariance functions are treated as random rather than assuming a parametric form. We propose a stick-breaking representation for the spectrum, and show that such a choice spans the support of the class of geodesically isotropic covariance functions under uniform convergence. Further, we examine the first order properties of such representation, from which geometric properties can be inferred, in terms of Hölder continuity, of the associated Gaussian random field. The properties of the posterior, in terms of existence, uniqueness, and Lipschitz continuity, are then inspected. Our findings are validated with MCMC simulations and illustrated using a global data set on surface temperatures.
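A stick-breaking representation builds a sequence of nonnegative weights that sum to one by repeatedly breaking off a random fraction of what remains of a unit-length stick. The sketch below shows a generic truncated version under assumptions the abstract does not state: the break fractions are Beta(1, alpha) draws, the sequence is truncated at `n_atoms` terms, and the last term absorbs the leftover mass. It does not reproduce the paper's prior on Schoenberg coefficients, and all names are hypothetical.

```python
import random


def stick_breaking_weights(alpha, n_atoms, seed=0):
    """Return n_atoms nonnegative weights that sum to one.

    Assumptions (not from the abstract): break fractions are
    Beta(1, alpha) with alpha > 0, and the sequence is truncated
    by giving the final weight all remaining stick length.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0
    for _ in range(n_atoms - 1):
        v = rng.betavariate(1.0, alpha)
        weights.append(remaining * v)  # piece broken off this round
        remaining *= 1.0 - v           # length left for later rounds
    weights.append(remaining)
    return weights


if __name__ == "__main__":
    w = stick_breaking_weights(alpha=2.0, n_atoms=10)
    print(w, sum(w))
```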
Nonparametric Bayesian modelling using skewed Dirichlet processes
(ELSEVIER, 2009) Iglesias, Pilar L.; Orellana, Yasna; Quintana, Fernando A.
We introduce a new class of discrete random probability measures that extend the definition of Dirichlet process (DP) by explicitly incorporating skewness. The asymmetry is controlled by a single parameter in such a way that symmetric DPs are obtained as a special case of the general construction. We review the main properties of skewed DPs and develop appropriate Polya urn schemes. We illustrate the modelling in the context of linear regression models of the capital asset pricing model (CAPM) type, where assessing symmetry for the error distribution is important to check validity of the model. (C) 2008 Elsevier B.V. All rights reserved.
On the Support of MacEachern's Dependent Dirichlet Processes and Extensions
(INT SOC BAYESIAN ANALYSIS, 2012) Barrientos, Andres F.; Jara, Alejandro; Quintana, Fernando A.
We study the support properties of Dirichlet process-based models for sets of predictor-dependent probability distributions. Exploiting the connection between copulas and stochastic processes, we provide an alternative definition of MacEachern's dependent Dirichlet processes. Based on this definition, we provide sufficient conditions for the full weak support of different versions of the process. In particular, we show that under mild conditions on the copula functions, the versions where only the support points or only the weights depend on predictors have full weak support. In addition, we also characterize the Hellinger and Kullback-Leibler support of mixtures induced by the different versions of the dependent Dirichlet process. A generalization of the results for the general class of dependent stick-breaking processes is also provided.
RANDOM-SET METHODS IDENTIFY DISTINCT ASPECTS OF THE ENRICHMENT SIGNAL IN GENE-SET ANALYSIS
(INST MATHEMATICAL STATISTICS, 2007) Newton, Michael A.; Quintana, Fernando A.; Den Boon, Johan A.; Sengupta, Srikumar; Ahlquist, Paul
A prespecified set of genes may be enriched, to varying degrees, for genes that have altered expression levels relative to two or more states of a cell. Knowing the enrichment of gene sets defined by functional categories, such as gene ontology (GO) annotations, is valuable for analyzing the biological signals in microarray expression data. A common approach to measuring enrichment is by cross-classifying genes according to membership in a functional category and membership on a selected list of significantly altered genes. A small Fisher's exact test P-value, for example, in this 2 x 2 table is indicative of enrichment. Other category analysis methods retain the quantitative gene-level scores and measure significance by referring a category-level statistic to a permutation distribution associated with the original differential expression problem. We describe a class of random-set scoring methods that measure distinct components of the enrichment signal. The class includes Fisher's test based on selected genes and also tests that average gene-level evidence across the category. Averaging and selection methods are compared empirically using Affymetrix data on expression in nasopharyngeal cancer tissue, and theoretically using a location model of differential expression. We find that each method has a domain of superiority in the state space of enrichment problems, and that both methods have benefits in practice. Our analysis also addresses two problems related to multiple-category inference, namely, that equally enriched categories are not detected with equal probability if they are of different sizes, and also that there is dependence among category statistics owing to shared genes. Random-set enrichment calculations do not require Monte Carlo for implementation. They are made available in the R package allez.
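The 2 x 2 cross-classification described in this abstract can be turned into a P-value with the hypergeometric distribution. The sketch below is a plain-Python illustration and not the allez package's implementation. It assumes a one-sided test for over-representation, and the function and argument names are hypothetical.

```python
from math import comb


def fisher_enrichment_pvalue(n_genes, n_category, n_selected, n_overlap):
    """One-sided Fisher exact P-value for enrichment.

    Returns the probability of seeing at least n_overlap category genes
    on the selected list when n_selected genes are drawn without
    replacement from n_genes, of which n_category are in the category.
    """
    upper = min(n_category, n_selected)
    total = comb(n_genes, n_selected)
    tail = sum(
        comb(n_category, k) * comb(n_genes - n_category, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Toy numbers: 10 genes, 5 in the category, 5 selected, all 5 overlap.
    print(fisher_enrichment_pvalue(10, 5, 5, 5))
```

With the toy numbers above, only one of the 252 possible selected lists contains all five category genes, so the P-value is 1/252.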
Semi-parametric Bayesian Inference for Multi-Season Baseball Data
(INT SOC BAYESIAN ANALYSIS, 2008) Quintana, Fernando A.; Mueller, Peter; Rosner, Gary L.; Munsell, Mark
We analyze complete sequences of successes (hits, walks, and sacrifices) for a group of players from the American and National Leagues, collected over 4 seasons. The goal is to describe how players' performance varies from season to season. In particular, we wish to assess and compare the effect of available occasion-specific covariates over seasons. The data are binary sequences for each player and each season. We model dependence in the binary sequence by an autoregressive logistic model. The model includes lagged terms up to a fixed order. For each player and season we introduce a different set of autologistic regression coefficients, i.e., the regression coefficients are random effects that are specific to each season and player. We use a nonparametric approach to define a random effects distribution. The nonparametric model is defined as a mixture with a Dirichlet process prior for the mixing measure. The described model is justified by a representation theorem for order-k exchangeable sequences. Besides the repeated measurements for each season and player, multiple seasons within a given player define an additional level of repeated measurements. We introduce dependence at this level of repeated measurements by relating the season-specific random effects vectors in an autoregressive fashion. We ultimately conclude that while some covariates like the ERA of the opposing pitcher are always relevant, others like an indicator for the game being into the seventh inning may be significant only for certain seasons, and some others, like the score of the game, can safely be ignored.
Semiparametric Bayesian classification with longitudinal markers
(BLACKWELL PUBLISHING, 2007) De la Cruz Mesia, Rolando; Quintana, Fernando A.; Mueller, Peter
We analyse data from a study involving 173 pregnant women. The data are observed values of the beta human chorionic gonadotropin hormone measured during the first 80 days of gestational age, including from one up to six longitudinal responses for each woman. The main objective in this study is to predict normal versus abnormal pregnancy outcomes from data that are available at the early stages of pregnancy. We achieve the desired classification with a semiparametric hierarchical model. Specifically, we consider a Dirichlet process mixture prior for the distribution of the random effects in each group. The unknown random-effects distributions are allowed to vary across groups but are made dependent by using a design vector to select different features of a single underlying random probability measure. The resulting model is an extension of the dependent Dirichlet process model, with an additional probability model for group classification. The model is shown to perform better than an alternative model which is based on independent Dirichlet processes for the groups. Relevant posterior distributions are summarized by using Markov chain Monte Carlo methods.
Semiparametric Bayesian inference for multilevel repeated measurement data
(WILEY, 2007) Muller, Peter; Quintana, Fernando A.; Rosner, Gary L.
We discuss inference for data with repeated measurements at multiple levels. The motivating example is data with blood counts from cancer patients undergoing multiple cycles of chemotherapy, with days nested within cycles. Some inference questions relate to repeated measurements over days within cycle, while other questions are concerned with the dependence across cycles. When the desired inference relates to both levels of repetition, it becomes important to reflect the data structure in the model. We develop a semiparametric Bayesian modeling approach, restricting attention to two levels of repeated measurements. For the top-level longitudinal sampling model we use random effects to introduce the desired dependence across repeated measurements. We use a nonparametric prior for the random effects distribution. Inference about dependence across second-level repetition is implemented by the clustering implied in the nonparametric random effects model. Practical use of the model requires that the posterior distribution on the latent random effects be reasonably precise.
