Statistics Doctoral Students Seminar

The purpose of these seminars is to learn about the research projects in which the students of the Doctorado en Estadística program have participated, presented in the form of talks. The invitation to participate is extended to the entire UC community.


2025-06-17
16:10hrs.
Ingrid Guevara Romero. Pontificia Universidad Católica de Chile
Bayesian regression and model selection for planar shapes
Sala 3, Facultad de Matemáticas
Abstract:
The interest in analyzing, comparing, and studying shapes is prevalent in various disciplines, such as medicine, biology, and chemistry. Applications range from disease diagnosis and the study of biological variation to facial recognition. Motivated by these broad uses, we present a methodology to model a set of landmarks representing two-dimensional shapes. Our work presents a Bayesian regression framework that incorporates covariates into the modal shape of a complex Watson distribution. Furthermore, we incorporate a spike-and-slab prior distribution for model selection to assess whether relevant factors contribute to shape differences. We demonstrate the utility of the proposed method through an application involving shapes of the midline section of ape skulls, aiming to determine whether cranial shape differences are associated with sex and/or species.
2025-06-10
16:10hrs.
Francisco Antonio Segovia Godoy. Pontificia Universidad Católica de Chile
A Flexible continuous and binary Bayesian model selection method
Sala 3, Facultad de Matemáticas
Abstract:
Regression analysis aims to explore the relationship between a response variable and predictors. A key aspect of regression analysis is variable selection, which enables researchers to identify the most relevant predictors while adhering to the principle of parsimony. The standard frequentist strategy is to explore the model space using, for instance, a stepwise procedure. Alternatively, a popular Bayesian strategy is the spike-and-slab methodology, which assigns a prior to the predictor coefficients conditional on a latent binary vector that indicates which variables are relevant. This strategy incorporates a prior on the binary vector to penalize overly complex models. This work presents a general Bayesian nonparametric approach to model selection for continuous and binary regression models, employing the spike-and-slab strategy combined with a data augmentation technique. The posterior distribution admits closed-form expressions. Guidelines for the inference implementation are detailed with a supporting Monte Carlo simulation study.
2025-06-03
16:10hrs.
Cristian Capetillo Constela. Pontificia Universidad Católica de Chile
Introduction to Variational Inference Methods: An Alternative to MCMC
Sala 3, Facultad de Matemáticas
Abstract:
Bayesian inference relies solely on the posterior distribution, whether of the parameters or of "future" observations. Thus, once this distribution has been computed, inference is carried out through functionals of it (mean, variance, quantiles, etc.). However, what appears to be a great advantage also entails a major difficulty: in most Bayesian models, the posterior distribution is not available in closed form. Moreover, even if it were, some functionals could be difficult to compute. This has prompted countless efforts to develop computational methods for approximating the posterior distribution.

Although these efforts have focused mostly on MCMC (Markov Chain Monte Carlo) methods, there are many other strategies for approximating the posterior distribution. In this discussion we will focus on the article "Variational Inference: A Review for Statisticians", written by Blei, Kucukelbir, and McAuliffe in 2017, which, as its name indicates, is a review of Variational Inference (VI) methods, an alternative for the approximate computation of the posterior distribution. The article presents the fundamental ideas behind these methods, with an illustrative application to a Gaussian mixture model, as well as applications from the recent literature, extensions, theoretical results, and some open problems concerning these methods.

The invitation is to discuss and reflect on Variational Inference methods, and more generally on approaches that seek to approximate distributions, taking this article as a point of reference.
2025-05-27
16:10hrs.
Jesus Enrique Achire Quispe. Pontificia Universidad Católica de Chile
Objective Bayesian inference for the Exponential-Logarithmic distribution
Sala 3, Facultad de Matemáticas
Abstract:
This study explores an objective Bayesian inference approach for parameter estimation in the Exponential-Logarithmic (EL) distribution. Initially, we establish the necessary and sufficient conditions under which improper priors yield proper posterior distributions for the EL distribution. Additionally, we provide sufficient conditions to ensure the finiteness of posterior moments. These theoretical results are specifically applied to Jeffreys' prior, the maximal data information prior, and reference priors, demonstrating that such improper priors indeed generate proper posterior distributions. To assess the impact of the proposed priors on posterior estimation, we employ Markov Chain Monte Carlo methods and conduct extensive numerical simulations, comparing Bayesian estimators with the maximum likelihood estimators in terms of bias, mean squared error, and coverage probability.
2025-05-20
16:10hrs.
Mauricio Alejandro Toro Cea. Pontificia Universidad Católica de Chile
Causal inference and logically possible worlds: Partial Identification and statistical methods for numerical outcomes in finite populations
Sala 3, Facultad de Matemáticas
Abstract:
This work proposes a new methodological framework for causal inference. The perspective is based on the assumption of a finite sample space, where the data correspond to the population of interest or to a subset of statistical units from a larger finite population, without assuming the existence of an underlying data generation process. A formalization of Partial Identification is proposed, distinct from Manski’s approach, based on logically possible worlds. Additionally, causal inference methods are proposed on the probability space of the set of logically possible counterfactual worlds. The proposed statistical methods, based on a logicist interpretation of probability, are valid for both dichotomous and numerical outcomes, without the need to assume continuity in the outcomes. The methodology is illustrated by analyzing data on the impact of minimum wage increases on unemployment rates.
2025-05-13
16:10hrs.
Jose Alejandro Ordoñez Cuastumal. Pontificia Universidad Católica de Chile
Penalized complexity priors for the skewness parameter of power links
Sala 3, Facultad de Matemáticas
Abstract:

The choice of a prior distribution is a key aspect of the Bayesian method. However, in many cases, such as the family of power links, this is not trivial. In this article, we introduce a penalized complexity prior (PC prior) of the skewness parameter for this family, which is useful for dealing with imbalanced data. We derive a general expression for this density and show its usefulness for some particular cases such as the power logit and the power probit links. A simulation study and a real data application are used to assess the efficiency of the introduced densities in comparison with the Gaussian and uniform priors. Results show improvement in point and credible interval estimation for the considered models when using the PC prior in comparison to other well-known standard priors.

2025-05-06
16:10hrs.
Nixon Andrés Jerez Lillo. Pontificia Universidad Católica de Chile
Beyond the Power Law: Estimation, Goodness-of-Fit, and a Semiparametric Extension in Complex Networks
Sala 1, edificio Rolando Chuaqui
Abstract:

Scale-free networks play a fundamental role in the study of complex networks and various applied fields due to their ability to model a wide range of real-world systems. A key characteristic of these networks is their degree distribution, which often follows a power-law distribution, where the probability mass function is proportional to $x^{-\alpha}$, with $\alpha$ typically in the range $2 < \alpha < 3$. In this talk, we introduce Bayesian inference methods that yield more accurate estimates and more precise credible intervals than traditional methods, which often produce biased estimates. Through a simulation study, we demonstrate that our approach provides nearly unbiased estimates for the scaling parameter, enhancing the reliability of inferences. We also evaluate new goodness-of-fit tests as alternatives to the Kolmogorov-Smirnov test, which is commonly used for this purpose. Our findings show that the Watson test offers superior power while maintaining a controlled type I error rate, enabling us to better determine whether data adhere to a power-law distribution. Finally, we propose a piecewise extension of this model to provide greater flexibility, evaluating its estimation and goodness-of-fit features as well. In the complex networks field, this extension allows us to model the full degree distribution, instead of just focusing on the tail, as is commonly done. We demonstrate the utility of these novel methods through applications to two real-world datasets, showcasing their practical relevance and potential to advance the analysis of power-law behavior.

2025-04-22
16:10hrs.
Luz Marina Ramos Quispe. Pontificia Universidad Católica de Chile
Scale mixture of a multivariate normal distribution with a Birnbaum-Saunders mixing distribution
Sala 3, Facultad de Matemáticas
Abstract:

In the search for multivariate distributions that provide greater flexibility in modeling data characterized by high levels of skewness, kurtosis, and the presence of outliers, new families of multivariate distributions have emerged, among which multivariate normal mixture distributions stand out. In this context, we introduce a multivariate normal mixture distribution based on the Birnbaum-Saunders distribution and examine some of its key properties. To estimate the parameters of this normal scale mixture distribution, we propose a maximum likelihood approach implemented via the EM algorithm. To support inferential analyses, we derive the Fisher information matrix. Additionally, we formulate a linear hypothesis on the parameter vector of interest and evaluate it using the likelihood ratio, Wald, score, and gradient statistics. Finally, we illustrate the application of the proposed methodology to real datasets, complementing the analysis with a simulation study to assess its performance.

2025-04-15
16:10hrs.
Martial Toniotti. LIDAM/CORE, UCLouvain
Investments in renewable energy using portfolio optimization
Sala 3, Facultad de Matemáticas
Abstract:

Regulators' procurement of renewable energy capacity is rapidly expanding. While stochastic optimization methods are typically employed to determine the optimal total capacity to procure, a smaller body of research explores an alternative framework borrowed from finance: portfolio optimization. In this study, we apply portfolio optimization to renewable energy, aiming to identify optimal portfolios that balance two objectives: maximizing energy production per dollar invested and minimizing the variance. We utilize principal component analysis (PCA) and other techniques to identify these portfolios under limited sample size. The proposed method is tested using historical Belgian production data spanning five years, with out-of-sample comparisons evaluating portfolio performance under real-world conditions.

2025-04-08
16:10hrs.
Bladimir Morales Torrez. Pontificia Universidad Católica de Chile
Continuous positive (non)-Gaussian random fields with zero-inflation: A block and pairwise likelihood approach
Sala 3, Facultad de Matemáticas
Abstract:

Technological advances have transformed data collection and analysis, enabling the acquisition of large volumes of real-time information, commonly referred to as big data. In sectors such as the fishing industry, the adoption of modern technologies has introduced new challenges due to the excess of zeros in catch records, reflecting the natural variability in species abundance. In agriculture, satellite imagery has revolutionized crop monitoring, improving decisions related to plant health, resource management, and yield forecasting. Similarly, environmental monitoring using these technologies facilitates tracking of climate change and pollution, which is crucial for public health and sustainability. To address these issues, statistical models must account for spatial and spatio-temporal dependencies in the data, as well as the possibility of zero-inflation. In the literature, both Gaussian and (non)-Gaussian models have been developed for continuous or discrete data structures, but the excess zeros present significant challenges when modeling random fields. Techniques such as logarithmic transformations or constant adjustments have been proposed, though these often distort the data structure or are not feasible. Additionally, large-scale datasets present computational difficulties, as the high cost of likelihood methods often proves prohibitive. To mitigate this, methods such as composite likelihood have been employed, balancing statistical accuracy with computational efficiency in estimation. Furthermore, the concept of effective sample size (ESS) is essential for quantifying the information content in spatial datasets, addressing redundancy issues arising from spatial correlation.

 

The goal of this research is to propose a new class of continuous spatial and spatio-temporal (non)-Gaussian random fields with positive support and excess zeros. The study develops a hybrid composite likelihood function that combines block likelihood and pairwise likelihood methods to efficiently handle large-scale data estimation, while also aiding in the generation of the proposed random fields from bivariate distributions. Additionally, the effective sample size (ESS) will be defined within the context of this new class of random fields, with particular attention to assessing its asymptotic normality. The proposed methodology will be validated through simulations and comparisons with existing techniques. This work contributes to the advancement of statistical models for high-dimensional spatial and spatio-temporal data with excess zeros, providing an important tool for spatial data analysis in complex real-world scenarios.

2025-04-01
16:10hrs.
Cristian Capetillo Constela. Pontificia Universidad Católica de Chile
Bayesian Nonparametric Regression and Model Selection for Discrete data (Thesis project)
Sala 3, Facultad de Matemáticas
Abstract:

Bayesian nonparametric (BNP) theory is well-developed for continuous random variables. For discrete data, the main BNP references fall into the Dirichlet Process (DP) or a Poisson DP mixture. The DP does not allow smooth deviation from its base measure, while a Poisson mixture will never be able to fit under-dispersed data. However, assuming the existence of a continuous underlying variable can help transfer the continuous theory to a discrete one. Under this approach, the current project deals with developing a flexible regression model endowed with a model selection feature to identify the most relevant structure in the context of binary, ordinal, and count data. In particular, this project has three specific goals: 1) develop a latent dependent DP mixture model for light-tailed discrete data, 2) develop a latent dependent NGGP mixture model for heavy-tailed data, and 3) extend the two models to a multivariate case. It is hoped that, as the sample size increases, the models will find the true model and fit better than those common in the literature. This should hold for datasets with zero-inflated and under-, equi-, or over-dispersed behaviors.

2025-03-25
16:10hrs.
Daniel Alejandro Saavedra Morales. Pontificia Universidad Católica de Chile
A Bayesian Approach to Model Selection for Heavy-Tailed Data
Sala 3, Facultad de Matemáticas
Abstract:

Heavy-tailed distributions have been a subject of study for a long time due to their numerous applications in various fields, such as economics, natural disasters, signals, and social sciences. In particular, there is extensive research on power-law distributions ($p(x) \propto x^{\alpha}$) and their generalization, regularly varying functions ($\mathcal{RV}_\alpha$), which behave approximately like a power-law in the tail of the distribution.

 

Although multiple approaches have been developed to study tail behavior in both univariate and multivariate data, as well as in the presence of regressors, many of these studies tend to set an arbitrary threshold or percentile from which the fitting process begins. This can result in a loss of information contained in the body of the distribution. On the other hand, some research uses all observed data to estimate heavy-tailed densities, particularly under Bayesian approaches. However, these models tend to be complex to handle, especially when model selection is required.

 

This project has two main objectives. The first is to propose Bayesian model selection in flexible regression models for heavy-tailed distributions $\mathcal{RV}_\alpha$, using a simple yet flexible model such as the Gaussian mixture model under a dependent Dirichlet process (DDP-GMM), in the logarithmic space of the observations, where $\mathcal{RV}_\alpha$ distributions become light-tailed. This approach facilitates model selection through a Spike and Slab methodology, as it allows for the analytical computation of the marginal likelihood.

 

The second objective is to develop a model selection strategy using flexible regression for heavy-tailed $\mathcal{RV}_\alpha$ data. To achieve this, a Bayesian quantile regression will be proposed for both low and high percentiles, with errors distributed according to an asymmetric Laplace mixture under a normalized generalized gamma (NGG) process on the scale parameters. A Spike and Slab methodology will be employed for model selection, enabling the analysis of relevant regressors for the quantiles in the tails of the distribution.

2024-12-03
16:10hrs.
Daniel Alejandro Saavedra Morales. Pontificia Universidad Católica de Chile
A Bayesian Approach to Model Selection for Heavy-Tailed Data
Sala 1, edificio Rolando Chuaqui
Abstract:

Heavy-tailed distributions have been a subject of study for a long time due to their numerous applications in various fields, such as economics, natural disasters, signals, and social sciences. In particular, there is extensive research on power-law distributions ($p(x) \propto x^{\alpha}$) and their generalization, regularly varying functions ($\mathcal{RV}_\alpha$), which behave approximately like a power-law in the tail of the distribution.

 

Although multiple approaches have been developed to study tail behavior in both univariate and multivariate data, as well as in the presence of regressors, many of these studies tend to set an arbitrary threshold or percentile from which the fitting process begins. This can result in a loss of information contained in the body of the distribution. On the other hand, some research uses all observed data to estimate heavy-tailed densities, particularly under Bayesian approaches. However, these models tend to be complex to handle, especially when model selection is required.

 

This project has two main objectives. The first is to propose Bayesian model selection in flexible regression models for heavy-tailed distributions $\mathcal{RV}_\alpha$, using a simple yet flexible model such as the Gaussian mixture model under a dependent Dirichlet process (DDP-GMM), in the logarithmic space of the observations, where $\mathcal{RV}_\alpha$ distributions become light-tailed. This approach facilitates model selection through a Spike and Slab methodology, as it allows for the analytical computation of the marginal likelihood.

 

The second objective is to develop a model selection strategy using flexible regression for heavy-tailed $\mathcal{RV}_\alpha$ data. To achieve this, a Bayesian quantile regression will be proposed for both low and high percentiles, with errors distributed according to an asymmetric Laplace mixture under a normalized generalized gamma (NGG) process on the scale parameters. A Spike and Slab methodology will be employed for model selection, enabling the analysis of relevant regressors for the quantiles in the tails of the distribution.

2024-11-19
16:10hrs.
Ingrid Guevara Romero. Pontificia Universidad Católica de Chile
Bayesian model selection for circular data analysis
Sala 1, edificio Rolando Chuaqui
Abstract:

Circular measurements result from sources like clocks, calendars, or compass directions. Developing statistical models for circular responses is essential for addressing diverse applications, including wind directions in meteorology, patient arrival times at healthcare facilities, animal navigation in biology, and periodic data in political science. As circular data may be mishandled by models that do not account for its cyclical nature, there have been some approaches to developing methodologies that accurately describe its behavior. Unfortunately, there is limited literature on regression models within this context and even fewer resources addressing model selection. This presentation introduces a novel Bayesian nonparametric regression model for circular data that incorporates model selection. The proposal uses a mixture of Dirichlet processes with a Projected Normal distribution and discrete spike-and-slab priors for the model selection framework. The methodology is validated through a simulation study and a practical example.

2024-11-05
16:10hrs.
Cristian Capetillo Constela. Pontificia Universidad Católica de Chile
BNP regression for discrete data and model selection
Facultad de Matemáticas, Edificio Rolando Chuaqui, Sala 1.
Abstract:

In every area of knowledge there is particular interest in discrete data. One could easily mention data such as the frequency of seismic events, the number of products sold by a store, the number of cigarettes smoked per person, and the number of cars at an intersection, related to geology, economics, medicine, and urban planning, respectively.

In the context of parametric models, the first model for discrete data, count data in particular, is the popular Poisson model. Unfortunately, that popularity comes with its restrictive equidispersion property. Alternatives to the Poisson model include the Negative Binomial distribution and zero-inflated versions such as the ZIP and ZINB models (see, for example, Agresti, 2002). However, the restrictive nature of parametric models is well known. With a finite-dimensional parameter space, one may run into a misspecification problem. Moreover, a parametric model can be seen as a particular case of a nonparametric one (Ghosal and van der Vaart, 2017).

Bayesian nonparametric (BNP) theory is well developed in the context of continuous random variables. For discrete data, that claim is at least debatable. Incorporating a continuous underlying variable, however, can help transfer the continuous theory to the discrete setting. This work develops a flexible regression model and a model selection methodology for discrete data by rounding continuous kernels (Canale and Dunson, 2011). In particular, a rounded LDDP model with a spike-and-slab prior is developed, equipped with an MCMC scheme for easy posterior computation. The model is subjected to a preliminary simulation study and applied to a dataset on the performance of a football team over the years.

2024-09-03
16:10-17:20hrs.
Ignacio Betancourt Peters. Pontificia Universidad Católica de Chile
A study of beta diversity in the Caucahué Channel, Chiloé
Escuela Pre-Doctoral FMAT
Abstract:

Beta diversity, in the field of ecology, is the study of the local composition and abundance of different species, which makes it possible to analyze communities in ecosystems. Using nonparametric techniques for exploring and comparing groups (PERMANOVA) based on dissimilarity coefficients, the beta diversity of the Caucahué Channel, Chiloé, was explored in the context of a study of the zooplankton community in that channel covering salmon and mussel (mytilid) farming centers.

2024-08-19
16:10hrs.
Mauricio Toro Cea. Pontificia Universidad Católica de Chile
Truth, Possibility and Probability: Rolando Chuaqui and The Logical Foundations of Probability and Statistical Inference
Facultad de Matemáticas, Edificio Rolando Chuaqui, Sala 2.
2024-06-25
16:10hrs.
Jesus Achire Quispe. PUC
Some Objective Priors and Their Application to the Rice Model
SALA 3, FACULTAD DE MATEMÁTICAS
Abstract:

The Rice (Rician) distribution is widely known and used in fields such as magnetic resonance imaging and wireless communications, being particularly useful for describing signal processing data. In this work, we propose an objective Bayesian inference approach, focusing on the Jeffreys prior, the reference prior, and a scoring rule-based prior. We demonstrate the advantages and disadvantages of these priors and compare them with the classical maximum likelihood estimator through simulations. Our results show that Bayesian estimators provide estimates with less bias than classical estimators.

2024-06-04
16:10hrs.
Fabian Gomez. PUC
Functional data analysis of PM2.5 levels in Santiago, Chile: Winters 2018-2022
SALA 3, FACULTAD DE MATEMÁTICAS
Abstract:

Fine particulate matter (PM2.5) is a type of particle harmful to health, and its monitoring aims to establish the air quality of a region of a country. In this work, functional data analysis tools are used to analyze PM2.5 concentrations during the winter periods from 2018 to 2022 at the Parque O'Higgins monitoring station. The approach consists of a functional analysis of variance to study whether there are differences in the mean curves of each winter, looking for behavioral patterns across the years, in contrast with the current decontamination plan in Santiago, Chile.

2024-05-07
16:10hrs.
Bryan Andrés Tobar Torres. PUC
Control charts for fault detection in HVAC systems, using anomaly detection techniques from the supervised analysis point of view
SALA 3, FACULTAD DE MATEMÁTICAS
Abstract:

Using data obtained from monitoring air conditioning systems, data density-based algorithms are employed to identify potential faults in the system, in order to issue early alerts in maintenance plans.