Seminar of PhD Students in Statistics

The purpose of these seminars is to present, in talk format, the research projects in which students of the PhD program in Statistics have participated. The entire UC community is invited to attend.


2025-04-22
16:10hrs.
Luz Marina Ramos Quispe. Pontificia Universidad Católica de Chile
Scale mixture of a multivariate normal distribution with a Birnbaum-Saunders mixing distribution
Sala 3, Facultad de Matemáticas
Abstract:
In the search for multivariate distributions that provide greater flexibility in modeling data characterized by high levels of skewness, kurtosis, and the presence of outliers, new families of multivariate distributions have emerged, among which multivariate normal mixture distributions stand out. In this context, we introduce a multivariate normal mixture distribution based on the Birnbaum-Saunders distribution and examine some of its key properties. To estimate the parameters of this normal scale mixture distribution, we propose a maximum likelihood approach implemented via the EM algorithm. To support inferential analyses, we derive the Fisher information matrix. Additionally, we formulate a linear hypothesis on the parameter vector of interest and evaluate it using the likelihood ratio, Wald, score, and gradient statistics. Finally, we illustrate the application of the proposed methodology to real datasets, complementing the analysis with a simulation study to assess its performance.
2025-04-15
16:10hrs.
Martial Toniotti. LIDAM/CORE, UCLouvain
Investments in renewable energy using portfolio optimization
Sala 3, Facultad de Matemáticas
Abstract:
Regulators' procurement of renewable energy capacity is rapidly expanding. While stochastic optimization methods are typically employed to determine the optimal total capacity to procure, a smaller body of research explores an alternative framework borrowed from finance: portfolio optimization. In this study, we apply portfolio optimization to renewable energy, aiming to identify optimal portfolios that balance two objectives: maximizing energy production per dollar invested and minimizing variance. We utilize principal component analysis (PCA) and other techniques to identify these portfolios under limited sample sizes. The proposed method is tested using historical Belgian production data spanning five years, with out-of-sample comparisons evaluating portfolio performance under real-world conditions.
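The optimization step described above can be sketched numerically. The following is a minimal illustration, not the authors' method: it uses synthetic production data, a hypothetical PCA-based covariance shrinkage, and a long-only minimum-variance weighting.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hourly production (per unit invested) for 6 candidate sites;
# in the study this role is played by five years of Belgian data.
n_obs, n_assets = 2000, 6
returns = rng.gamma(shape=2.0, scale=0.5, size=(n_obs, n_assets))

mu = returns.mean(axis=0)            # expected production per dollar
cov = np.cov(returns, rowvar=False)

# PCA shrinkage: keep the leading k eigen-directions of the covariance
# and replace the remaining eigenvalues by their average (a common
# remedy when the sample is small relative to the number of assets).
k = 2
eigval, eigvec = np.linalg.eigh(cov)
order = np.argsort(eigval)[::-1]
eigval, eigvec = eigval[order], eigvec[:, order]
eigval[k:] = eigval[k:].mean()
cov_denoised = eigvec @ np.diag(eigval) @ eigvec.T

# Minimum-variance portfolio with a long-only normalization:
# w proportional to Sigma^{-1} 1, clipped and renormalized to sum to one.
ones = np.ones(n_assets)
w = np.linalg.solve(cov_denoised, ones)
w = np.clip(w, 0, None)
w /= w.sum()

print("weights:", np.round(w, 3))
print("expected production:", w @ mu)
print("portfolio variance:", w @ cov_denoised @ w)
```

In practice the interesting trade-off is between this variance term and the expected production `w @ mu`, which is what the two-objective formulation in the abstract balances.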
2025-04-08
16:10hrs.
Bladimir Morales Torrez. Pontificia Universidad Católica de Chile
Continuous positive (non-)Gaussian random fields with zero inflation: A block and pairwise likelihood approach
Sala 3, Facultad de Matemáticas
Abstract:
Technological advances have transformed data collection and analysis, enabling the acquisition of large volumes of real-time information, commonly referred to as big data. In sectors such as the fishing industry, the adoption of modern technologies has introduced new challenges due to the excess of zeros in catch records, reflecting the natural variability in species abundance. In agriculture, satellite imagery has revolutionized crop monitoring, improving decisions related to plant health, resource management, and yield forecasting. Similarly, environmental monitoring using these technologies facilitates tracking of climate change and pollution, which is crucial for public health and sustainability. To address these issues, statistical models must account for spatial and spatio-temporal dependencies in the data, as well as the possibility of zero-inflation. In the literature, both Gaussian and (non)-Gaussian models have been developed for continuous or discrete data structures, but the excess zeros present significant challenges when modeling random fields. Techniques such as logarithmic transformations or constant adjustments have been proposed, though these often distort the data structure or are not feasible. Additionally, large-scale datasets present computational difficulties, as the high cost of likelihood methods often proves prohibitive. To mitigate this, methods such as composite likelihood have been employed, balancing statistical accuracy with computational efficiency in estimation. Furthermore, the concept of effective sample size (ESS) is essential for quantifying the information content in spatial datasets, addressing redundancy issues arising from spatial correlation.
 
The goal of this research is to propose a new class of continuous spatial and spatio-temporal (non)-Gaussian random fields with positive support and excess zeros. The study develops a hybrid composite likelihood function that combines block likelihood and pairwise likelihood methods to efficiently handle large-scale data estimation, while also aiding in the generation of the proposed random fields from bivariate distributions. Additionally, the effective sample size (ESS) will be defined within the context of this new class of random fields, with particular attention to assessing its asymptotic normality. The proposed methodology will be validated through simulations and comparisons with existing techniques. This work contributes to the advancement of statistical models for high-dimensional spatial and spatio-temporal data with excess zeros, providing an important tool for spatial data analysis in complex real-world scenarios.
2025-04-01
16:10hrs.
Cristian Capetillo Constela. Pontificia Universidad Católica de Chile
Bayesian Nonparametric Regression and Model Selection for Discrete data (Thesis project)
Sala 3, Facultad de Matemáticas
Abstract:
Bayesian nonparametric (BNP) theory is well developed for continuous random variables. For discrete data, the main BNP references rely on the Dirichlet Process (DP) or a DP mixture of Poisson kernels. The DP does not allow smooth deviations from its base measure, while a Poisson mixture can never fit under-dispersed data. However, assuming the existence of a continuous underlying variable can help transfer the continuous theory to a discrete one. Under this approach, the current project develops a flexible regression model endowed with a model selection feature to identify the most relevant structure in the context of binary, ordinal, and count data. In particular, the project has three specific goals: 1) develop a latent dependent DP mixture model for light-tailed discrete data, 2) develop a latent dependent NGGP mixture model for heavy-tailed data, and 3) extend the two models to the multivariate case. The models are expected to recover the true model and fit better than common alternatives in the literature as the sample size increases. This should hold for datasets with zero-inflated and under-, equi-, or over-dispersed behavior.
2025-03-25
16:10hrs.
Daniel Alejandro Saavedra Morales. Pontificia Universidad Católica de Chile
A Bayesian Approach to Model Selection for Heavy-Tailed Data
Sala 3, Facultad de Matemáticas
Abstract:
Heavy-tailed distributions have been a subject of study for a long time due to their numerous applications in various fields, such as economics, natural disasters, signals, and social sciences. In particular, there is extensive research on power-law distributions ($p(x) \propto x^{-\alpha}$) and their generalization, regularly varying functions ($\mathcal{RV}_\alpha$), which behave approximately like a power law in the tail of the distribution.

Although multiple approaches have been developed to study tail behavior in both univariate and multivariate data, as well as in the presence of regressors, many of these studies tend to set an arbitrary threshold or percentile from which the fitting process begins. This can result in a loss of information contained in the body of the distribution. On the other hand, some research uses all observed data to estimate heavy-tailed densities, particularly under Bayesian approaches. However, these models tend to be complex to handle, especially when model selection is required.
  
This project has two main objectives. The first is to propose Bayesian model selection in flexible regression models for heavy-tailed distributions $\mathcal{RV}_\alpha$, using a simple yet flexible model such as the Gaussian mixture model under a dependent Dirichlet process (DDP-GMM), in the logarithmic space of the observations, where $\mathcal{RV}_\alpha$ distributions become light-tailed. This approach facilitates model selection through a Spike and Slab methodology, as it allows for the analytical computation of the marginal likelihood.
 
The second objective is to develop a model selection strategy using flexible regression for heavy-tailed $\mathcal{RV}_\alpha$ data. To achieve this, a Bayesian quantile regression will be proposed for both low and high percentiles, with errors distributed according to an asymmetric Laplace mixture under a normalized generalized gamma (NGG) process on the scale parameters. A Spike and Slab methodology will be employed for model selection, enabling the analysis of relevant regressors for the quantiles in the tails of the distribution.
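The log-space device in the first objective rests on a standard fact that is easy to verify numerically: if $X$ is Pareto with tail index $\alpha$, then $\log X$ is exactly Exponential with rate $\alpha$, hence light-tailed. A quick numerical check (an illustration only, not part of the project):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha = 1.5  # tail index; this Pareto has infinite variance

# X ~ Pareto(alpha), support x >= 1, survival P(X > x) = x^{-alpha}.
x = stats.pareto.rvs(alpha, size=50_000, random_state=rng)

# Y = log X is exactly Exponential with rate alpha (light-tailed):
# P(Y > y) = P(X > e^y) = e^{-alpha * y}.
y = np.log(x)

ks = stats.kstest(y, "expon", args=(0, 1 / alpha))
print(f"sample mean of log X: {y.mean():.3f}  (theory: {1/alpha:.3f})")
print(f"KS p-value vs Exponential(rate={alpha}): {ks.pvalue:.3f}")
```

This is why a Gaussian-mixture model such as the DDP-GMM, which would struggle on the original heavy-tailed scale, becomes adequate after the log transform.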
2024-12-03
16:10hrs.
Daniel Alejandro Saavedra Morales. Pontificia Universidad Católica de Chile
A Bayesian Approach to Model Selection for Heavy-Tailed Data
Sala 1, Edificio Rolando Chuaqui
Abstract:

Heavy-tailed distributions have been a subject of study for a long time due to their numerous applications in various fields, such as economics, natural disasters, signals, and social sciences. In particular, there is extensive research on power-law distributions ($p(x) \propto x^{-\alpha}$) and their generalization, regularly varying functions ($\mathcal{RV}_\alpha$), which behave approximately like a power law in the tail of the distribution.

Although multiple approaches have been developed to study tail behavior in both univariate and multivariate data, as well as in the presence of regressors, many of these studies tend to set an arbitrary threshold or percentile from which the fitting process begins. This can result in a loss of information contained in the body of the distribution. On the other hand, some research uses all observed data to estimate heavy-tailed densities, particularly under Bayesian approaches. However, these models tend to be complex to handle, especially when model selection is required.

This project has two main objectives. The first is to propose Bayesian model selection in flexible regression models for heavy-tailed distributions $\mathcal{RV}_\alpha$, using a simple yet flexible model such as the Gaussian mixture model under a dependent Dirichlet process (DDP-GMM), in the logarithmic space of the observations, where $\mathcal{RV}_\alpha$ distributions become light-tailed. This approach facilitates model selection through a Spike and Slab methodology, as it allows for the analytical computation of the marginal likelihood.

The second objective is to develop a model selection strategy using flexible regression for heavy-tailed $\mathcal{RV}_\alpha$ data. To achieve this, a Bayesian quantile regression will be proposed for both low and high percentiles, with errors distributed according to an asymmetric Laplace mixture under a normalized generalized gamma (NGG) process on the scale parameters. A Spike and Slab methodology will be employed for model selection, enabling the analysis of relevant regressors for the quantiles in the tails of the distribution.

2024-11-19
16:10hrs.
Ingrid Guevara Romero. Pontificia Universidad Católica de Chile
Bayesian model selection for circular data analysis
Sala 1, Edificio Rolando Chuaqui
Abstract:

Circular measurements result from sources like clocks, calendars, or compass directions. Developing statistical models for circular responses is essential for addressing diverse applications, including wind directions in meteorology, patient arrival times at healthcare facilities, animal navigation in biology, and periodic data in political science. Since circular data may be mishandled by models that do not account for their cyclical nature, several approaches have been developed to describe their behavior accurately. Unfortunately, there is limited literature on regression models in this context, and even fewer resources address model selection. This presentation introduces a novel Bayesian nonparametric regression model for circular data that incorporates model selection. The proposal uses a mixture of Dirichlet processes with a Projected Normal distribution and discrete spike-and-slab priors for the model selection framework. The methodology is validated through a simulation study and a practical example.
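As a minimal sketch of the projected normal construction underlying the model (an illustration, not the proposed methodology): an angle is obtained by projecting a bivariate normal vector onto the unit circle, and the sample's mean direction is the angle of the average unit vector.

```python
import numpy as np

rng = np.random.default_rng(1)

def rprojnormal(mu, n, rng):
    """Draw angles from a Projected Normal PN(mu, I):
    take z ~ N(mu, I) in the plane and keep its angle
    theta = atan2(z2, z1), i.e. project z onto the unit circle."""
    z = rng.normal(loc=mu, scale=1.0, size=(n, 2))
    return np.arctan2(z[:, 1], z[:, 0])

def circular_mean(theta):
    """Mean direction: the angle of the average unit vector."""
    return np.arctan2(np.sin(theta).mean(), np.cos(theta).mean())

# A concentrated sample pointing along direction 0: placing mu far
# from the origin yields low circular dispersion.
theta = rprojnormal(mu=np.array([10.0, 0.0]), n=20_000, rng=rng)
print(f"circular mean: {circular_mean(theta):.4f} rad (theory: 0)")
```

Regression versions of the model let the mean vector `mu` depend on covariates, which is where the spike-and-slab selection enters.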

2024-11-05
16:10hrs.
Cristian Capetillo Constela. Pontificia Universidad Católica de Chile
BNP regression for discrete data and model selection
Facultad de Matemáticas, Edificio Rolando Chuaqui, Sala 1.
Abstract:

Discrete data are of particular interest in every area of knowledge. Examples come to mind easily: the frequency of seismic events, the number of products sold by a store, the number of cigarettes smoked per person, and the number of cars at an intersection, related to geology, economics, medicine, and urban planning, respectively.

In the parametric setting, the first model for discrete data, count data in particular, is the popular Poisson model. That popularity, unfortunately, comes with the restrictive property of equidispersion. Alternatives to the Poisson model include the Negative Binomial distribution and zero-inflated versions such as the ZIP and ZINB models (see, e.g., Agresti, 2002). The restrictive nature of parametric models, however, is well known: with a finite-dimensional parameter space, one may run into misspecification problems. Moreover, a parametric model can be viewed as a particular case of a nonparametric one (Ghosal and van der Vaart, 2017).

Bayesian Nonparametric (BNP) theory is well developed in the context of continuous random variables. For discrete data, the same claim is at least debatable. Incorporating a continuous underlying variable, however, can help transfer the continuous theory to a discrete setting. In this work, a flexible regression model and a model selection methodology are developed for discrete data by rounding continuous kernels (Canale and Dunson, 2011). In particular, a rounded LDDP model with a spike-and-slab prior is developed, equipped with an MCMC scheme for straightforward posterior computation. The model is assessed in a preliminary simulation study and applied to a dataset on the performance of a football team over the years.

2024-09-03
16:10-17:20hrs.
Ignacio Betancourt Peters. Pontificia Universidad Católica de Chile
A study of beta diversity in Canal Caucahué, Chiloé
Escuela Pre-Doctoral FMAT
Abstract:

In ecology, beta diversity concerns the local composition and abundance of different species, allowing communities within ecosystems to be analyzed. Using nonparametric techniques for group exploration and comparison (PERMANOVA) based on dissimilarity coefficients, we explored the beta diversity of Canal Caucahué, Chiloé, as part of a study of the zooplankton community in that channel, which spans salmon and mussel farming centers.
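The PERMANOVA procedure mentioned above can be sketched as follows. This is a generic illustration on synthetic counts (the zones, taxa, and effect sizes are invented), not the study's analysis: a pseudo-F statistic is computed from Bray-Curtis dissimilarities and its significance assessed by permuting group labels.

```python
import numpy as np

rng = np.random.default_rng(7)

def bray_curtis(a):
    """Pairwise Bray-Curtis dissimilarity for an abundance matrix (sites x taxa)."""
    n = a.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            d[i, j] = d[j, i] = np.abs(a[i] - a[j]).sum() / (a[i] + a[j]).sum()
    return d

def permanova_f(d, groups):
    """Pseudo-F from a distance matrix (Anderson's PERMANOVA):
    partition the total sum of squared distances into within and between."""
    n = len(groups)
    ss_total = (d ** 2).sum() / (2 * n)
    ss_within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        ss_within += (d[np.ix_(idx, idx)] ** 2).sum() / (2 * len(idx))
    a = len(np.unique(groups))
    return ((ss_total - ss_within) / (a - 1)) / (ss_within / (n - a))

# Hypothetical counts: two zones x 10 sites x 8 taxa, zone 2 shifted.
z1 = rng.poisson(10, size=(10, 8))
z2 = rng.poisson(10, size=(10, 8))
z2[:, :3] += 15
counts = np.vstack([z1, z2])
groups = np.repeat([0, 1], 10)

d = bray_curtis(counts)
f_obs = permanova_f(d, groups)

# Permutation p-value: shuffle the zone labels.
perm_f = [permanova_f(d, rng.permutation(groups)) for _ in range(999)]
pval = (1 + sum(f >= f_obs for f in perm_f)) / 1000
print(f"pseudo-F = {f_obs:.2f}, permutation p = {pval:.3f}")
```

The permutation step is what makes the test nonparametric: no distributional form is assumed for the dissimilarities.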

2024-08-19
16:10hrs.
Mauricio Toro Cea. Pontificia Universidad Católica de Chile
Truth, Possibility and Probability: Rolando Chuaqui and The Logical Foundations of Probability and Statistical Inference
Facultad de Matemáticas, Edificio Rolando Chuaqui, Sala 2.
2024-06-25
16:10hrs.
Jesus Achire Quispe. PUC
Some Objective Priors and Their Application to the Rice Model
Sala 3, Facultad de Matemáticas
Abstract:

The Rice distribution is widely known and used in fields such as magnetic resonance imaging and wireless communications, being particularly useful for describing signal processing data. In this work, we propose objective Bayesian inference, focusing on the Jeffreys prior, the reference prior, and a scoring-rule-based prior. We demonstrate the advantages and disadvantages of these priors and compare them with the classical maximum likelihood estimator through simulations. Our results show that the Bayesian estimators provide estimates with less bias than the classical estimators.

2024-06-04
16:10hrs.
Fabian Gomez. PUC
Functional data analysis of PM2.5 levels in Santiago, Chile: Winters 2018-2022
Sala 3, Facultad de Matemáticas
Abstract:

Fine particulate matter (PM2.5) is a type of particle harmful to health, and its monitoring aims to establish the air quality of a region of a country. In this work, functional data analysis tools are used to study PM2.5 concentrations during the winter periods from 2018 to 2022 at the Parque O'Higgins monitoring station. The approach consists of a functional analysis of variance to determine whether there are differences in the mean curves of each winter, looking for behavioral patterns across the years, in contrast with the current decontamination plan for Santiago de Chile.

2024-05-07
16:10hrs.
Bryan Andrés Tobar Torres. PUC
Control charts for fault detection in HVAC systems, using anomaly detection techniques from a supervised analysis perspective
Sala 3, Facultad de Matemáticas
Abstract:

Based on data obtained from monitoring air-conditioning systems, density-based algorithms are used to identify potential system faults, with the goal of issuing early alerts in maintenance plans.

2024-04-23
16:10hrs.
Nixon Jerez Lillo. PUC
Power-law Regression Model with Long-term Survival, Change Point Detection, and Regularization
Sala 3, Facultad de Matemáticas
Abstract:

Kidney cancer, a potentially life-threatening malignancy affecting the kidneys, demands early detection and proactive intervention to enhance prognosis and survival. Advancements in medical and health sciences and the emergence of novel treatments are expected to lead to a favorable response in a subset of patients. This, in turn, is anticipated to enhance overall survival and disease-free survival rates. Cure fraction models have become essential for estimating the proportion of individuals considered cured, free from adverse events. This article presents a novel piecewise power-law cure fraction model with a piecewise decreasing hazard function, deviating from the traditional piecewise constant hazard assumption. Through the analysis of real medical data, we evaluate various factors to explain the survival of individuals. Consistently positive outcomes are observed, affirming the significant potential of our approach. Furthermore, we employ a local influence analysis to detect potentially influential individuals and perform a post-deletion analysis to analyze their impact on our inferences.

2024-04-02
16:10hrs.
Daniel Saavedra. PUC
Generating Censored Samples: Controlling the Desired Censoring Percentage
Sala 3, Facultad de Matemáticas
Abstract:

Generating censored random samples while controlling the desired percentage of censoring is a critical task when assessing model performance in simulation studies. In this presentation, we will explore an approach to address this challenge, particularly when dealing with random censoring. The method is implemented in the recently available 'rcens' package on CRAN, which also offers functionality to control the censoring percentage in generated samples under different types of censoring (Types I, II, and III), providing researchers and practitioners with a straightforward tool to simulate datasets according to the desired distributional needs. Lastly, we will discuss a potential scheme for generating interval censoring, also implemented in 'rcens'.
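The idea behind controlling the censoring percentage can be sketched outside the package. Assuming exponential lifetimes and an exponential censoring variable (a special case chosen because it has a closed form; this is an illustration, not 'rcens''s actual interface), the censoring rate can be solved analytically so that the expected proportion of censored observations matches a target:

```python
import numpy as np

rng = np.random.default_rng(3)

def rcens_exp(n, lam, p_cens, rng):
    """Random censoring for exponential lifetimes:
    T ~ Exp(lam), C ~ Exp(mu) independent; observe
    (min(T, C), delta = 1{T <= C}).
    For independent exponentials P(censored) = mu / (lam + mu),
    so mu = p * lam / (1 - p) gives expected censoring proportion p."""
    mu = p_cens * lam / (1 - p_cens)
    t = rng.exponential(1 / lam, n)
    c = rng.exponential(1 / mu, n)
    return np.minimum(t, c), (t <= c).astype(int)

y, delta = rcens_exp(100_000, lam=0.5, p_cens=0.3, rng=rng)
print(f"observed censoring rate: {1 - delta.mean():.3f} (target 0.30)")
```

For lifetime distributions without a closed-form solution, the same calibration can be done numerically by solving P(C < T) = p for the censoring parameter.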

2023-12-07
16:00hrs.
Fabio Paredes Peñaloza. PUC
Competing Risks in Survival Analysis: More the Rule Than the Exception
Sala 3, Facultad de Matemáticas
Abstract:

In survival studies, understanding the probability (risk) or occurrence rate of a specific event (hazard) over a set period is often sought. However, in a realistic scenario, multiple competing events can occur. If competing risks are not considered, the risk estimates for a particular event might be biased. The effect of a covariate on the hazard function for a specific cause can be estimated using Cox's proportional hazards model, censoring competing events. Yet its interpretation in terms of the cumulative incidence function (CIF) is limited, since under competing risks there is no one-to-one relationship between the hazard function and the CIF. The Fine-Gray model enables understanding the effect of a covariate on the CIF. However, an interpretation error often occurs by equating the subdistribution hazard ratio (SHR) with the commonly used hazard ratio (HR). This study aims to clarify the concepts used under competing risks, which can guide us to an appropriate interpretation, and apply them in research on cancer relapse, a scenario where this methodology is particularly interesting.

2023-11-23
16:00hrs.
Francisco Segovia Godoy. PUC
Bayesian model selection for regression models
Sala Multiusos (1er piso), Felipe Villanueva
Abstract:

Regression analysis aims to explore the relationship between a response variable and predictors. A key aspect of regression analysis is model selection, which allows the researcher to decide which predictors are relevant, considering a parsimony criterion. A standard frequentist strategy is to explore the model space using, for instance, a Stepwise strategy based on some goodness of fit criteria. On the other hand, a popular Bayesian strategy is the spike-and-slab methodology, which assigns a specific prior to predictor coefficients by defining a latent binary vector that will indicate which predictors are relevant. Such a strategy includes a prior over the binary vector to penalize complex models. In this work, we developed a general Bayesian strategy for model selection in a broad range of regression models, using the spike-and-slab strategy and a data augmentation technique. We show that if the likelihood function follows certain conditions, the consistency of the Bayes Factor is guaranteed alongside the availability of closed-form expressions for the posterior distribution. We present regression models based on different choices for the response distribution, providing the necessary details for each model to be implemented alongside a Monte Carlo simulation study. Applications with health data are also discussed.

2023-10-19
16:00hrs.
Ingrid Guevara Romero. PUC
Variable selection in regression models for circular data using the projected normal distribution
Sala 3, Facultad de Matemáticas
Abstract:

Over the years, circular data have become relevant in many fields. They arise in various ways, for example, from measuring instruments such as clocks. Given their nature, however, standard univariate or multivariate methods cannot be applied, which poses multiple challenges, since a statistical model must be defined on a non-Euclidean space such as the circle or the sphere.

Since the literature on this topic is limited, the goal here is to develop variable selection methodologies, from a parametric Bayesian perspective, for regression models involving circular data under a projected normal distribution. To this end, mixture priors known as spike-and-slab are placed on the regression coefficients. The computational aspects of the study include the implementation of MCMC methods to generate samples from the posterior distributions and carry out inference on the model. These procedures are illustrated using simulated and real datasets.

2023-09-28
16:00hrs.
Nixon Jerez Lillo. PUC
A unification approach in semi-parametric piecewise models
Sala 3, Facultad de Matemáticas
Abstract:

Piecewise models are valuable tools for enhancing pattern adjustments in statistical analysis, offering superior fit compared to standard models. The most commonly used is the piecewise exponential distribution, which assumes a constant hazard rate between changepoints and therefore may not be realistic in many cases. In this talk, we present a unified approach that introduces a general structure for constructing piecewise models, allowing for different behaviors between the changepoints. The proposed structure yields models that are easier to understand and interpret, while also providing greater accuracy and flexibility than standard parametric models. We discuss the mathematical properties of the proposed approach in detail, along with its application to various baseline models. We also discuss inference on the model parameters, employing a profiled likelihood approach to estimate both the baseline parameters and the changepoints. Additionally, we provide application examples using different datasets to illustrate the effectiveness of the proposed approach.

2023-08-31
16:00hrs.
Hernán Robledo Araya. PUC
Down with latent variables! A geometrical view of psychological and educational measurement
Sala 3, Facultad de Matemáticas
Abstract:

In psychological and educational measurement, the gold standard for the analysis of tests is Item Response Theory (IRT). In IRT, a statistical model is used and the trait score is represented by a latent variable, a non-observable quantity. There are several problems with this approach: the latent variable is not unique, and non-psychometricians find it hard to interpret. The common way of dealing with this problem in psychometrics is denial: arbitrarily choosing a particular latent variable to represent the measured trait. In an attempt to rectify the situation, Ramsay (1996) presented a new approach to IRT models based on differential geometry. In Ramsay's proposal, the trait score arises naturally from the model, as the distance between two models is measured along a path or arc. This arc length does not have the drawbacks of the latent variable. However, Ramsay's proposal was never fully developed and therefore did not take hold in the psychometric literature. In this project, I will improve Ramsay's approach to IRT models. A new trait score (called the information arc length) is proposed and its statistical properties are investigated. In addition, the IRT toolbox of techniques (e.g., equating) is extended under this framework. All procedures developed in this project will be made available in open-access statistical software. The results of this project will lead to an easier-to-interpret and invariant trait score.