Statistics and Data Science Colloquium, Pontificia Universidad Católica de Chile

2026-01-22
15:30hrs.
Daira Velandia. Universidad de Valparaíso
Estimation methods for a Gaussian process under fixed domain asymptotics
Sala 2
Abstract:
This talk addresses inference tools for Gaussian random fields under the increasing-domain and fixed-domain asymptotic frameworks. First, basic concepts and previous results are presented. Then, results on several extensions of the problem of estimating covariance parameters under these two asymptotic frameworks are discussed.
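As background on the fixed-domain setting (a classical result that such extensions typically build on; the Matérn form below is one common parametrization, given here as context rather than as the talk's specific model):

```latex
% Matern covariance with variance sigma^2, range phi and smoothness nu:
C(h) = \sigma^2 \,\frac{2^{1-\nu}}{\Gamma(\nu)}
       \left(\frac{\|h\|}{\phi}\right)^{\!\nu} K_{\nu}\!\left(\frac{\|h\|}{\phi}\right).
% Under fixed-domain asymptotics in dimension d <= 3 (Zhang, 2004), sigma^2 and phi
% cannot both be estimated consistently; only the microergodic ratio
%   sigma^2 / phi^{2 nu}
% can, whereas under increasing-domain asymptotics all covariance parameters are
% typically consistently estimable and asymptotically normal.
```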
2025-11-13
13:30hrs.
Sally Paganin. Department of Statistics, The Ohio State University
Computational methods for fast Bayesian model assessment via calibrated posterior p-values
Sala multiuso 1
Abstract:
Posterior predictive p-values (ppps) have become popular tools for Bayesian model assessment, being general-purpose and easy to use. However, interpretation can be difficult because their distribution is not uniform under the hypothesis that the model did generate the data. Calibrated ppps (cppps) can be obtained via a bootstrap-like procedure, yet remain unavailable in practice due to high computational cost. This work introduces methods for efficient approximation of cppps and their uncertainty for fast model assessment. The main idea is that, provided that the MCMC chain from the real data has converged, using short MCMC chains per calibration replicate can save significant computation time compared to naive implementations, without significant loss in accuracy. The procedure for cppp is implemented in NIMBLE, a flexible framework for hierarchical modeling that supports many models and discrepancy measures.
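A minimal sketch of how such a calibrated ppp could be approximated (hypothetical function names; the actual procedure described in the talk is implemented in NIMBLE):

```python
import numpy as np

def ppp(discrepancy, y_obs, posterior_draws, simulate):
    """Posterior predictive p-value: P{ D(y_rep, theta) >= D(y_obs, theta) }."""
    d_obs, d_rep = [], []
    for theta in posterior_draws:
        y_rep = simulate(theta)                      # replicate data under the model
        d_obs.append(discrepancy(y_obs, theta))
        d_rep.append(discrepancy(y_rep, theta))
    return np.mean(np.asarray(d_rep) >= np.asarray(d_obs))

def calibrated_ppp(discrepancy, y_obs, posterior_draws, simulate,
                   run_short_mcmc, n_cal=100, rng=np.random.default_rng(0)):
    """Calibrated ppp: locate the observed ppp within the distribution of ppps
    computed on data generated from the model itself ("calibration replicates"),
    each replicate using only a short MCMC chain."""
    ppp_obs = ppp(discrepancy, y_obs, posterior_draws, simulate)
    ppp_cal = []
    for _ in range(n_cal):
        theta0 = posterior_draws[rng.integers(len(posterior_draws))]
        y_cal = simulate(theta0)                     # data that the model did generate
        draws_cal = run_short_mcmc(y_cal)            # short chain per calibration replicate
        ppp_cal.append(ppp(discrepancy, y_cal, draws_cal, simulate))
    return np.mean(np.asarray(ppp_cal) <= ppp_obs)   # approximately uniform under the model
```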
2025-11-03
13:30hrs.
Gleici Perdoná. Dep. Medicina Social, Faculdade de Medicina de Ribeirão Preto/USP
On the unification of zero-adjusted cure survival models
Sala de Usos Multiples
Abstract:
This study presents a unified survival model that jointly accounts for cured individuals and zero lifetimes, extending traditional approaches to include competing risks. The proposed zero-adjusted cure model accommodates various distributions for the competing causes, such as Binomial, Geometric, Poisson, and Negative Binomial. A simulation study was conducted to assess the performance of maximum likelihood estimators and asymptotic confidence intervals, demonstrating accurate estimation and improved coverage with increasing sample size. The model was also applied to childbirth duration data from sub-Saharan Africa, where the geometric distribution provided the best fit. Overall, the proposed methodology offers high flexibility for modeling survival data with both zero adjustment and cure proportions, showing strong potential for application across diverse practical contexts.
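For orientation, a standard competing-causes (promotion-time) cure structure of the kind the unified model generalizes (generic form; the zero-adjustment indicated in the comments is an assumption about the general idea, not the talk's exact formulation):

```latex
% M latent competing causes, each with lifetime survival S_0(t); the population
% survival is the probability generating function G_M evaluated at S_0(t):
S_{\mathrm{pop}}(t) = P(T > t) = E\!\left[S_0(t)^{M}\right] = G_M\!\left(S_0(t)\right),
\qquad \text{cure fraction } p_{\mathrm{cure}} = P(M = 0) = G_M(0).
% Choosing M ~ Binomial, Geometric, Poisson or Negative Binomial changes G_M; e.g.
% M ~ Poisson(theta) gives the promotion-time model S_pop(t) = exp{-theta F_0(t)}.
% A zero-adjusted version additionally places a point mass p_0 at T = 0 to
% accommodate observed zero lifetimes.
```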
2025-11-03
14:30hrs.
Nalini Ravishanker. University of Connecticut, USA
Forecasting Robust Gaussian Process State Space Models for Assessing Intervention Impact in IoT Time Series
Sala de Usos Multiples
Abstract:
This talk describes a robust Gaussian process prior state space modeling approach to assess the impact of an intervention in an IoT stream of internal temperatures monitored by an insurance firm to address the risk of pipe-freeze hazard in a building. Robustness is achieved by replacing the normal distribution with a scale mixture of normal distributions in order to accommodate heavy-tailed behavior or anomalous observations in the time series. Gaussian process (GP) prior models provide flexibility by placing a non-parametric prior on the functional form of the model. By treating the pipe-freeze hazard alert as an exogenous intervention, we use the robust GP prior state space model for Bayesian fitting and forecasting of the internal temperatures. By comparing the forecasts with future data, we assess with a high level of confidence whether an alerted customer took preventive action against pipe-freeze loss.
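A reminder of the scale-mixture-of-normals construction mentioned above (generic form; the specific state equation and mixing distribution used in the talk may differ):

```latex
% Heavy-tailed observation errors via a scale mixture of normals:
y_t = f(x_t) + \varepsilon_t, \qquad
\varepsilon_t \mid \lambda_t \sim N\!\left(0, \sigma^2 / \lambda_t\right), \qquad
\lambda_t \sim \pi(\lambda).
% For example, lambda_t ~ Gamma(nu/2, nu/2) yields Student-t errors with nu degrees
% of freedom, downweighting anomalous observations, while a GP prior on f leaves the
% functional form of the state/mean process unspecified.
```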
2025-09-30
15:00hrs.
Arkady Shemyakin. University of St. Thomas
Hellinger Information and Hellinger Priors
Sala de usos múltiples
Abstract:
The concept of Hellinger information as a local characteristic of parametric distribution families was introduced in 2011, based on the definition of Hellinger distance. Under certain regularity conditions, the local behavior of the Hellinger distance is closely connected to Fisher information and the geometry of Riemannian manifolds. In non-regular situations, when Fisher information is undefined, the Hellinger information may serve as a possible generalization. Hellinger priors extend the Jeffreys rule to irregular cases. For many examples, they exhibit behavior that is identical to or close to the reference or probability-matching priors. In the presentation, the Hellinger information matrix is considered as an extension of Hellinger information to the case of a vector parameter.
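For reference, the quantities involved (standard definitions; the non-regular rate below is a schematic statement of the idea rather than the talk's precise conditions):

```latex
% Hellinger distance within a parametric family {f_theta}:
H^2(\theta_1, \theta_2) = \int \left(\sqrt{f_{\theta_1}(x)} - \sqrt{f_{\theta_2}(x)}\right)^{2} dx .
% Under the usual regularity conditions (with this convention),
H^2(\theta, \theta + \varepsilon) = \tfrac{1}{4}\, I(\theta)\, \varepsilon^{2} + o(\varepsilon^{2}),
% so the local behavior of H recovers the Fisher information I(theta). In non-regular
% families one has instead H^2(\theta, \theta + \varepsilon) \asymp J(\theta)\, |\varepsilon|^{\alpha}
% for some rate alpha, and J(theta) plays the role of the Hellinger information;
% Hellinger priors are built from J in analogy with the Jeffreys rule.
```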
2025-09-04
15:00hrs.
Karine Bertin. Universidad de Valparaíso
Density estimation on complicated domains
Auditorio Ninoslav Bralic
Abstract:

We will present several nonparametric methods for estimating a density with compact support or with a domain whose geometry may be complex. We will see how the quality of classical estimators deteriorates at the boundary of the domain. In particular, we will propose a new estimator based on local polynomials to estimate the density at a point x. We will show that this estimator adapts to different geometries and enjoys optimality properties in terms of the mean squared error and of the regularity of the function to be estimated. We will compare this method with the sparr package, a popular alternative for density estimation on domains with complicated geometry.
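To illustrate the boundary effect mentioned above (a generic numerical check with a standard kernel estimator, not the local-polynomial estimator proposed in the talk):

```python
import numpy as np
from scipy.stats import gaussian_kde, expon

# Sample from Exp(1): support [0, infinity), true density f(0) = 1.
rng = np.random.default_rng(1)
x = rng.exponential(size=5000)

kde = gaussian_kde(x)
# A plain kernel estimator spills mass below 0, so near the boundary it is
# biased downwards (roughly by a factor of 2 at x = 0).
print("true f(0)       =", expon.pdf(0.0))    # 1.0
print("naive KDE at 0  =", kde(0.0)[0])       # about 0.5
# A simple reflection correction (kde(x) + kde(-x), here at x = 0) restores the
# right order at the boundary; local-polynomial estimators go further and adapt
# to general geometries.
print("reflected KDE   =", 2 * kde(0.0)[0])   # about 1.0
```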

 
2025-08-06
15:00hrs.
Catalina Garcia García. Universidad de Granada
Some lines of statistical modeling: from CO2 emissions modeling to the treatment of multicollinearity. Future challenges
Auditorio Ninoslav Bralic
Abstract:
Two current lines of research related to statistical modeling are presented, together with work proposals aimed at generating possible synergies and collaborations.
First, we present research on the modeling of carbon dioxide (CO2) emissions, evaluating six candidate distributions in order to identify a two-parameter model capable of adequately describing the full distribution of fossil emissions and of producing predictions and public policy recommendations.
Second, we present contributions on the diagnosis and treatment of multicollinearity. As a proposal for future research, we outline an econometric analysis of CO2 emissions that combines knowledge of their underlying distribution with a rigorous treatment of multicollinearity.
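As a pointer to the kind of diagnostic involved (a standard multicollinearity check, not the speaker's specific proposal; the data below are synthetic):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic regressors where x3 is nearly a linear combination of x1 and x2.
rng = np.random.default_rng(0)
x1, x2 = rng.normal(size=200), rng.normal(size=200)
x3 = 0.7 * x1 + 0.3 * x2 + rng.normal(scale=0.05, size=200)
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# Variance inflation factors: values well above 10 usually signal that the
# corresponding coefficient estimates are destabilized by multicollinearity.
vif = {col: variance_inflation_factor(X.values, i) for i, col in enumerate(X.columns)}
print(vif)
```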
2025-07-21
15:00hrs.
Dr. Marcos Prates. Universidade Federal de Minas Gerais
Advances in Spatial Statistics for Large-Scale and Complex Domains
Auditorio Ninoslav Bralic
Abstract:
The proliferation of large-scale geospatial data from sources such as satellite remote sensing and cellular phone networks has created a need for new statistical methods capable of handling massive datasets and complex spatial domains, as classical techniques often face prohibitive computational burdens and restrictive assumptions. In this talk, I discuss recent advances that directly address some of these challenges, primarily through the development of a scalable model that reduces computational complexity from cubic to near-linear in the number of observations. Further, we explore some of its applications. Beyond scalability, progress has been made in tailoring methods for complex domains by defining a process using appropriate distance metrics. The synthesis of these scalable and geometrically aware methods empowers practitioners to extract meaningful insights from vast and intricate spatial data. Again, we revisit applications in other spatial domains. FAPEMIG and CNPq partially funded these works.

This is joint work with Carlos Gonzáles, Dipak K. Dey, Håvard Rue, Heitor Ramos, Lucas Godoy, Lucas Michelin, Jun Yan and Zaida Quiroz.
2025-06-27
11:00hrs.
Paulo Henrique Ferreira. Universidade Federal da Bahia, Brazil
Reliability analysis of multiple repairable systems under imperfect repair and unobserved heterogeneity
Auditorio Ninoslav Bralic
Abstract:

Imperfect repairs (IRs) are widely applicable in reliability engineering since most equipment is not completely replaced after failure. In this sense, it is necessary to develop methodologies that can describe failure processes and predict the reliability of systems under this type of repair. One of the challenges in this context is to establish reliability models for multiple repairable systems that account for unobserved heterogeneity in the systems' failure times and in their failure intensity after imperfect repairs. Thus, in this work, frailty models are proposed to identify unobserved heterogeneity in these failure processes. We consider the arithmetic reduction of age (ARA) and arithmetic reduction of intensity (ARI) classes of IR models, with constant repair efficiency, a Power-Law Process (PLP) to model failure times, and a univariate gamma-distributed frailty shared by each system's failure times. Classical inferential methods are used to estimate the parameters and reliability predictors of systems under IRs. An extensive simulation study is carried out under different scenarios to investigate the suitability of the models and the asymptotic consistency and efficiency of the maximum likelihood estimators. Finally, we illustrate the practical relevance of the proposed models on a real data set of sugarcane harvesters.
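For reference, standard forms from the imperfect-repair literature (not necessarily the exact parametrization used in the talk):

```latex
% Power-law process (PLP) baseline intensity:
\lambda_0(t) = \frac{\beta}{\eta}\left(\frac{t}{\eta}\right)^{\beta - 1}, \qquad \beta, \eta > 0 .
% Arithmetic reduction of age (ARA_1): repairs rewind the effective age,
\lambda_{\mathrm{ARA}_1}(t) = \lambda_0\!\left(t - \rho\, T_{N(t^-)}\right);
% Arithmetic reduction of intensity (ARI_1): repairs subtract part of the accumulated intensity,
\lambda_{\mathrm{ARI}_1}(t) = \lambda_0(t) - \rho\, \lambda_0\!\left(T_{N(t^-)}\right),
% with repair efficiency rho in [0, 1] (rho = 0: minimal repair; rho = 1: perfect repair)
% and T_{N(t^-)} the time of the last failure before t. A gamma frailty multiplying the
% intensity captures unobserved heterogeneity across systems.
```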

 

Joint work with: Éder S. Brito, Vera L. D. Tomazella, Paulo H. Ferreira, Francisco Louzada Neto, Oilson A. Gonzatto Junior.

2025-06-06
15:00hrs.
Kerlyns Martínez. Universidad de Concepción
Stochastic modeling of age-structured species under the influence of fishers' behavior
Auditorio Bralic
Abstract:
In this talk we will address the development and analysis of a mathematical model for a kelp population that incorporates both ecological and sociological aspects, in particular the response of fishers to environmental regulations. We will begin with a heuristic derivation of the model, including the representation of the uncertainty inherent to open systems. Next, we will show existence and uniqueness of solutions within the space of admissible solutions, as well as an asymptotic analysis of the total biomass. We will also introduce an efficient numerical scheme that preserves the essential properties of the model in the presence of superlinear, non-Lipschitz growth coefficients, and we will present simulations that illustrate different scenarios of human interaction and kelp dynamics.
2025-05-29
15:00hrs.
Natalia Da Silva. Universidad de la República, Uruguay
Leveraging usage data from educational platforms through Bayesian statistical learning
Auditorio Bralic
Abstract:

The use of different Learning Management Systems, or educational platforms, has become a key tool in education. These systems generate an enormous volume of data every day, for both students and teachers. Turning these data into information relevant for decision making is a major challenge, owing to the complexity of their structure and to the difficulty of summarizing the learning process from the available records.

In this work we present methods to transform educational platform data into relevant information and explore how it can be used to predict academic performance in public primary education in Uruguay. Bayesian statistical learning methods are applied to predict academic achievement from usage patterns of the Little Bridge platform, together with sociodemographic variables and school-level data. Specifically, we use the BART (Bayesian Additive Regression Trees) model and compare its predictive performance with Random Forest. The Bayesian approach is chosen for its ability to incorporate school-level random effects, which allows the learning process to be analyzed at multiple levels.

The results can be applied both at the individual level, for the early identification of students at risk, and at the institutional level, to highlight schools that require intervention or that can serve as models of success.
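For reference, the generic BART specification mentioned above (standard sum-of-trees form; the school-level random effect term is an assumption about how the hierarchical extension might be written, not the exact model of the study):

```latex
% BART: sum-of-trees regression with regularizing priors on trees and leaf values
y_i = \sum_{j=1}^{m} g\!\left(x_i;\, T_j, M_j\right) + b_{s(i)} + \varepsilon_i,
\qquad \varepsilon_i \sim N(0, \sigma^2),
% g(x; T_j, M_j): prediction of regression tree T_j with leaf parameters M_j;
% b_{s(i)} ~ N(0, tau^2): a random effect for the school s(i) of student i;
% priors on (T_j, M_j, sigma^2, tau^2) complete the Bayesian model, which is
% fitted by MCMC (Bayesian backfitting).
```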

 

2025-05-14
15:00hrs.
Cristian Meza. Universidad de Valparaíso
Estimation procedure based on Stochastic EM algorithm in Zero-Inflated mixed effects models applied to microbiome data
Auditorio Bralic
Abstract:
Human microbiome studies based on genetic sequencing techniques produce compositional (or count) longitudinal data on the relative (or absolute) abundances of microbial taxa over time, allowing one to understand, through mixed-effects modeling, how microbial communities evolve in response to clinical interventions, environmental changes, or disease progression. In particular, zero-inflated (ZI) models jointly fit, over time, the presence and the abundance of each microbial taxon, accounting for the compositional nature of the data, its skewness, and the overabundance of zeros. However, as for other complex random-effects models, maximum likelihood estimation suffers from the intractability of the likelihood integrals. Available estimation methods rely on log-likelihood approximations, which are prone to limitations such as biased estimates or unstable convergence. In this work we develop an alternative maximum likelihood estimation approach for ZI models such as the beta regression or the beta-binomial, based on the Stochastic Approximation Expectation Maximization (SAEM) algorithm. The proposed methodology allows modeling unbalanced data, which is not always possible with existing approaches. We also provide estimates of the standard errors and of the log-likelihood of the fitted model. The performance of the algorithm is established through simulation, and its use is demonstrated in microbiome studies, showing its ability to detect changes in both presence and abundance of bacterial taxa over time and in response to treatment.
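A generic zero-inflated beta mixed-effects specification of the kind referred to above (illustrative form; the link functions and random-effect structure are assumptions, not the talk's exact model):

```latex
% Relative abundance Y_{it} in [0, 1) of a taxon, for subject i at time t:
P(Y_{it} = 0 \mid b_i) = \pi_{it}, \qquad
Y_{it} \mid Y_{it} > 0,\, b_i \sim \mathrm{Beta}\!\left(\mu_{it}\phi,\ (1 - \mu_{it})\phi\right),
% with mixed-effects linear predictors on both parts,
\operatorname{logit}(\pi_{it}) = x_{it}^{\top}\alpha + z_{it}^{\top} b_i^{(0)}, \qquad
\operatorname{logit}(\mu_{it}) = x_{it}^{\top}\beta  + z_{it}^{\top} b_i^{(1)},
\qquad b_i = \bigl(b_i^{(0)}, b_i^{(1)}\bigr) \sim N(0, \Sigma).
% The random-effects integrals make the likelihood intractable; SAEM replaces the
% E-step by simulation of b_i plus a stochastic averaging step.
```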
2025-04-29
15:00hrs.
Ronny Vallejos. Universidad Técnica Federico Santa María
Advances in Agreement Coefficients for Continuous Measurements
Sala usos multiples, Felipe Villanueva
Abstract:

Assessing agreement between instruments is fundamental in clinical and observational studies to evaluate how similarly two methods measure the same set of subjects. In this talk, we present two extensions of a widely used coefficient for assessing agreement between continuous variables. The first extension introduces a novel agreement coefficient for lattice sequences observed over the same areal units, motivated by the comparison of poverty measurement methodologies in Chile. The second extension proposes a new coefficient, denoted as ρ1, designed to measure agreement between continuous measurements obtained from two instruments observing the same experimental units. Unlike traditional approaches, ρ1 is based on L1 distances, providing robustness to outliers and avoiding dependence on nuisance parameters. Both proposals are supported by theoretical results, an inference framework, and simulation studies that illustrate their performance and practical relevance.
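For context, the classical L2-based concordance coefficient that such agreement measures extend (a standard formula; the precise definition of ρ1 is the one given in the talk):

```latex
% Lin's concordance correlation coefficient (L2-based agreement between X and Y):
\rho_c = 1 - \frac{E\left[(X - Y)^2\right]}{E\left[(X - Y)^2 \mid X \perp Y\right]}
       = \frac{2\,\sigma_{XY}}{\sigma_X^2 + \sigma_Y^2 + (\mu_X - \mu_Y)^2}.
% rho_1 follows the same "observed vs. chance-expected disagreement" logic but
% measures disagreement with L_1 (absolute) distances, which reduces the influence
% of outliers and avoids dependence on nuisance parameters.
```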

2025-04-10
16:00hrs.
Francisco Cuevas. Universidad Técnica Federico Santa María
Composite likelihood inference for space-time point processes
Sala 1 multiuso, 1° Piso Felipe Villanueva
Abstract:

The dynamics of a rain forest is extremely complex, involving births, deaths and growth of trees with complex interactions between trees, animals, climate, and environment. We consider the patterns of recruits (new trees) and dead trees between rain forest censuses. For a current census we specify regression models for the conditional intensity of recruits and the conditional probabilities of death given the current trees and spatial covariates. We estimate regression parameters using conditional composite likelihood functions that only involve the conditional first-order properties of the data. When constructing assumption-lean estimators of covariance matrices of parameter estimates we only need mild assumptions of decaying conditional correlations in space, while assumptions regarding correlations over time are avoided by exploiting conditional centering of composite likelihood score functions. Time series of point patterns from rain forest censuses are quite short while each point pattern covers a fairly big spatial region. To obtain asymptotic results we therefore use a central limit theorem for the fixed time span, increasing spatial domain asymptotic setting. This also allows us to handle the challenge of using stochastic covariates constructed from past point patterns. Conveniently, it suffices to impose weak dependence assumptions on the innovations of the space-time process. We investigate the proposed methodology by simulation studies and an application to rain forest data.
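As a reminder of the first-order composite likelihood idea mentioned above (generic form of an intensity-based estimating function; the talk's models condition on the previous census and on covariates):

```latex
% First-order (Poisson-type) composite log-likelihood for a point process X observed
% in a window W, with parametric conditional intensity lambda_beta:
\ell_{CL}(\beta) = \sum_{x \in X \cap W} \log \lambda_\beta(x \mid \mathcal{H})
                 - \int_{W} \lambda_\beta(u \mid \mathcal{H})\, du ,
% where H denotes the conditioning information (previous census, spatial covariates).
% Only first-order conditional properties of the data enter, and the score has
% conditional mean zero, which is what is exploited to avoid assumptions on
% temporal correlations.
```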

2025-03-07
15:00hrs.
Victor Morales-Oñate. Universidad de Las Américas, Quito, Ecuador.
Machine Learning in Credit Risk Models
Salas multiuso, 1° piso Villanueva
Abstract:
Credit risk modeling offers a field of opportunities both for professionals with a traditional statistical background and for those specialized in Machine Learning. However, the choice between classical methods and machine-learning approaches is not trivial: when, and why, should one technique be preferred over the other?

In this talk we explore this key question across the credit life cycle, analyzing how Machine Learning is transforming risk assessment and management. We compare traditional approaches with more advanced models, highlighting their advantages, their limitations, and the challenges involved in implementing them in a regulated environment.

Finally, we discuss applications of advanced analytics in the financial industry, identifying opportunities for innovation and the impact of these methodologies on strategic decision making.
2024-11-26
13:30hrs.
Víctor H. Lachos. University of Connecticut
An EM algorithm for fitting matrix-variate normal distributions on interval-censored and missing data.
Auditorio Ninoslav Bralic
Abstract:

Matrix-variate distributions are powerful tools for modeling three-way datasets that often arise in longitudinal and multidimensional spatio-temporal studies. However, observations in these datasets can be missing or subject to detection limits because of the restrictions of the experimental apparatus. Here, we develop an efficient EM-type algorithm for maximum likelihood estimation of parameters, in the context of interval-censored and/or missing data, utilizing the matrix-variate normal distribution. This algorithm provides closed-form expressions that rely on truncated moments, offering a reliable approach to parameter estimation under these conditions. Results obtained from the analysis of both simulated data and real case studies concerning water quality monitoring are reported to demonstrate the effectiveness of the proposed method.
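For reference, the matrix-variate normal density underlying the model (standard definition):

```latex
% Y (n x p) follows a matrix-variate normal MN_{n,p}(M, U, V) with mean matrix M,
% row covariance U (n x n) and column covariance V (p x p):
f(Y \mid M, U, V) =
\frac{\exp\!\left\{-\tfrac{1}{2}\,\mathrm{tr}\!\left[V^{-1}(Y - M)^{\top} U^{-1} (Y - M)\right]\right\}}
     {(2\pi)^{np/2}\, |U|^{p/2}\, |V|^{n/2}} ,
% equivalently vec(Y) ~ N_{np}(vec(M), V \otimes U). Censored or missing entries enter
% the EM algorithm through conditional (truncated) moments of this distribution.
```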

2024-11-20
16:00hrs.
Debajyoti Sinha. Florida State University
Analysis of spatially clustered survival data with unobserved covariates using SBART
sala 2 de usos múltiples, 1er. piso Edificio Felipe Villanueva
Abstract:

For large, clustered survival studies, usual parametric and semi-parametric regression models are inappropriate and inadequate when the appropriate functional forms of the covariates and their interactions in hazard functions are unknown, and when random cluster effects as well as some unknown cluster-level covariates are spatially correlated. We present a general nonparametric method for such studies under the Bayesian ensemble learning paradigm called Soft Bayesian Additive Regression Trees (SBART for short).
Our additional methodological and computational challenges include a large number of clusters, variable cluster sizes, and proper statistical augmentation of the unobservable cluster-level covariate using a data registry different from the main survival study.
We use an innovative 3-step computational tool based on latent variables to address these computational challenges. Using two different data resources, we illustrate the practical implementation of our method and its advantages over existing methods by assessing the impacts of interventions on some cluster/county-level and patient-level covariates to mitigate the existing disparity in breast cancer survival across 67 Florida counties (clusters). The Florida Cancer Registry (FCR) is used to obtain clustered survival data with patient-level covariates, and the Behavioral Risk Factor Surveillance System (BRFSS) is used to obtain further information on an unobservable county-level covariate, Screening Mammography Utilization (SMU).

2024-11-08
15:00hrs.
Marie-Hélène Descary. Université du Québec à Montréal
Constructing Ancestral Recombination Graphs through Reinforcement Learning
sala de usos múltiples, 1er. piso Edificio Felipe Villanueva
Abstract:
Over the years, many approaches have been proposed to build ancestral recombination graphs (ARGs), graphs used to represent the genetic relationships between individuals. Among these methods, many rely on the assumption that the most likely graph is among the shortest ones. In this talk, I will present a new approach to building short ARGs: Reinforcement Learning (RL). Our method exploits the similarity between finding the shortest path from a set of genetic sequences to their most recent common ancestor and finding the shortest path between the entrance and the exit of a maze, a classic RL problem. In the maze problem, the learner, called the agent, must learn the directions to take in order to escape as quickly as possible, whereas in our problem the agent must learn which action to take among coalescence, mutation, and recombination in order to reach the most recent common ancestor as quickly as possible. Our results show that RL can be used to build ARGs as short as those built with a heuristic algorithm optimized to build short ARGs, and sometimes even shorter. Moreover, our method allows building a distribution of short ARGs for a given sample, and can also generalize what it has learned to new samples not used during training.
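To make the maze analogy concrete, a generic tabular Q-learning toy (not the ARG-specific agent described in the talk):

```python
import numpy as np

# A 4x4 grid maze: start at (0, 0), exit at (3, 3). The agent learns, by trial and
# error, which action to take in each cell so as to reach the exit in as few steps
# as possible -- the same logic the talk transfers to choosing among coalescence,
# mutation, and recombination events when building a short ARG.
N = 4
actions = [(-1, 0), (1, 0), (0, -1), (0, 1)]       # up, down, left, right
Q = np.zeros((N, N, len(actions)))                  # state-action values
alpha, gamma, eps = 0.5, 0.95, 0.1                  # learning rate, discount, exploration

rng = np.random.default_rng(0)
for episode in range(2000):
    r, c = 0, 0
    while (r, c) != (N - 1, N - 1):
        # epsilon-greedy action choice
        a = rng.integers(len(actions)) if rng.random() < eps else int(np.argmax(Q[r, c]))
        dr, dc = actions[a]
        nr, nc = min(max(r + dr, 0), N - 1), min(max(c + dc, 0), N - 1)
        reward = 0.0 if (nr, nc) == (N - 1, N - 1) else -1.0   # shorter paths = higher return
        Q[r, c, a] += alpha * (reward + gamma * Q[nr, nc].max() - Q[r, c, a])
        r, c = nr, nc

# After learning, greedy actions trace a shortest path from the start to the exit.
print("greedy first move from (0, 0):", np.argmax(Q[0, 0]))
```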
2024-09-25
15:00hrs.
Jorge Loria. Department of Computer Science, Aalto University
Posterior learning of kernels under infinite-variance weight priors
sala de usos múltiples, 1er. piso Edificio Felipe Villanueva
Abstract:

Neal (1996) showed that infinitely wide one-layer Bayesian neural networks (BNNs) converge to a Gaussian process (GP) when the weights have a finite-variance prior. Cho & Saul (2009) presented a recursive formula for deep kernel processes, relating the covariance matrix of one layer to the covariance matrix of the previous layer. Moreover, they obtained an explicit formula for the recursion for several common activation functions, including the ReLU. Later work has strengthened these results for more complex architectures, obtaining similar limits for deeper networks. Nevertheless, recent work, including Aitchison et al. (2021), points out that the covariance kernels obtained in this way are deterministic, which precludes learning the representations of the limiting network; this is equivalent to learning a posterior kernel that is non-degenerate given the observations. To address this, they propose adding artificial noise so that the kernel retains stochasticity. However, this artificial noise can be criticized because it does not emerge from the limit of a BNN architecture. Seeking to avoid this, we show that a deep Bayesian neural network, in which the width of every layer goes to infinity and all weights have a jointly elliptical distribution with infinite variance, converges to a process with α-stable marginals at each layer that admits a conditionally Gaussian representation. These random covariances can be related recursively in the manner of Cho & Saul (2009), even though the processes exhibit stable behavior and the covariances are therefore not necessarily defined. Our results generalize our previous work, Loría & Bhadra (2024), from one-layer networks to multi-layer networks while avoiding its heavy computational burden. The computational and statistical advantages over other methods are highlighted in simulations and on benchmark datasets.
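For reference, the ReLU (arc-cosine) kernel recursion referred to above (standard finite-variance form, up to the usual weight-variance scaling; the talk's contribution is its infinite-variance analogue):

```latex
% NNGP / Cho & Saul (2009) recursion for ReLU activations: with
\theta^{(\ell)}(x, x') = \arccos\!\left(
  \frac{K^{(\ell)}(x, x')}{\sqrt{K^{(\ell)}(x, x)\, K^{(\ell)}(x', x')}} \right),
% the covariance of the next layer is
K^{(\ell+1)}(x, x') = \frac{1}{2\pi}
  \sqrt{K^{(\ell)}(x, x)\, K^{(\ell)}(x', x')}
  \left( \sin\theta^{(\ell)} + \left(\pi - \theta^{(\ell)}\right) \cos\theta^{(\ell)} \right).
% Under finite-variance weight priors these kernels are deterministic; the talk's
% infinite-variance (stable) limits instead yield random, conditionally Gaussian
% covariances that can still be propagated through a recursion of this form.
```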

2024-09-06
09:40hrs.
Hector Araya. Universidad Adolfo Ibáñez
Least squares estimation for the Ornstein-Uhlenbeck process with small Hermite noise and some generalizations
Auditorio Ninoslav Bralic
Abstract:
We consider the problem of drift parameter estimation for a non-Gaussian, long-memory Ornstein–Uhlenbeck process driven by a Hermite process. To estimate the unknown parameter, discrete-time, high-frequency observations at regularly spaced time points and the least squares estimation method are used. By means of techniques based on Wiener chaos and multiple stochastic integrals, the consistency and the limiting distribution of the least squares estimator of the drift parameter are established. To illustrate the computational implementation of the obtained results, several simulation examples are given. Finally, an extension to a type of iterated Ornstein–Uhlenbeck process is discussed.
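For orientation (a standard formulation; the exact small-noise scaling used in the talk may differ), the Hermite-driven Ornstein–Uhlenbeck model and its discrete least squares drift estimator take the form:

```latex
% Ornstein-Uhlenbeck process driven by a Hermite process Z^{q,H} with small noise epsilon:
dX_t = -\theta X_t\, dt + \epsilon\, dZ^{q,H}_t, \qquad X_0 = x_0,\ \theta > 0 .
% With high-frequency observations X_{t_i}, t_i = i Delta_n, the least squares estimator
% of the drift minimizes \sum_i (X_{t_i} - X_{t_{i-1}} + \theta X_{t_{i-1}} \Delta_n)^2, giving
\hat{\theta}_n = -\,\frac{\sum_{i=1}^{n} X_{t_{i-1}}\left(X_{t_i} - X_{t_{i-1}}\right)}
                        {\Delta_n \sum_{i=1}^{n} X_{t_{i-1}}^{2}} .
% Consistency and the limiting distribution are then obtained via Wiener-chaos /
% multiple-stochastic-integral techniques.
```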