We present several nonparametric methods for estimating a density with compact support, or one defined on a domain whose geometry may be complex. We show how the quality of classical estimators is affected near the boundary of the domain. In particular, we propose a new estimator based on local polynomials for estimating the density at a point x. We show that this estimator can adapt to different geometries and that it has optimal properties in terms of mean squared error and of the regularity of the function to be estimated. We compare this method with the sparr method, a popular alternative for density estimation on domains with complicated geometry.
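The boundary effect mentioned above can be illustrated with a small numerical sketch. It is not the estimator proposed in the talk: it only evaluates the expected value of a classical kernel density estimator for data that are uniform on [0, 1], assuming an Epanechnikov kernel and a bandwidth of 0.1 (both choices, and all function names, are illustrative assumptions).

```python
def epanechnikov(u: float) -> float:
    """Epanechnikov kernel, supported on [-1, 1]."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def expected_kde_uniform(x: float, h: float, n_grid: int = 10000) -> float:
    """Expected value of a classical KDE at x when the data are uniform on [0, 1].

    Computes the integral of K((x - y) / h) / h over y in [0, 1]
    with a midpoint rule (the true density is 1 on that interval).
    """
    step = 1.0 / n_grid
    total = 0.0
    for k in range(n_grid):
        y = (k + 0.5) * step
        total += epanechnikov((x - y) / h) / h
    return total * step


# Assumed bandwidth for the illustration.
interior = expected_kde_uniform(0.5, 0.1)  # kernel window lies inside [0, 1]
boundary = expected_kde_uniform(0.0, 0.1)  # half of the kernel window falls outside
print(f"interior: {interior:.3f}, boundary: {boundary:.3f}")
```

In the interior the expected value matches the true density, while at the boundary point roughly half of the kernel mass falls outside the support, so the classical estimator recovers only about half of the true value there.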
Traditional factor analysis, which relies on the assumption of multivariate normality, has been extended by jointly incorporating the restricted multivariate skew-t (rMST) distribution for the unobserved factors and errors. However, the rMST distribution can only capture skewness concentrated in a single direction, which limits its utility. A more flexible and robust factor analysis model, called the CFUSTFA model, is introduced based on the broader canonical fundamental skew-t (CFUST) distribution. The proposed model can account for more complex features of skewness in multiple directions. An efficient alternating expectation conditional maximization (AECM) algorithm, formulated under several reduced complete-data spaces, is developed to estimate the parameters by maximum likelihood (ML). To assess the variability of the parameter estimates, an information-based approach is employed to approximate the asymptotic covariance matrix of the ML estimators. The efficacy and practicality of the proposed techniques are demonstrated through the analysis of simulated and real datasets.
Keywords: AECM algorithm; Canonical fundamental skew-t distribution; Factor scores; Truncated multivariate t distribution; Unrestricted multivariate skew-t distribution
The multivariate contaminated normal (MCN) distribution has two more parameters than the multivariate normal distribution, one controlling the proportion of mild outliers and the other specifying the degree of contamination, and it has been widely applied in robust statistical modeling. This paper extends the MCN model to deal with values that may be censored due to limits of quantification, referred to as the MCN with censoring (MCN-C) model. Further, it establishes the censored multivariate linear regression model in which the random errors follow the MCN distribution, named the MCN censored regression (MCN-CR) model. Two computationally feasible expectation conditional maximization (ECM) algorithms are developed for maximum likelihood estimation of the MCN-C and MCN-CR models. An information-based method is used to approximate the standard errors of the location parameters and regression coefficients. The capability and superiority of the proposed models are illustrated by two real-data examples and simulation studies.
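To make the role of the two extra parameters and of censoring concrete, here is a minimal univariate sketch. It is not the MCN-C model or the ECM algorithm of the paper: it assumes one common parameterization (an outlier proportion `nu` and a factor `gamma` in (0, 1) that inflates the variance to `sigma**2 / gamma`), and it assumes left-censoring at a known limit, so that a censored value contributes the distribution function instead of the density to the likelihood.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def normal_cdf(x: float, mu: float, sigma: float) -> float:
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def cn_pdf(x, mu, sigma, nu, gamma):
    """Univariate contaminated normal density (assumed parameterization).

    nu: proportion of outliers; gamma in (0, 1): the outlier component
    has standard deviation sigma / sqrt(gamma).
    """
    s_out = sigma / math.sqrt(gamma)
    return nu * normal_pdf(x, mu, s_out) + (1.0 - nu) * normal_pdf(x, mu, sigma)


def cn_cdf(x, mu, sigma, nu, gamma):
    s_out = sigma / math.sqrt(gamma)
    return nu * normal_cdf(x, mu, s_out) + (1.0 - nu) * normal_cdf(x, mu, sigma)


def censored_loglik(obs, mu, sigma, nu, gamma):
    """Log-likelihood for (value, is_left_censored) pairs.

    A left-censored value is only known to lie below its limit, so it
    contributes the CDF at that limit; an observed value contributes the density.
    """
    total = 0.0
    for value, censored in obs:
        f = cn_cdf if censored else cn_pdf
        total += math.log(f(value, mu, sigma, nu, gamma))
    return total


# Hypothetical data: one observed value and one value censored below -1.0.
data = [(0.3, False), (-1.0, True)]
ll = censored_loglik(data, mu=0.0, sigma=1.0, nu=0.1, gamma=0.2)
```

Setting `nu = 0` recovers the ordinary normal model, which is a quick sanity check on the parameterization.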
Keywords: Censored data; EM algorithm; Multivariate models; Outliers; Truncation.
The Gaussian copula is a powerful tool that has been widely used to model spatially and/or temporally correlated data with arbitrary marginal distributions. However, this model can be restrictive because it can only express reflection-symmetric dependence.
Recently, Bevilacqua et al. (2024) proposed a new general class of spatial copula models that allows the generation of random fields with arbitrary marginal distributions and types of dependence that can be reflection symmetric or not, particularly focusing on an instance that can be seen as the spatial generalization of the classical Clayton copula. In this session, we will review this general class of Archimedean-like spatial copulas and explore the various spatial extensions that this construction allows. Specifically, the Clayton-like case will be examined along with two spatial copulas currently in development: the Ali-Mikhail-Haq and Gumbel spatial copulas. Additionally, we will present the ongoing development of an application of this methodology to model geo-referenced operational covariates using Weibull regression, which can be seen as the spatial extension of the widely known proportional hazards model.
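As a point of reference for the lack of reflection symmetry, the following sketch uses the classical bivariate Clayton copula, not the spatial construction of Bevilacqua et al. (2024). The function names, the value of `theta`, and the quantile level `q` are illustrative assumptions. It compares the strength of dependence in the lower and upper tails and draws pairs with the standard conditional-inversion method.

```python
import random


def clayton_cdf(u: float, v: float, theta: float) -> float:
    """Classical bivariate Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def sample_clayton(n: int, theta: float, seed: int = 0):
    """Draw n pairs (u, v) by inverting the conditional distribution of v given u."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        u = 1.0 - rng.random()  # uniform on (0, 1], avoids u == 0
        w = 1.0 - rng.random()
        v = ((w ** (-theta / (1.0 + theta)) - 1.0) * u ** -theta + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def tail_ratios(theta: float, q: float = 0.01):
    """Finite-level tail dependence: P(V <= q | U <= q) and P(V > 1-q | U > 1-q)."""
    lower = clayton_cdf(q, q, theta) / q
    p = 1.0 - q
    upper = (1.0 - 2.0 * p + clayton_cdf(p, p, theta)) / q
    return lower, upper


lower, upper = tail_ratios(theta=2.0)
pairs = sample_clayton(1000, theta=2.0)
print(f"lower tail: {lower:.3f}, upper tail: {upper:.3f}")
```

The lower-tail ratio is much larger than the upper-tail ratio, which is the kind of asymmetry a Gaussian copula cannot express.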
References
Bevilacqua, M., Alvarado, E. & Caamaño-Carrillo, C. A flexible Clayton-like spatial copula with application to bounded support data. Journal of Multivariate Analysis, 201, 105277 (2024).
Mixture models, especially Dirichlet process mixtures, are widely used in Bayesian cluster analysis. The posterior similarity matrix (PSM) is crucial for understanding the cluster structure of the data, and it is typically estimated with Markov chain Monte Carlo (MCMC) methods. In this context, however, MCMC can be very sensitive to the initialization of the chains, and convergence is often slow, with the chains visiting only a very small number of partitions of the data. This results in a restricted version of the posterior, which can negatively affect the estimation of both the PSM and the clusters.
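For reference, the usual MCMC-based estimate of the PSM is simply the fraction of sampled partitions in which two observations share a cluster. The sketch below shows that baseline computation on a hypothetical set of sampled label vectors; the function name and the toy samples are illustrative assumptions, not output from any real sampler.

```python
from typing import List, Sequence


def posterior_similarity_matrix(samples: Sequence[Sequence[int]]) -> List[List[float]]:
    """Monte Carlo estimate of the PSM.

    samples[t][i] is the cluster label of observation i in the t-th sampled
    partition. Entry (i, j) of the result is the fraction of sampled
    partitions in which observations i and j share a cluster.
    """
    n = len(samples[0])
    counts = [[0] * n for _ in range(n)]
    for labels in samples:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    m = len(samples)
    return [[c / m for c in row] for row in counts]


# Hypothetical draws: four sampled partitions of three observations.
samples = [[0, 0, 1], [0, 1, 1], [0, 0, 1], [1, 1, 0]]
psm = posterior_similarity_matrix(samples)
```

Because the estimate only averages over the partitions the chain actually visits, a chain that explores few partitions yields a PSM concentrated on those partitions, which is the limitation described above.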
This work proposes a more efficient method for estimating the PSM without using MCMC. Based on an analytical formula, it seeks to approximate the entries of the PSM directly, particularly for Dirichlet process mixtures, reducing the computational cost and improving the accuracy of the estimate. In this presentation I will show several approximation methods, with preliminary results obtained from simulations and real data, illustrating their advantages over MCMC as well as their own challenges.
Structural equation models aim to represent and describe relationships between constructs, and between constructs and observed variables, whereas multiblock data analysis focuses on explaining the relationships between several blocks of variables. Multiblock data analysis enables the creation of latent variable scores and the estimation of structural equation models. A general framework is provided by Regularized Generalized Canonical Correlation Analysis (RGCCA). In this talk, I present application examples to illustrate a context for understanding the fundamental concepts of both fields and their interconnections. I review the main definitions related to RGCCA, the optimization problem, the search algorithm, and special cases. Further research is outlined.
We introduce two new approaches to clustering categorical and mixed data based on Condorcet clustering with a fixed number of groups, denoted $$\alpha$$-Condorcet and Mixed-Condorcet, respectively. Like k-modes, these approaches are essentially based on similarity and dissimilarity measures. The presentation is divided into three parts. First, we propose a new Condorcet criterion with a fixed number of groups (to assign cases to clusters). Second, we propose a heuristic algorithm to carry out the task. Third, we compare $$\alpha$$-Condorcet clustering with k-modes clustering, and Mixed-Condorcet with k-prototypes. The comparison uses a quality index, an accuracy measure, and a within-cluster sum-of-squares index.
Our findings are illustrated using real datasets: the feline dataset, the US Census 1990 dataset and other data.