I recently attended the Vine Copula Workshop hosted at TUM Munich, where I presented our work (arXiv here) on using vine copula models of input dependencies to improve UQ estimation. I came back impressed by the breakthroughs achieved by colleagues in the copula field. A good deal of their research has direct practical consequences for copula modeling in UQ problems. I am writing this post to share some impressions, and to note some names that we might otherwise overlook.
(If you wonder why dependencies matter in UQ problems, and what copulas are and have to do with them, you may be interested in this previous post of mine).
New / Refined Copula models relevant to UQ problems
A number of workshop participants presented novel or improved copula models of dependence, as well as algorithms that yield more parsimonious models and thereby reduce overfitting. Below I summarize some of these models and methods.
Parsimonious vine copula models
Vine copulas factorise a multivariate copula density into a product of simpler pair-copula densities. Each pair copula models the dependence between two features, possibly conditioned on the values taken by other random variables. For instance, the density of the trivariate copula of (X_1,X_2,X_3) can be modelled as
c_{123}(u_1,u_2,u_3)=c_{12}(u_1,u_2) \cdot c_{23} (u_2,u_3) \cdot c_{13|2} (u_{1|2}, u_{3|2}),
where c_{ij|k} is the density of the copula between X_i and X_j given X_k, and u_{i|k} = C_{i|k}(u_i | u_k) is the conditional distribution function of U_i given U_k = u_k.
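The factorisation above, read as a product of pair-copula densities, can be evaluated numerically. The following is a minimal sketch that assumes all three pair copulas are Gaussian, a choice made here purely for illustration. The function names and the parameters rho12, rho23 and rho13_2 are hypothetical and do not come from any specific library.

```python
from math import exp, sqrt
from statistics import NormalDist

_N = NormalDist()  # standard normal, used for Phi and its inverse


def gauss_pair_density(u, v, rho):
    """Density of the bivariate Gaussian copula with correlation rho."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    r2 = rho * rho
    return exp(-(r2 * (x * x + y * y) - 2 * rho * x * y) / (2 * (1 - r2))) / sqrt(1 - r2)


def gauss_h(u, v, rho):
    """Conditional CDF C(u | v) of the Gaussian pair copula (the 'h-function')."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((x - rho * y) / sqrt(1 - rho * rho))


def vine_density(u1, u2, u3, rho12, rho23, rho13_2):
    """Three-dimensional vine density c12 * c23 * c13|2 with Gaussian pairs."""
    u1_2 = gauss_h(u1, u2, rho12)  # u_{1|2}
    u3_2 = gauss_h(u3, u2, rho23)  # u_{3|2}
    return (gauss_pair_density(u1, u2, rho12)
            * gauss_pair_density(u2, u3, rho23)
            * gauss_pair_density(u1_2, u3_2, rho13_2))
```

With all three parameters set to zero the density reduces to 1 everywhere, which is the independence copula. The sketch requires all arguments to lie strictly between 0 and 1 and all correlations strictly between -1 and 1.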
In applications, each pair copula is inferred from the available data. Typically, this is done by greedy algorithms that:
- sort the input variables from those most strongly correlated with all the others (according to a chosen measure of dependence) to those with the weakest correlations
- model the unconditional pair copulas C_{ij} involved in the construction
- transform the original observations into conditional ones, e.g., u_{i|j}, u_{k|j}, …
- infer conditional copulas C_{ik|j} of the first order on pairs (u_{i|j}, u_{k|j}) of conditional observations
- proceed to conditioning of higher order, if needed.
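The steps above can be sketched for three variables as follows. This is only an illustration under strong assumptions: every pair copula is taken to be Gaussian, parameters are obtained by inverting Kendall's tau, and no copula family selection is performed. All function names are hypothetical, and real implementations are considerably more elaborate.

```python
from itertools import combinations
from math import pi, sin, sqrt
from statistics import NormalDist

_N = NormalDist()


def pseudo_obs(x):
    """Rank-transform a sample to (0, 1): u_i = rank_i / (n + 1)."""
    n = len(x)
    order = sorted(range(n), key=x.__getitem__)
    u = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        u[idx] = rank / (n + 1)
    return u


def kendall_tau(a, b):
    """Kendall's tau by direct pair counting (O(n^2), assumes no ties)."""
    n = len(a)
    s = 0
    for i, j in combinations(range(n), 2):
        s += 1 if (a[i] - a[j]) * (b[i] - b[j]) > 0 else -1
    return 2 * s / (n * (n - 1))


def tau_to_rho(tau):
    """Gaussian copula parameter corresponding to a given Kendall's tau."""
    return sin(pi * tau / 2)


def h(u, v, rho):
    """Conditional CDF C(u | v) for a Gaussian pair copula."""
    x, y = _N.inv_cdf(u), _N.inv_cdf(v)
    return _N.cdf((x - rho * y) / sqrt(1 - rho * rho))


def fit_three_variable_vine(samples):
    """samples: three equally long lists of observations."""
    u = [pseudo_obs(col) for col in samples]
    tau = {pair: kendall_tau(u[pair[0]], u[pair[1]])
           for pair in combinations(range(3), 2)}
    # Step 1: the variable most strongly dependent on the others is the centre.
    strength = [sum(abs(t) for pair, t in tau.items() if k in pair)
                for k in range(3)]
    centre = max(range(3), key=strength.__getitem__)
    a, b = (k for k in range(3) if k != centre)
    # Step 2: unconditional pair copulas between the centre and the others.
    rho_a = tau_to_rho(tau[tuple(sorted((a, centre)))])
    rho_b = tau_to_rho(tau[tuple(sorted((b, centre)))])
    # Step 3: transform to conditional observations u_{a|centre}, u_{b|centre}.
    u_a = [h(x, y, rho_a) for x, y in zip(u[a], u[centre])]
    u_b = [h(x, y, rho_b) for x, y in zip(u[b], u[centre])]
    # Step 4: first-order conditional copula on the conditional observations.
    rho_ab = tau_to_rho(kendall_tau(u_a, u_b))
    return {"centre": centre,
            "rho_first_tree": {a: rho_a, b: rho_b},
            "rho_conditional": rho_ab}
```

With more than three variables, the same transform-then-fit step is repeated on the conditional observations to reach conditioning of higher order.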
Greedy algorithms of this kind may return vine copulas with poor goodness of fit if the data are high dimensional (several tens of input variables). Harry Joe presented a novel non-greedy algorithm based on truncated vine copulas with latent variables. He compared it with another common approach for high dimensional problems, namely using factor analysis to split the input variables into groups with high intra-group dependence and low inter-group dependence. The results look very promising and mark another step toward finding optimal solutions for this challenging problem.
Copula models for mixed continuous-discrete data
While UQ problems mainly feature continuous variables, exceptions always exist, so the ongoing work on modelling discrete variables and their dependencies deserves attention. At the workshop, vine copulas for mixed continuous-discrete input variables were addressed by a number of researchers. Among them, I noted the names of Lu Yang [1] and S. Kadhem.
Archimedean copulas for tail dependencies and reliability analysis
O. Okhrin presented his work with J. Gorecki and M. Hofert on modelling tail dependencies by an extension of hierarchical Archimedean copulas. Tail dependencies have an immediate impact on reliability and fragility analysis: jointly extreme inputs are often the cause of an extreme response. Thus, positive probabilities of joint extremes in the inputs (often neglected by UQ practitioners) significantly increase failure probabilities and fragility indices. Copula models designed to capture such relationships, and algorithms able to discover them in data, are very welcome.
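To give a feeling for the size of this effect, the sketch below compares the probability that two inputs simultaneously fall below their q-quantile under independence and under a bivariate Clayton copula. Clayton is a simple Archimedean copula with lower tail dependence, and it is used here only as an illustrative stand-in, not as the hierarchical model presented at the workshop. The function names are hypothetical.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula C(u, v) = (u^-theta + v^-theta - 1)^(-1/theta), theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)


def clayton_lower_tail_dependence(theta):
    """Lower tail dependence coefficient: limit of C(q, q) / q as q -> 0."""
    return 2.0 ** (-1.0 / theta)


def joint_lower_probability(q, theta=None):
    """P(U1 <= q, U2 <= q) under independence (theta=None) or a Clayton copula."""
    return q * q if theta is None else clayton_cdf(q, q, theta)
```

Under independence the joint probability decays like q squared, whereas under the Clayton copula it decays only linearly in q, approximately as the tail dependence coefficient times q. For small q, the two can therefore differ by orders of magnitude.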
The case of missing data
P. Song presented an improved method to fit Gaussian copula models for problems with missing data [2]. Extensions to non-Gaussian copula models are being investigated as well.
Shapley sensitivity indices for correlated features
Sensitivity analysis aims to explain which input variable(s) to a system (in machine learning terminology, which features) are the strongest drivers of the uncertainty in the system’s response.
Different global and local sensitivity indices have been defined in the literature to quantify such relations. Historically, the common denominator of all these definitions has been an assumption of statistical independence among the inputs. The independence assumption allows one to split the total variance of the output into a sum of individual contributions from each input and from their interactions.
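For reference, the classical variance decomposition for a model Y = f(X_1, ..., X_d) with independent inputs can be written as below. The notation (V_i, V_{ij}, S_i) is the standard one for Sobol' indices and is introduced here as an assumption, since the post does not fix any notation.

```latex
\mathrm{Var}(Y) = \sum_{i} V_i + \sum_{i<j} V_{ij} + \dots + V_{12\dots d},
\qquad
V_i = \mathrm{Var}\big(\mathbb{E}[Y \mid X_i]\big),
\qquad
V_{ij} = \mathrm{Var}\big(\mathbb{E}[Y \mid X_i, X_j]\big) - V_i - V_j,
\qquad
S_i = \frac{V_i}{\mathrm{Var}(Y)}.
```

The first-order index S_i is the share of the output variance attributable to X_i alone. When the inputs are dependent, the terms of this sum no longer separate cleanly.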
For dependent inputs, there is no uniquely accepted definition of the sensitivity of the output to a specific input: since that input is correlated with other inputs, asking for “its isolated contribution” to the total variability may be a meaningless question. Nevertheless, alternative definitions of sensitivity have been proposed to resolve this apparent paradox. Kjersti Aas and colleagues have specifically worked on Shapley values, and proposed algorithms for their computation when the inputs are coupled by a copula model.
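As a toy illustration of how Shapley values share out the variance among correlated inputs, consider Y = b1*X1 + b2*X2 with standard normal inputs of correlation rho, and take as the value of a subset S the variance explained by the inputs in S, Var(E[Y | X_S]). This setup, the closed-form values and the function names are assumptions chosen for illustration. The sketch is not the algorithm presented at the workshop.

```python
from itertools import permutations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley values: average marginal contribution over all orderings."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            phi[p] += value(coalition | {p}) - value(coalition)
            coalition = coalition | {p}
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}


def linear_gaussian_value(b1, b2, rho):
    """v(S) = Var(E[Y | X_S]) for Y = b1*X1 + b2*X2, corr(X1, X2) = rho."""
    table = {
        frozenset(): 0.0,
        frozenset({1}): (b1 + rho * b2) ** 2,   # E[Y | X1] = (b1 + rho*b2) * X1
        frozenset({2}): (b2 + rho * b1) ** 2,   # E[Y | X2] = (b2 + rho*b1) * X2
        frozenset({1, 2}): b1 ** 2 + b2 ** 2 + 2 * rho * b1 * b2,  # Var(Y)
    }
    return table.__getitem__
```

For example, shapley_values([1, 2], linear_gaussian_value(1.0, 2.0, 0.5)) splits the total variance between the two inputs. The values always sum to Var(Y), and for rho = 0 they reduce to b1 squared and b2 squared, i.e. the classical variance contributions of independent inputs.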
Software for vine copula models
UQLab offers an integrated environment to define and infer probabilistic input models represented in terms of marginal and copula distributions, and combine those with UQ analyses. Supported copulas are the independence and Gaussian copulas and, with the next planned release, the rich class of canonical and drawable vine models.
However, these may not be sufficient for particularly complex and high dimensional problems. Dedicated software for copula representation and inference exists as an alternative.
The group led by Claudia Czado at TUM Munich recently released a powerful C++ toolbox for modelling regular vines. The software features a number of estimation algorithms, as well as efficient automatic parallelization that speeds up calculations considerably. It can be downloaded here, and an R wrapper is also available.
In parallel, Jan Gorecki and colleagues released an extensive Matlab package for hierarchical Archimedean copulas. The package can be downloaded here.
References:
[1] https://www.tandfonline.com/doi/abs/10.1080/01621459.2017.1330692
[2] https://www.researchgate.net/publication/294725341_EM_algorithm_in_Gaussian_copula_with_missing_data