Input Discrete Data along with

Sunny123 · May 7, 2021, 1:29pm

How to input a discrete random variable.

I am doing a Sobol sensitivity analysis for a data set with four input variables and one output. Three input variables are continuous and follow a lognormal distribution. The fourth variable is a discrete random variable following a uniform distribution. How can such an input data set be passed to UQLab?

xujia · October 1, 2021, 4:05pm

Hi @Sunny123,

Welcome to UQWorld, and thanks for creating this post.

So far, UQLab does not support discrete random variables, but we are working on it, and support will be included in a future version.

Nevertheless, if you have a computational model, you can still use the sampling-based method to calculate the Sobol’ indices (see Section 2.1.8 of the manual for more details). To this end, you should define your fourth random variable as a uniform random variable, and convert it to the target discrete random variable inside your model.
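A minimal Python sketch of this conversion (not UQLab code; UQLab itself is a MATLAB toolbox): the fourth input is sampled as uniform on [0, 1] and mapped to the discrete values by an inverse-CDF lookup inside the model wrapper. The names `to_discrete`, `wrapped_model`, `toy_model`, and the four outcome values are hypothetical and only serve the illustration.

```python
import numpy as np

def to_discrete(u, values, probs=None):
    """Map u in [0, 1] to discrete outcomes via an inverse-CDF lookup."""
    u = np.asarray(u, dtype=float)
    values = np.asarray(values)
    if probs is None:
        # Assumption: equally likely outcomes (discrete uniform).
        probs = np.full(len(values), 1.0 / len(values))
    cdf = np.cumsum(probs)
    # Index k such that cdf[k-1] <= u < cdf[k].
    idx = np.searchsorted(cdf, u, side="right")
    # Guard against u == 1.0 and round-off in the cumulative sum.
    idx = np.minimum(idx, len(values) - 1)
    return values[idx]

def toy_model(x1, x2, x3, x4):
    """Hypothetical stand-in for the real computational model."""
    return x1 + x2 * x3 + x4

def wrapped_model(x):
    """Model seen by the sensitivity analysis: column 3 is uniform on [0, 1]."""
    x = np.atleast_2d(x)
    x4 = to_discrete(x[:, 3], values=[1, 2, 3, 4])
    return toy_model(x[:, 0], x[:, 1], x[:, 2], x4)
```

The sensitivity analysis then only ever sees a continuous uniform input, while the underlying model still receives the discrete value.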

If you only have data, and the discrete variable has only a few possible outcomes, you can build one surrogate per outcome, each trained on the data that share that outcome/label of the discrete variable. Then, you can follow the definition of Sobol’ indices to calculate them. For example, suppose that your discrete variable X_4 is a Bernoulli random variable that takes the value 0 or 1 with probability 0.5 each. Denote your computational model by Y=f(X_1,X_2,X_3,X_4). You can then build a surrogate model \hat{f}_0(X_1,X_2,X_3) with the data having X_4=0 and a second surrogate \hat{f}_1(X_1,X_2,X_3) with the data having X_4=1. Following the definition, the first-order and total Sobol’ indices are given by

S_i = \frac{\mathbb{V}\left( \mathbb{E}\left(Y\mid X_i\right)\right)}{\mathbb{V}(Y)}, \quad S^T_i = 1-\frac{\mathbb{V}\left( \mathbb{E}\left(Y\mid X_{\sim i}\right)\right)}{\mathbb{V}(Y)}.
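
As a worked illustration for the Bernoulli example, and assuming X_4 is independent of X_1, X_2, X_3, write \mu_k = \mathbb{E}\left(\hat{f}_k(X_1,X_2,X_3)\right) for k=0,1 (the symbol \mu_k is introduced here only for this sketch). Then \mathbb{E}\left(Y\mid X_4=k\right) \approx \mu_k, and the numerator of S_4 reduces to the variance of a two-point distribution:

\mathbb{V}\left(\mathbb{E}\left(Y\mid X_4\right)\right) \approx \frac{1}{2}\left(\mu_0-\frac{\mu_0+\mu_1}{2}\right)^2 + \frac{1}{2}\left(\mu_1-\frac{\mu_0+\mu_1}{2}\right)^2 = \frac{(\mu_1-\mu_0)^2}{4}, \quad S_4 \approx \frac{(\mu_1-\mu_0)^2}{4\,\mathbb{V}(Y)}.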

All the quantities in the definitions of S_i and S^T_i can be estimated by Monte Carlo simulation on your two surrogate models: if a sample has X_4=0, \hat{f}_0 is called; otherwise, \hat{f}_1 is used. More practically, you can define a new model in which X_4 follows a uniform distribution between 0 and 1:

\hat{f}(x_1,x_2,x_3,x_4) = \begin{cases} \hat{f}_0(x_1,x_2,x_3) & \text{if } 0\leq x_4<0.5,\\ \hat{f}_1(x_1,x_2,x_3) & \text{if } 0.5\leq x_4 \leq 1. \end{cases}

Then, you can use the sampling-based method (see Section 2.1.8 of the manual for more details) to calculate the Sobol’ indices of \hat{f}(X_1,X_2,X_3,X_4), which serve as estimates of the Sobol’ indices of the original model f.
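
To make the idea concrete, here is a minimal Python sketch of the piecewise surrogate combined with a generic pick-freeze Monte Carlo estimator (a Saltelli-type first-order estimator and the Jansen total-effect estimator). It is not UQLab code and not necessarily the estimator UQLab uses. The two surrogates `f_hat_0` and `f_hat_1`, the lognormal parameters, and the sample size are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical stand-ins for the surrogates fitted on the X_4 = 0 and X_4 = 1 data.
def f_hat_0(x1, x2, x3):
    return x1 + 0.5 * x2

def f_hat_1(x1, x2, x3):
    return x1 + 0.5 * x2 + x3

def f_hat(x):
    """Piecewise surrogate: column 3 is uniform on [0, 1] and selects the branch."""
    x = np.atleast_2d(x)
    y0 = f_hat_0(x[:, 0], x[:, 1], x[:, 2])
    y1 = f_hat_1(x[:, 0], x[:, 1], x[:, 2])
    return np.where(x[:, 3] < 0.5, y0, y1)

def sample_inputs(n, rng):
    """Three lognormal inputs (assumed parameters) and one uniform input."""
    x = np.empty((n, 4))
    x[:, :3] = rng.lognormal(mean=0.0, sigma=0.25, size=(n, 3))
    x[:, 3] = rng.uniform(0.0, 1.0, size=n)
    return x

def sobol_indices(model, sampler, n, rng):
    """Estimate first-order and total Sobol' indices by pick-freeze sampling."""
    a = sampler(n, rng)
    b = sampler(n, rng)
    y_a, y_b = model(a), model(b)
    y_all = np.concatenate([y_a, y_b])
    mean, var_y = y_all.mean(), y_all.var()
    # Centering the outputs reduces the variance of the first-order estimator.
    y_a, y_b = y_a - mean, y_b - mean
    dim = a.shape[1]
    first, total = np.empty(dim), np.empty(dim)
    for i in range(dim):
        ab = a.copy()
        ab[:, i] = b[:, i]  # matrix A with column i taken from B
        y_ab = model(ab) - mean
        first[i] = np.mean(y_b * (y_ab - y_a)) / var_y
        total[i] = 0.5 * np.mean((y_a - y_ab) ** 2) / var_y
    return first, total

rng = np.random.default_rng(0)
first, total = sobol_indices(f_hat, sample_inputs, 20000, rng)
print("first-order:", np.round(first, 3))
print("total:      ", np.round(total, 3))
```

The estimates are noisy for small sample sizes, so it is worth repeating the run with a larger `n` or a different seed to check that the indices are stable.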