Problem with input parameters for creating PCE

Hello all,
I have input and output experimental data, and I want to use “InputOpts.Inference.Data” to create the input object for a PCE. However, I get many warnings like the following when I create the input:
"Warning: Matrix is close to singular or badly scaled. Results may be inaccurate. RCOND = 3.658442e-21.
Warning: Cannot compute a cov matrix – the computed Hessian is not positive definite. "

Can you please help me find out what the problem is? I have attached my input parameters as well. My model has 32 parameters, and I have 13000 FEM experiments.
Parameters.zip (3.0 MB)
Also one more question: Some of my input parameters have exactly the same mean and standard deviation. Can I define them as separate input parameters and generate random values for each of them for creating PCE (like what I have done right now in the uploaded file)? Are these input parameters “dependent” because they have the same mean and standard deviation? If yes, what should I do in that regard?

I have a lot of experimental points, but I still get a high LOO error for my PCE (I have tried different options for creating the PCE in UQLab). I am trying to track down the cause, and I thought it might be related to my inputs, which is why I am asking the questions above. Any help is greatly appreciated.

Hi @Aep93,

The warnings just tell you that some of the candidate distributions (Gamma, logistic) could not be fitted: they don’t match the data well enough for their parameters to be inferred. It’s nothing to worry about as long as there is no error.

I used your input data to infer the distributions like so:

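iOpts.Inference.Data = X;         % X: sample matrix, one row per sample, one column per parameter
myInput = uq_createInput(iOpts);  % infer the marginals and the copula from the data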

It worked and returned an input object with all Gaussian marginals and an independence copula. This matches what we see when plotting your data with, e.g., plotmatrix.
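If you want to repeat this check yourself, a minimal sketch could look like the one below. It assumes that UQLab is already running and that `X` and `myInput` exist as in the snippet above; `nShow` is a hypothetical name, used only to keep the scatter-plot matrix readable with 32 inputs.

```matlab
% Assumes UQLab has been started and that X and myInput exist as above.
uq_print(myInput);                 % text summary of the inferred marginals and copula

nShow = 5;                         % hypothetical: only plot a few inputs for readability
plotmatrix(X(:, 1:nShow));         % pairwise scatter plots and marginal histograms

R = corrcoef(X);                   % M-by-M sample correlation matrix
offDiag = R(~eye(size(R)));        % all entries except the diagonal
fprintf('Largest |correlation| between two inputs: %.3f\n', max(abs(offDiag)));
```

Scatter plots without visible structure and off-diagonal correlations close to zero are consistent with an independence copula, although they only rule out the more obvious forms of dependence.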

Where do the 13000 input points come from? Did you create them yourself, or did someone give them to you? Knowing this can also help you figure out which distribution they should have.

Yes, you can define them as separate input parameters, or simply use the inferred input object.
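For the first option, a rough sketch of an explicit input definition is given below. The means and standard deviations (`mu`, `sigma`) are placeholder values, not taken from your file, and the Gaussian family is only assumed because that is what the inference returned. Each parameter gets its own marginal, even when several of them share the same moments.

```matlab
uqlab                                   % start the UQLab framework

M     = 32;                             % number of input parameters (from the question)
mu    = ones(1, M);                     % hypothetical means, replace with the real ones
sigma = 0.1*ones(1, M);                 % hypothetical standard deviations

for ii = 1:M
    InputOpts.Marginals(ii).Name    = sprintf('X%d', ii);
    InputOpts.Marginals(ii).Type    = 'Gaussian';
    InputOpts.Marginals(ii).Moments = [mu(ii), sigma(ii)];   % [mean, std]
end

% No copula is specified, so the marginals are treated as independent.
myInput = uq_createInput(InputOpts);

Xnew = uq_getSample(myInput, 1000);     % 1000-by-M matrix of new realizations
```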

No, sharing the same mean and standard deviation does not make them dependent. Dependence has nothing to do with the parameters of the distributions, but with whether or not the realizations of two random variables are related. Mathematically, two random variables $X$, $Y$ are independent if $\mathbb{P}(X | Y) = \mathbb{P}(X)$, i.e., if knowing one of the variables does not give you any information about the value of the other one.
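A small numerical illustration of this point, in plain MATLAB without UQLab: all numbers (`mu`, `sigma`, `rho`) are made up for the example. `X1` and `X2` have identical mean and standard deviation but are drawn separately, whereas `X3` has the same marginal distribution and is built to depend on `X1`.

```matlab
rng(1);                                  % fixed seed, for reproducibility only
N = 1e4; mu = 5; sigma = 0.5;            % hypothetical mean and standard deviation

Z1 = randn(N, 1);
Z2 = randn(N, 1);
X1 = mu + sigma*Z1;                      % X1 and X2: same mean and std,
X2 = mu + sigma*Z2;                      % but generated independently

rho = 0.9;                               % hypothetical target correlation
X3  = mu + sigma*(rho*Z1 + sqrt(1 - rho^2)*Z2);   % same marginal, dependent on X1

C12 = corrcoef(X1, X2);
C13 = corrcoef(X1, X3);
fprintf('corr(X1, X2) = %.3f\n', C12(1, 2));      % close to 0
fprintf('corr(X1, X3) = %.3f\n', C13(1, 2));      % close to rho
```

All three variables have the same mean and standard deviation, yet only `X1` and `X3` are dependent. Note that the correlation coefficient only detects linear dependence, so it is a quick check rather than a proof of independence.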

What is the LOO error that you currently achieve? Which input did you use for your PCE?

Sometimes the model is just not suited for PCE metamodelling. What is the quantity of interest that you are trying to approximate? E.g., QoIs like maximum displacement are often not well approximated by PCE.
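To make the settings easier to compare, here is a minimal sketch of a sparse PCE built from existing data. It assumes that `X` (inputs) and `Y` (outputs, one column per quantity of interest) are already in the workspace and that `myInput` is an existing input object. The degree range and truncation values are purely illustrative, not a recommendation for your model.

```matlab
% Assumes X (N-by-M), Y (N-by-Nout) and myInput already exist.
MetaOpts.Type     = 'Metamodel';
MetaOpts.MetaType = 'PCE';
MetaOpts.Input    = myInput;

MetaOpts.ExpDesign.X = X;                    % existing experimental design
MetaOpts.ExpDesign.Y = Y(:, 1);              % one output at a time, here the first column

MetaOpts.Method = 'LARS';                    % sparse regression
MetaOpts.Degree = 1:3;                       % illustrative degree range (degree-adaptive)
MetaOpts.TruncOptions.qNorm          = 0.75; % illustrative hyperbolic truncation
MetaOpts.TruncOptions.MaxInteraction = 2;    % illustrative cap on the interaction order

myPCE = uq_createModel(MetaOpts);
fprintf('LOO error: %.3e\n', myPCE.Error.LOO);
```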


Hello @nluethen. Thank you very much for your explanations and for working on my input file. I greatly appreciate your help. I created these input variables myself and I use them for FEM simulations. My output is the displacement at different frequencies.

I noticed one thing:
All of my input parameters have a coefficient of variation of 10%. However:
For some frequencies, the coefficient of variation of the output displacement is small (about 10%), and for those, most of the time my PCE has an LOO error of less than 1%, which I think is acceptable for computing Sobol' indices.

However, for some frequencies, the coefficient of variation of the output displacement is larger (20% for example) and for them, my PCE has errors up to about 15%.

Is my understanding correct? That is, is the accuracy of a PCE affected by the coefficient of variation of the output?
If yes, do you have any suggestions for getting better PCE accuracy at the frequencies with a high coefficient of variation?

Thanks in advance

Hi @Aep93,

First, a remark on your previous question: since you created the input variables yourself, you should know their distribution exactly. In your first question, you said that you wanted to use “InputOpts.Inference.Data” to create your input.

Why did you want to use inference if you know the distribution from which you sampled the input?


Now, regarding your second question: no, the coefficient of variation is not directly related to the accuracy of PCE. E.g., a univariate linear model $f(x) = x$ with input $X \sim \mathcal{N}(0.1, 100)$ has a huge coefficient of variation, but is very easy to approximate by PCE.
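For this linear example the expansion can be written down by hand. The short derivation below assumes a Gaussian input with mean $\mu$ and standard deviation $\sigma$, and uses the first two probabilists' Hermite polynomials, $He_0(\xi) = 1$ and $He_1(\xi) = \xi$, in the standardized variable $\xi$:

$$
\xi = \frac{X - \mu}{\sigma} \sim \mathcal{N}(0, 1), \qquad f(X) = X = \mu \, He_0(\xi) + \sigma \, He_1(\xi).
$$

A degree-1 PCE with the two coefficients $\mu$ and $\sigma$ is therefore exact, whatever the value of the coefficient of variation $\sigma / |\mu|$. With $\mu = 0.1$, the coefficient of variation is 100 if the second parameter is read as a variance ($\sigma = 10$) and 1000 if it is read as a standard deviation ($\sigma = 100$).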

Of course, it is possible that for some frequencies your model both has a higher CoV and becomes more nonlinear, i.e., more difficult to approximate by PCE. What kind of displacement do you measure: pointwise, mean, or maximum? What kind of structure are you working with? What settings do you use to compute the PCE?
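One way to look at this across all frequencies is to plot the LOO error against the output CoV. The sketch below assumes that `Y` has one column per frequency and that `myPCE` was built with the full matrix `Y` as experimental design output. It also assumes that `myPCE.Error` then holds one entry per output, which is worth verifying in your UQLab version.

```matlab
% Assumes Y is N-by-Nout (one column per frequency) and that myPCE was
% created with MetaOpts.ExpDesign.Y = Y (one PCE per output column).
covY   = std(Y) ./ abs(mean(Y));     % CoV of the output, per frequency
looErr = [myPCE.Error.LOO];          % LOO error, per frequency (assumed struct array)

figure;
semilogy(covY, looErr, 'o');
xlabel('CoV of the output');
ylabel('LOO error');
```

If the points with a large LOO error cluster around particular frequencies rather than following the CoV, that would point towards the nonlinearity of the model at those frequencies rather than the CoV itself.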