Hello all,
I have a model with 44 input parameters and about 2700 FEM simulations. However, for some of my outputs (I have different outputs for different frequencies), my PCE leave-one-out error is about 20%. I am building PCEs up to degree 4 and cannot go beyond that because of the computational cost.
Do you think increasing the number of simulations will help? If yes, how much should I increase it?
Do you think reducing the coefficient of variation in the input parameters can be helpful?
Do you think there is any benefit in using PCK rather than PCE?
Do you know of any methods to reliably identify unimportant parameters, so that I can remove them and reduce the dimension before building the PCE?
Any other suggestions?
Unfortunately, I cannot publicly provide the data.
Your number of simulations (N=2700) for d=44 dimensions is not so low, so you should be able to build a good polynomial chaos expansion surrogate model. I’m not sure you used the right options though, especially in terms of sparse PCE:
You should use 'LARS' as the solver.
You can specify a range of maximal degrees, say 3:10, instead of a single one: LARS will test all of them and keep the best.
For this high dimension, you should use a q-norm truncation scheme, which lets you select a much smaller candidate basis before LARS is run.
I suggest something like this:
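A minimal sketch of such a configuration, collecting the three options listed above. It assumes UQLab is already initialized, that an input object for the 44 parameters has been created with uq_createInput, and that the variables X (N x 44 inputs) and Y (N x number-of-outputs) are in the workspace; those names, and the q-norm value 0.5, are illustrative assumptions rather than recommended settings.

```matlab
% Sketch only: X, Y and the q-norm value below are assumptions for illustration.
MetaOpts.Type = 'Metamodel';
MetaOpts.MetaType = 'PCE';

% Sparse solver: least-angle regression
MetaOpts.Method = 'LARS';

% Degree-adaptive expansion: each degree in the range is tried
% and the one with the lowest leave-one-out error is kept
MetaOpts.Degree = 3:10;

% Hyperbolic (q-norm) truncation to shrink the candidate basis
MetaOpts.TruncOptions.qNorm = 0.5;   % illustrative value, to be tuned

% Use the existing FEM runs as the experimental design
MetaOpts.ExpDesign.X = X;
MetaOpts.ExpDesign.Y = Y;

myPCE = uq_createModel(MetaOpts);

% Summary of the result (selected degree, basis size, LOO error)
uq_print(myPCE)
```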
You can also limit the interaction terms in your polynomial basis with MetaOpts.TruncOptions.MaxInteraction = 2; (the effect is somewhat similar to a low q-norm, q = 0.3 or so, depending on the dimension). In most cases only low-order interactions matter, but you need much higher univariate degrees to get good accuracy.
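To put rough numbers on why truncation matters here, this is a back-of-the-envelope count with d = 44 and p = 4. It uses the standard counting formulas for total-degree bases and is not output from UQLab. The q-norm scheme keeps only multi-indices \alpha whose q-norm is at most p.

```latex
\begin{align*}
\text{q-norm truncation:}\quad
  & \|\alpha\|_q = \Big(\sum_{i=1}^{d} \alpha_i^{\,q}\Big)^{1/q} \le p,
    \qquad 0 < q \le 1 \\
\text{full basis } (q = 1):\quad
  & \binom{d+p}{p} = \binom{48}{4} = 194\,580 \\
\text{at most 2 interacting variables:}\quad
  & 1 + d\,p + \binom{d}{2}\binom{p}{2}
    = 1 + 176 + 946 \cdot 6 = 5\,853
\end{align*}
```

The full degree-4 basis has far more candidate terms than the N = 2700 available simulations, whereas restricting the interactions cuts the candidate set by more than a factor of 30.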
From the obtained PCE, you can easily get the Sobol indices:
SobolOpts.Type = 'Sensitivity';
SobolOpts.Method = 'Sobol';
SobolFromPCE = uq_createAnalysis(SobolOpts);
% if you put this right after creating the PCE, the latter is used
% to compute Sobol' indices analytically.
I hope this solves your problem. Regarding the data: the nice thing is that you could easily post anonymized data. All that is needed is an N \times d array of input parameters and the corresponding outputs. We don’t need to know what the parameters stand for (call them X_1, \dots , X_{44}). If the inputs follow uniform distributions, they can easily be normalized to [0,1]. This way you can get targeted help from the community!
Thank you very much @bsudret for your detailed explanations. I want to say that I am still working on this and I will update you when I come to a conclusion.
Thanks and regards