Data-driven PCE and overfitting

Hi @Chemicaleng,

400 points in 5 dimensions should be enough for a reasonable fit, unless the model is very non-smooth and therefore not suitable for PCE. But your model seems to be approximated quite well by PCE: you apparently achieve a relative MSE of 0.02 with 350 points. Do I understand correctly that you are using 350 points as the experimental design (ED) and 50 points as the validation set? Does Figure (1) display the Y-Y plot of the validation set?

Have you checked the mean of your ED? I would expect mean(Y_ED) ≈ mean of the PCE, since the moments of the PCE typically converge quite fast. Because your validation set is quite small, its mean might well differ from that of your ED, especially when the Y-data varies a lot. Have a look at a histogram of all your 400 Y’s. Are there maybe a couple of outliers with very high values?
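To make the check above concrete, here is a minimal sketch in Python/NumPy (not UQLab code). The function names `compare_means` and `flag_outliers`, the 3-MAD threshold, and the lognormal stand-in data are all assumptions for illustration; replace `y_all` with your own 400 outputs.

```python
import numpy as np

def compare_means(y_ed, y_val):
    """Return the ED mean, the validation mean, and their gap expressed in
    standard errors of the validation mean (hypothetical helper)."""
    y_ed = np.asarray(y_ed, dtype=float)
    y_val = np.asarray(y_val, dtype=float)
    se_val = y_val.std(ddof=1) / np.sqrt(y_val.size)
    gap = (y_val.mean() - y_ed.mean()) / se_val
    return y_ed.mean(), y_val.mean(), gap

def flag_outliers(y, k=3.0):
    """Indices of points farther than k scaled MADs from the median.
    The threshold k=3 is an assumption, not a rule."""
    y = np.asarray(y, dtype=float)
    med = np.median(y)
    mad = 1.4826 * np.median(np.abs(y - med))  # consistent with sigma for Gaussian data
    return np.flatnonzero(np.abs(y - med) > k * mad)

# Synthetic, heavy-tailed stand-in for the 400 model outputs (assumption)
rng = np.random.default_rng(0)
y_all = rng.lognormal(mean=0.0, sigma=1.0, size=400)
y_ed, y_val = y_all[:350], y_all[350:]

mean_ed, mean_val, gap = compare_means(y_ed, y_val)
outliers = flag_outliers(y_all)
print(f"mean(ED) = {mean_ed:.3f}, mean(validation) = {mean_val:.3f}, gap = {gap:.2f} SE")
print(f"{outliers.size} points flagged as possible outliers")
```

A gap of a few standard errors, or a handful of flagged points that all land in one of the two sets, would suggest that the 350/50 split itself explains part of the mismatch.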

Regarding the rescaling you mention, I am not sure what you want to achieve with it. UQLab internally transforms the inputs to standard variables anyway. But if you insist on doing it, make sure you normalize only X, not Y, and change the input object accordingly.
I just realized you are probably using arbitrary PCE. In that case there is no input object, and UQLab does not do a transform. You can of course rescale both X and Y, but you will simply get a rescaled version of the PCE (as you also observed: it even has exactly the same leave-one-out (LOO) error and validation error, but different values for the coefficients). It won’t change the fit.
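The invariance is easy to reproduce outside UQLab. The sketch below is an assumption-laden stand-in, not UQLab's algorithm: it uses ordinary least squares on a 1D monomial basis of degree 3, a hypothetical helper `fit_and_relative_loo`, and synthetic data. Standardizing X and Y changes the coefficients but leaves the relative LOO error untouched, because the rescaled basis spans the same polynomial space and the residuals scale together with Y.

```python
import numpy as np
from numpy.polynomial.polynomial import polyvander

def fit_and_relative_loo(x, y, degree=3):
    """Least-squares polynomial fit plus relative leave-one-out error.
    Uses the closed-form LOO residual e_i / (1 - h_i), valid for OLS."""
    psi = polyvander(x, degree)               # columns 1, x, ..., x^degree
    coeffs, *_ = np.linalg.lstsq(psi, y, rcond=None)
    q, _ = np.linalg.qr(psi)                  # reduced QR of the regression matrix
    h = np.sum(q**2, axis=1)                  # diagonal of the hat matrix
    resid = y - psi @ coeffs
    loo = np.mean((resid / (1.0 - h)) ** 2)
    return coeffs, loo / np.var(y)

# Synthetic data (assumption): smooth 1D model with a little noise
rng = np.random.default_rng(1)
x = rng.uniform(2.0, 10.0, size=60)
y = np.exp(0.3 * x) + rng.normal(scale=0.2, size=x.size)

# Standardize both X and Y
x_s = (x - x.mean()) / x.std()
y_s = (y - y.mean()) / y.std()

c_raw, loo_raw = fit_and_relative_loo(x, y)
c_scaled, loo_scaled = fit_and_relative_loo(x_s, y_s)
print("coefficients (raw):     ", c_raw)
print("coefficients (rescaled):", c_scaled)
print("relative LOO (raw, rescaled):", loo_raw, loo_scaled)
```

With a sparse solver instead of plain least squares the selected terms could in principle differ slightly, so treat this only as an illustration of why the fit quality does not improve.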

Regarding your other questions:
(1) Overfitting is prevented by using LOO for model selection.

(2) Yes, the total degree of the basis is increased according to the range that you specify in MetaOpts.Degree. The final basis is chosen as the one with the lowest LOO error. If you say the error is lowest at 4th degree, you probably mean the validation error - the PCE fitting does not use the validation error. (If you specify a validation set, it is only used to compute the final validation error.)
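Points (1) and (2) can be sketched as follows. This is a toy illustration under stated assumptions, not UQLab's implementation: a 1D Legendre basis, ordinary least squares, synthetic data, and hypothetical helpers `relative_loo` and `select_degree`. The degree is picked by the lowest LOO error only; no validation set is involved.

```python
import numpy as np
from numpy.polynomial.legendre import legvander

def relative_loo(psi, y):
    """Relative leave-one-out error of an OLS fit, via the hat-matrix shortcut."""
    coeffs, *_ = np.linalg.lstsq(psi, y, rcond=None)
    q, _ = np.linalg.qr(psi)
    h = np.sum(q**2, axis=1)                  # leverage of each ED point
    resid = y - psi @ coeffs
    return np.mean((resid / (1.0 - h)) ** 2) / np.var(y)

def relative_training_error(psi, y):
    """Relative empirical (training) error of the same OLS fit."""
    coeffs, *_ = np.linalg.lstsq(psi, y, rcond=None)
    return np.mean((y - psi @ coeffs) ** 2) / np.var(y)

def select_degree(x, y, degrees):
    """Fit one basis per candidate degree and keep the one with the lowest LOO."""
    errors = {p: relative_loo(legvander(x, p), y) for p in degrees}
    best = min(errors, key=errors.get)
    return best, errors

# Synthetic experimental design on [-1, 1] (assumption)
rng = np.random.default_rng(2)
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(np.pi * x) + rng.normal(scale=0.1, size=x.size)

best_degree, loo_by_degree = select_degree(x, y, range(1, 13))
for p, err in loo_by_degree.items():
    train = relative_training_error(legvander(x, p), y)
    print(f"degree {p:2d}: training error = {train:.4f}, LOO error = {err:.4f}")
print("selected degree:", best_degree)
```

The training error can only go down as the degree grows, whereas the LOO error typically rises again once the basis starts fitting noise. That is why selecting on LOO guards against overfitting.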

(3) See my above remark about normalization.

(4) This is very much dependent on the properties of your model. Have a look at @bsudret’s answers to the following questions:

Good luck, and let us know how you solved your problem! :slight_smile:
