Goodness-of-fit Tests

ali · June 21, 2020, 1:13pm

Dear UQLab

As you are well aware, after building a metamodel it is essential to check the surrogate model with goodness-of-fit tests.
A reliable model should not suffer from the following problems:
1- Collinearity
2- Heteroskedasticity
3- Non-normality of errors
4- Outliers

Please guide me on how to access goodness-of-fit tests in UQLab, especially the normal probability plot (Q-Q plot), so that I can check for the above-mentioned problems.
Luckily, the Y(meta-model) vs Y(real) plot is well described in the metamodel examples.

Best regards

xujia · July 3, 2020, 8:55am

Dear @ali,

Thanks for posting this topic. I would like to share some ideas here.

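When talking about collinearity, I suppose that PCE is considered. In this case, collinearity should be checked on the experimental design. If OLS is used, it can be verified by looking at the matrix $\boldsymbol{\Psi}^T\boldsymbol{\Psi}$, where $\boldsymbol{\Psi}$ contains the basis functions evaluated at the design points: if this matrix is invertible, no exact collinearity is present. In practice, a very large condition number of this matrix already signals near-collinearity.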
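To make the check above concrete, here is a minimal sketch in Python with NumPy. It is not UQLab code: the helper name `gram_condition_number` and the two toy regression matrices are assumptions made purely for illustration. It computes the condition number of $\boldsymbol{\Psi}^T\boldsymbol{\Psi}$ for a Legendre basis and for a basis with two almost identical columns.

```python
import numpy as np


def gram_condition_number(psi):
    """Return the 2-norm condition number of Psi^T Psi for an (N x P) regression matrix."""
    psi = np.asarray(psi, dtype=float)
    gram = psi.T @ psi
    return np.linalg.cond(gram)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=50)  # toy experimental design (assumption)

# Well-behaved basis: first three Legendre polynomials evaluated at x.
psi_ok = np.column_stack([np.ones_like(x), x, 0.5 * (3.0 * x**2 - 1.0)])

# Nearly collinear basis: the third column is almost a copy of the second.
psi_bad = np.column_stack([np.ones_like(x), x, x + 1e-6 * rng.standard_normal(x.size)])

cond_ok = gram_condition_number(psi_ok)
cond_bad = gram_condition_number(psi_bad)
print(f"cond (Legendre basis):       {cond_ok:.3e}")
print(f"cond (near-collinear basis): {cond_bad:.3e}")
```

The threshold at which a condition number counts as "too large" is a judgment call; comparing it against the inverse of the machine precision is one common rule of thumb.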
In the conventional surrogate modeling framework, the computational model is considered deterministic. This implies that there is no noise in the output, in contrast to the standard statistical setup. Hence, heteroskedasticity, non-normality of errors, and outliers are not an issue. If you apply the surrogate methods to noisy data, you can calculate the residuals at the design points and use classical statistical analysis to assess the results: for example, a Tukey-Anscombe plot (residuals against fitted values) or statistical tests for heteroskedasticity, Q-Q plots (residuals against a normal distribution) or a KS test to check normality, and Cook’s distance for outlier detection.
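As an illustration of the residual checks mentioned above, the following sketch uses only the Python standard library to compute the points of a normal Q-Q plot and the Kolmogorov-Smirnov distance to a standard normal. The function names and the synthetic residuals are assumptions for illustration; in practice the residuals would be the differences between the model output and the surrogate prediction at the design points. Because the mean and standard deviation are estimated from the same residuals, the classical KS p-value tables do not apply directly, so only the distance itself is computed here.

```python
import random
from statistics import NormalDist, fmean, stdev


def standardize(residuals):
    """Center and scale residuals with their sample mean and standard deviation."""
    mu, sigma = fmean(residuals), stdev(residuals)
    return sorted((r - mu) / sigma for r in residuals)


def qq_points(residuals):
    """Return (theoretical, sample) quantiles for a normal Q-Q plot."""
    sample = standardize(residuals)
    n = len(sample)
    std_normal = NormalDist()
    # Plotting positions (i - 0.5) / n: one common convention among several.
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, sample


def ks_statistic(residuals):
    """Kolmogorov-Smirnov distance between standardized residuals and N(0, 1)."""
    z = standardize(residuals)
    n = len(z)
    cdf = NormalDist().cdf
    return max(max((i + 1) / n - cdf(v), cdf(v) - i / n) for i, v in enumerate(z))


random.seed(1)
normal_res = [random.gauss(0.0, 0.1) for _ in range(200)]   # roughly normal residuals
skewed_res = [random.expovariate(1.0) for _ in range(200)]  # clearly non-normal residuals

theoretical, sample = qq_points(normal_res)
ks_norm = ks_statistic(normal_res)
ks_skew = ks_statistic(skewed_res)
print(f"KS distance, normal residuals: {ks_norm:.3f}")
print(f"KS distance, skewed residuals: {ks_skew:.3f}")
```

Plotting `sample` against `theoretical` gives the Q-Q plot; points far from the diagonal indicate departures from normality.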

In conclusion, I think collinearity is linked to the experimental design, whereas the other three items (heteroskedasticity, non-normality, and outliers) are closely related to the nature of the physical model (or the data-generating process).

ali · July 4, 2020, 10:47am

Dear @xujia

Thanks for your kind reply, and in particular for the fruitful references.
Here are my questions about your notes:
1- I was wondering why you limit the collinearity problem to PCE. Could you explain more, please? Besides, if I want to use the well-established “LAR” method, what is the appropriate guideline?
2- Based on your suggestion, is there any way to implement the mentioned GOF tests in UQLab?

Best regards
Ali

xujia · July 4, 2020, 12:43pm

Dear @ali,

Thanks for the follow-up. Regarding the questions:

1- I think the term collinearity means that some regressors are collinear. Thus, collinearity can be a problem in a regression setup, namely OLS, with clearly defined regressors. That’s why I refer to PCE, which can be cast into this framework. In contrast, Gaussian processes do not really suffer from this problem as long as the experimental design does not contain two identical design points. Regarding the least angle regression algorithm: it is a solver that produces a regularization path similar to the lasso path. In UQLab, a so-called hybrid LARS is implemented. For details, please have a look at the PCE manual.
2- As I mentioned before, the conventional surrogate setup does not contain noise, which differs from the statistical mindset. As a result, these goodness-of-fit tests are not available in UQLab. Nevertheless, since these methods mainly require post-processing the statistical model, I think you can implement them with only a few lines of code. Note that these tests and analyses mainly focus on understanding the data. Because a surrogate model is generally used for prediction, the leave-one-out error, which to some extent estimates the generalization error, is a good index to assess the model performance.
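For an OLS fit, the leave-one-out error does not require refitting the model N times: each leave-one-out residual equals the ordinary residual divided by one minus the corresponding diagonal entry of the hat matrix. Below is a minimal sketch in Python with NumPy, not UQLab code. The function name `loo_error_ols`, the toy model `exp(x)`, the degree-3 Legendre basis, and the normalization by the sample variance of the outputs are all assumptions made for illustration.

```python
import numpy as np


def loo_error_ols(psi, y):
    """Relative leave-one-out error of an OLS fit, computed without refitting."""
    psi = np.asarray(psi, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs, *_ = np.linalg.lstsq(psi, y, rcond=None)
    residuals = y - psi @ coeffs
    # Leverages h_i: diagonal of the hat matrix Psi (Psi^T Psi)^{-1} Psi^T,
    # obtained from the thin QR factorization of Psi.
    q, _ = np.linalg.qr(psi)
    leverage = np.sum(q**2, axis=1)
    loo_residuals = residuals / (1.0 - leverage)
    # Normalize by the output variance so the result is dimensionless.
    return np.mean(loo_residuals**2) / np.var(y, ddof=1)


rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=30)  # toy experimental design (assumption)
y = np.exp(x)                        # toy deterministic model (assumption)

# Regression matrix: Legendre polynomials of degree 0 to 3 evaluated at x.
psi = np.polynomial.legendre.legvander(x, 3)

err = loo_error_ols(psi, y)
print(f"relative LOO error: {err:.3e}")
```

The shortcut is exact for OLS; for sparse solvers such as LAR it is typically applied to the final set of selected regressors, which makes it an approximation.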

ali · July 5, 2020, 5:42pm

Dear @xujia
Thanks for your comprehensive discussion.
Best regards