Acceptable LOO Error

Dear UQLab

What is the acceptable LOO error in the field of structural engineering?
How about modified LOO error?
If I have a data set from my FEM software or from experimental tests:

  1. What is the ROAD MAP to have a reliable surrogate model?
  2. Can I verify my results based on LOO error? How can I find acceptable LOO values?
    (I face a vast variety of LOO error values in UQ-Manuals)

Best regards


Dear Ali

The leave-one-out (LOO) error \varepsilon_{LOO} is an estimate of the mean-square error between the original model \mathcal M and the surrogate model \hat{\mathcal M}, that is, of \varepsilon = \mathbb{E} \left[ \left( {\mathcal M}(\boldsymbol{X}) - \hat{\mathcal M}(\boldsymbol{X})\right)^2\right]. As such, it is an average over the input parameter domain.
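For concreteness, here is a minimal, generic sketch of how a relative LOO error can be estimated by brute-force refitting (plain Python/NumPy, not UQLab code; the 1-D polynomial surrogate and the cubic test function are arbitrary stand-ins for your surrogate and model):

```python
import numpy as np

def loo_error(x, y, degree=3):
    """Relative leave-one-out MSE of a 1-D polynomial least-squares surrogate.

    Refits the surrogate N times, each time leaving one point out, and
    normalizes by the sample variance of y (a common convention for
    relative LOO errors).
    """
    n = len(y)
    sq_errs = np.empty(n)
    for i in range(n):
        mask = np.arange(n) != i                   # leave point i out
        coeffs = np.polyfit(x[mask], y[mask], degree)
        sq_errs[i] = (y[i] - np.polyval(coeffs, x[i])) ** 2
    return sq_errs.mean() / y.var()

x = np.linspace(-1.0, 1.0, 50)
y = x**3 - x                      # stand-in for an expensive model run
print(loo_error(x, y, degree=3))  # essentially zero: the model is exactly cubic
```

In practice UQLab computes this quantity for you; the loop above is only meant to make the definition concrete.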

When the surrogate is used for uncertainty quantification, i.e. when the input is considered as a random vector \boldsymbol{X} with probability density function f_{\boldsymbol{X} }, and when moments or sensitivity indices of the output are of interest, then good results may be obtained with a rather large LOO error.

Based on my experience, I consider \varepsilon_{LOO} \le 10^{-2} sufficient for this purpose, especially to compute Sobol' sensitivity indices. This is especially true when using a polynomial chaos expansion (PCE) for sensitivity analysis (see the UQLab user manual on sensitivity analysis, Section 1.5.3). I even have examples where the screening of important parameters, leading to Sobol' indices accurate to two digits, is obtained with a PCE of rather limited accuracy, \varepsilon_{LOO} \approx 0.05 - 0.1.

However, the above mean-square error does not guarantee that, for any \boldsymbol{x}_0 in your input space, the pointwise error \left| {\mathcal M}(\boldsymbol{x}_0) - \hat{\mathcal M}(\boldsymbol{x}_0)\right| is as small as \sqrt{\varepsilon}. Although the mean-square and maximum pointwise errors are somehow linked, there are no general results. If you want to properly estimate this maximal error, you need an independent validation set (which we tend to avoid in surrogate modelling, as each point usually results from a costly simulation). Getting a LOO error smaller than 10^{-4} is in general good enough to have pointwise errors below 1%, but this is highly problem-dependent!
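To illustrate the distinction, here is a small hedged sketch (generic Python/NumPy, not UQLab; the exponential model and cubic polynomial surrogate are arbitrary stand-ins) comparing the relative mean-square error with the maximum pointwise error on an independent validation set:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for an expensive model and its surrogate (for illustration only):
model = lambda x: np.exp(x)
x_train = rng.uniform(-1.0, 1.0, 30)
coeffs = np.polyfit(x_train, model(x_train), 3)     # cubic polynomial surrogate
surrogate = lambda x: np.polyval(coeffs, x)

x_val = rng.uniform(-1.0, 1.0, 10_000)              # independent validation set
resid = model(x_val) - surrogate(x_val)
mse_rel = np.mean(resid**2) / np.var(model(x_val))  # analogue of the LOO error
max_pointwise = np.max(np.abs(resid))
print(mse_rel, max_pointwise)  # the max error always exceeds the RMS error
```

Because the surrogate is cheap, evaluating it on many validation points is free; the expensive part is running the original model on those points, which is exactly why one tries to avoid a validation set.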

As a conclusion: yes, the LOO error is a good estimator of the quality of your surrogate, and the thresholds mentioned above can serve as guidelines. Our experience in structural mechanics is that finite element models are usually rather smooth functions of the input parameters (e.g. they depend strictly linearly on the load parameters in the case of elastic analysis!), so that the thresholds mentioned above are often easily achieved.

If you have a reasonable number of input parameters (say d = 5-30), using a Latin hypercube sampling (LHS) experimental design (ED) of size N = 10 \times d is a good starting point. You can always enrich your experimental design later on (see the UQLab commands uq_enrichLHS or uq_lhsify) if the accuracy of the surrogate obtained with this first ED is not sufficient.
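As a sketch of this rule of thumb, an LHS design of size N = 10 \times d can be generated with SciPy's quasi-Monte-Carlo module (in UQLab itself one would sample from an INPUT object instead; the bounds below are placeholders to be replaced by your parameter ranges):

```python
import numpy as np
from scipy.stats import qmc

d = 5                                  # number of input parameters
N = 10 * d                             # rule of thumb: N = 10 x d
sampler = qmc.LatinHypercube(d=d, seed=0)
U = sampler.random(N)                  # N x d design on the unit hypercube [0, 1)^d
# Scale to physical parameter ranges (placeholder bounds; replace with yours):
lower = np.zeros(d)
upper = np.ones(d)
X = qmc.scale(U, lower, upper)
print(X.shape)  # (50, 5)
```

Each column of the design has exactly one point per equiprobable stratum, which is the space-filling property that makes LHS attractive for small experimental designs.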

Best regards


Thanks Prof. @bsudret for your kind reply

I have one point of confusion about the last paragraph of your reply.
If I have a dataset (like the truss examples in the metamodeling manual), I must use my own data, which could be derived from FEM software or experimental tests. Accordingly, I must split my data into two subgroups (as in the Boston housing example): 1) ED and 2) validation. I have three questions:
1. Based on your recommendation, if we have a predefined dataset, can we assume a size of N_{ED} = 10 \times d for our experimental design data?
2. What is the guideline for the number of validation points?
3. Is there a road map for splitting the data? In other words, as mentioned in the manuals, it is recommended to generate data with LHS so that the samples are independent. How can we check this constraint when we use a predefined dataset?
I really appreciate your time and efforts.

Best regards

Dear @ali

Indeed, we recommend an experimental design of size N_{ED} \approx 10 \times d, that is, 10 times the number of input parameters, when using sparse PCE or Kriging. If d is large (say 50 to a few hundred), you can start with a smaller ratio. For instance, we have examples with good results using N_{ED} = 200 and d = 78 [1]. This happens because, in most problems with large d, there are only a few important variables.

Regarding the validation set, that's the beauty of PCE and Kriging: you don't need additional validation runs when using the trick of leave-one-out (LOO) cross-validation (see details in the UQLab user manual on polynomial chaos expansions, page 9). Thus we don't use independent validation sets if computational cost is an issue.
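The "trick" is that, for surrogates that are linear in their coefficients (such as a PCE computed by least squares), the LOO error follows from a single fit, without N refits, via the classical identity e_i = r_i / (1 - h_i), where r_i are the residuals and h_i the diagonal entries of the hat matrix. A minimal generic sketch (plain Python/NumPy, not UQLab code; the Vandermonde basis and sine model are placeholders):

```python
import numpy as np

def analytic_loo(A, y):
    """Relative LOO error of a least-squares model y ~ A @ c, from one fit.

    Uses e_i = r_i / (1 - h_i), with h_i the diagonal of the hat matrix
    A (A^T A)^{-1} A^T, computed stably via a QR decomposition.
    """
    c, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ c                      # residuals of the full fit
    Q, _ = np.linalg.qr(A)             # economic QR: hat matrix = Q @ Q.T
    h = np.sum(Q**2, axis=1)           # leverages (hat-matrix diagonal)
    return np.mean((r / (1 - h))**2) / np.var(y)

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 40)
A = np.vander(x, 4)                    # cubic polynomial basis (placeholder)
y = np.sin(2 * x)                      # stand-in for an expensive model
print(analytic_loo(A, y))
```

This is why LOO is essentially free for PCE: the cost is that of the original regression, not of N additional model runs.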

If you still want to make sure that LOO cross-validation properly works (often, people have the impression at first sight that this is magic and not reliable), then you can of course use a validation set, whose size is your choice. It depends on how much computational time you want to devote to it.

  • When we use toy analytical examples to demonstrate the efficiency of a new method, we use 10^6 validation points.
  • If the FEM model is costly, 1,000 or less is already OK.

If you start from an existing data set, common practice in machine learning is to use 80% of the points for model building (training) and 20% for validation, and to repeat the analysis with the ED and validation points chosen at random (typically 10 times). As I said, with PCE and Kriging you don't need to split, and you can use 100% of the existing points for building the PCE.
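The repeated 80/20 splitting described above can be sketched as follows (generic Python/NumPy; the polynomial fit is a placeholder for any surrogate trainer, and the cubic test function stands in for your data):

```python
import numpy as np

def repeated_holdout(X, y, fit, predict, n_repeats=10, train_frac=0.8, seed=0):
    """Average relative validation error over random train/validation splits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    n_train = int(train_frac * n)
    errs = []
    for _ in range(n_repeats):
        perm = rng.permutation(n)                  # fresh random split
        tr, va = perm[:n_train], perm[n_train:]
        model = fit(X[tr], y[tr])                  # train on 80% of the data
        resid = y[va] - predict(model, X[va])      # validate on the other 20%
        errs.append(np.mean(resid**2) / np.var(y[va]))
    return float(np.mean(errs))

X = np.linspace(-1.0, 1.0, 100)
y = X**3 - X                                       # stand-in for real data
err = repeated_holdout(X, y,
                       fit=lambda Xt, yt: np.polyfit(Xt, yt, 3),
                       predict=np.polyval)
print(err)  # essentially zero here, since a cubic fits the data exactly
```

Averaging over several random splits smooths out the luck of any single split, which is why the analysis is typically repeated about 10 times.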

Best regards

  1. Deman, G., Konakli, K., Sudret, B., Kerrou, J., Perrochet, P. & Benabderrahmane, H. (2016). Using sparse polynomial chaos expansions for the global sensitivity analysis of groundwater lifetime expectancy in a multi-layered hydrogeological model. Reliab. Eng. Sys. Safety, 147, 156-169.


Dear Prof. @bsudret

Thank you very much for your valuable comments and precious suggestions.

Best regards