Acceptable LOO Error

bsudret · June 22, 2020, 8:46am

Indeed we recommend an experimental design of size N_{ED} \approx 10 \times d, that is 10 times the number of input parameters, when using sparse PCE or Kriging. If d is large (say 50 to a few hundreds) you can start with a smaller ratio. For instance we have examples with good results using N_{ED}=200 and d=78 ^[1]. This happens because, in most problems with large d, there are only a few important variables.

Regarding the validation set, that’s the beauty of PCE and Kriging: you don’t need additional validation runs when using the trick of leave-one-out (LOO) cross validation (see details in the UQLab user manual - Polynomial chaos expansions page 9). Thus we don’t use independent validation sets if computational cost is an issue.

If you still want to make sure that LOO cross-validation properly works (often, people have the impression that this is magic and not reliable at first sight …), then you can of course use a validation set, and the size is your choice. It depends how much computational time you want to devote.

When we use toy analytical examples to demonstrate the efficiency of a new method, we use 10^6 validation points.
If the FEM model is costly, 1,000 or less is already OK.

If you start from an existing data set, the common practice in machine learning is 80% for model building (training) and 20% for validation, yet you repeat the analysis by choosing randomly the ED- and the validation points (typically 10 times). As I said, with PCE and Kriging you don’t need to split and can use 100% of the existing points for building the PCE.

Best regards
Bruno

Deman, G., Konakli, K., Sudret, B., Kerrou, J., Perrochet, P. & Benabderrahmane, H. (2016) Using sparse polynomial chaos expansions for the global sensitivity analysis of groundwater lifetime expectancy in a multi-layered hydrogeological model, Reliab. Eng. Sys. Safety,147, 156-169. ↩︎