Computational cost of PCE methods

As someone not well versed in the PCE metamodelling theory, I find myself stumbling upon various issues when trying to choose the suitable options for my PCE regression. Specifically, I am trying to reduce the computational cost of developing the PCE while still retaining an acceptable LOO error. The complexity of the resulting PCE model does not concern me, since I plan to use it for the sole purpose of calculating Sobol’ Indices.
Under these circumstances, my understanding is that the faster way would be to choose a sufficiently large PCE degree and just use Ordinary Least-Squares regression without basis adaptivity or any form of truncation (provided that there is a relatively large amount of inputs and experimental design points).

I would really like to hear your thoughts on the issue. Any additional ideas regarding the relative computational cost of PCE methods (and the factors upon that depends) are also extremely welcome.

Hi Chris,

Could you clarify what you mean with “complexity of the resulting PCE”? And with “relative computational cost of PCE methods”? Typically, the main cost in PCE calculations is the number N of model evaluations.

Generally, it is not a good idea to use OLS instead of sparse regression. See the PCE user manual, section 2.6.2 and 2.6.3, or try it out yourself (use a cheap toy model and compute a large validation set, see PCE manual, section 2.8): for the same number of ED points, you’ll get a much smaller error with sparse regression than with OLS.

The reason is that OLS requires that you have at least as many experimental design (ED) points as basis functions in the expansion to avoid ill-conditioning of the regression matrix (N \geq P), and even more points to avoid overfitting. However, for a typical problem, there will be many basis elements in the total-degree basis (even with hyperbolic truncation) that are not needed, i.e., have a near-zero coefficient, see the illustrations in the PCE manual, Figures 7 and 8. So you are wasting many points on computing coefficients for terms that are not even needed.

Sparse regression methods, on the other hand, can deal with N < P. They detect which terms are important, and assign 0 to the remaining coefficients, making better use of the given data. Of course, this only works if the model can actually approximated well by a sparse PCE. This is known as the sparsity-of-effects principle and holds surprisingly often for real-world models. You can imagine that there might be inputs that hardly play a role, or that several of the inputs do not interact at all, which eliminates quite a lot of elements from the total-degree basis (but note that you cannot know that beforehand! Sparse regression detects this automatically, without you needing to provide this information).

Degree and q-norm adaptivity (PCE manual, section 1.3.4) is a further step towards the automatic detection: you want to have a basis that is large enough to include all important terms, but not unnecessarily large (since also sparse regression works better the larger the ratio \frac{N}{P}). The goal is always to minimize the generalization error (PCE manual, equations (1.16) and (1.17)), but since you normally don’t have a separate validation set to compute it, you use the LOO cross-validation error (on the training set = the ED) as approximation to guide your choice of best degree and q-norm.

As a side note, if you want to read more about different methods for computing PCE, I can recommend our recent review and benchmark work about solvers and sampling schemes for sparse PCE and basis-adaptive sparse PCE :wink:

Hi Nora, and thank you for the elaborate reply

When referring to “PCE complexity” I meant PCE models with high degree terms and no sparsity, which is not that important as I have come to realize. As for the “relative computational cost of PCE methods” I am referring solely to the time taken in developing the model, assuming that an experimental design is already available.

Anyway, your answer showed me that I had totally misunderstood the situation when it comes to introducing sparsity. My understanding was that introducing sparsity would increase the computational cost due to the extra steps required in identifying and eliminating near-zero elements, with the advantage of a much “cleaner” final PCE model. I realize now that sparsity provides great computational time savings due to the linear algebra involved in the calculation of the coefficients.

Continuing, I can see how degree and q-norm adaptivity help in minimizing the (generalization or LOO) error but I understand that it comes at the expense of computational time, since all possible PCE configurations are computed and the one with the lowest error is chosen. If I am not wrong until this point, I think that choosing a suitable value for metaOpts.<method>.TargetedAccuracy can prevent the computation of PCE models above the degree that satisfies that accuracy, therefore reducing the total computational cost.

I will also make sure to check out your work for a more in-depth view of the subject. Thanks again!