Hi Chris,
Could you clarify what you mean by “complexity of the resulting PCE”, and by “relative computational cost of PCE methods”? Typically, the main cost in PCE calculations is the number N of model evaluations.
Generally, it is not a good idea to use OLS instead of sparse regression. See the PCE user manual, sections 2.6.2 and 2.6.3, or try it out yourself (use a cheap toy model and compute a large validation set, see PCE manual, section 2.8): for the same number of ED points, you will get a much smaller error with sparse regression than with OLS.
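Just to make the suggested experiment concrete, here is a minimal sketch in plain Python (not the PCE module's actual code; the model, sample sizes, and the choice of orthogonal matching pursuit as the sparse solver are all just illustrative assumptions): it builds a total-degree Legendre basis for a cheap toy model, fits it once with OLS and once with a sparse solver on the same small ED, and compares the relative error on a large independent validation set.

```python
# Hypothetical toy comparison of OLS vs. sparse regression for a PCE-type fit.
import itertools
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import OrthogonalMatchingPursuitCV

rng = np.random.default_rng(0)

def toy_model(X):
    # cheap analytical model with 3 inputs on [-1, 1]^3 (illustrative choice)
    return np.sin(np.pi * X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * X[:, 0] * X[:, 2]

def total_degree_indices(M, p):
    # all multi-indices alpha with |alpha|_1 <= p
    return [a for a in itertools.product(range(p + 1), repeat=M) if sum(a) <= p]

def pce_matrix(X, indices):
    # evaluate the multivariate orthonormal Legendre basis at the points X
    Psi = np.ones((X.shape[0], len(indices)))
    for j, alpha in enumerate(indices):
        for m, deg in enumerate(alpha):
            c = np.zeros(deg + 1); c[deg] = 1.0
            Psi[:, j] *= legendre.legval(X[:, m], c) * np.sqrt(2 * deg + 1)
    return Psi

M, p = 3, 6
indices = total_degree_indices(M, p)        # P = 84 basis functions
X_ed = rng.uniform(-1, 1, (60, M))          # N = 60 ED points (N < P)
X_val = rng.uniform(-1, 1, (10_000, M))     # large validation set
y_ed, y_val = toy_model(X_ed), toy_model(X_val)
Psi_ed, Psi_val = pce_matrix(X_ed, indices), pce_matrix(X_val, indices)

# OLS: plain least squares (badly determined here, since N < P)
c_ols, *_ = np.linalg.lstsq(Psi_ed, y_ed, rcond=None)

# Sparse regression: picks a small subset of basis functions
omp = OrthogonalMatchingPursuitCV(fit_intercept=False).fit(Psi_ed, y_ed)

def rel_val_error(c):
    err = y_val - Psi_val @ c
    return np.mean(err ** 2) / np.var(y_val)

print("relative validation error, OLS:   ", rel_val_error(c_ols))
print("relative validation error, sparse:", rel_val_error(omp.coef_))
```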
The reason is that OLS requires at least as many experimental design (ED) points as basis functions in the expansion to avoid ill-conditioning of the regression matrix (N \geq P), and even more points to avoid overfitting. However, for a typical problem, many elements of the total-degree basis (even with hyperbolic truncation) are not needed, i.e., they have near-zero coefficients; see the illustrations in the PCE manual, Figures 7 and 8. So with OLS you end up spending many ED points on estimating coefficients of terms that are not even needed.
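To see how quickly the N \geq P requirement becomes expensive, here is a tiny sketch (plain Python, not tied to the PCE module; the dimension, degree, and q-value are just example numbers) that counts the basis size P for a total-degree basis and for a hyperbolically truncated one.

```python
# How large is the basis that OLS would have to match with ED points?
import itertools
from math import comb

def basis_size(M, p, q=1.0):
    # count multi-indices alpha with ||alpha||_q <= p (q = 1 is the full total-degree basis)
    return sum(
        1
        for alpha in itertools.product(range(p + 1), repeat=M)
        if sum(a ** q for a in alpha) ** (1.0 / q) <= p + 1e-12
    )

M, p = 6, 6
print("total-degree basis, P = C(M+p, M):", comb(M + p, M))    # 924
print("counted directly:                  ", basis_size(M, p))
print("with hyperbolic truncation q=0.75: ", basis_size(M, p, q=0.75))
```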
Sparse regression methods, on the other hand, can deal with N < P. They detect which terms are important and set the remaining coefficients to 0, making better use of the given data. Of course, this only works if the model can actually be approximated well by a sparse PCE. This is known as the sparsity-of-effects principle and holds surprisingly often for real-world models. You can imagine that there might be inputs that hardly play a role, or that several of the inputs do not interact at all, which eliminates quite a lot of elements from the total-degree basis (note that you cannot know this beforehand; sparse regression detects it automatically, without you needing to provide this information).
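Here is a small illustration of that automatic detection (again a generic Python sketch with an assumed toy model, not the PCE module's solver): the model below depends only on the first input, and a sparse solver fitted with N < P ends up with non-negligible coefficients only on basis functions of that input, even though it was never told which input matters.

```python
# Sparsity-of-effects toy example: one inert input, sparse solver finds the support.
import itertools
import numpy as np
from numpy.polynomial import legendre
from sklearn.linear_model import OrthogonalMatchingPursuit

rng = np.random.default_rng(1)

def model(X):
    # depends only on the first input; the second input is inert
    return X[:, 0] ** 3 - X[:, 0]

indices = [a for a in itertools.product(range(7), repeat=2) if sum(a) <= 6]  # P = 28

def pce_matrix(X):
    Psi = np.ones((X.shape[0], len(indices)))
    for j, alpha in enumerate(indices):
        for m, deg in enumerate(alpha):
            c = np.zeros(deg + 1); c[deg] = 1.0
            Psi[:, j] *= legendre.legval(X[:, m], c) * np.sqrt(2 * deg + 1)
    return Psi

X = rng.uniform(-1, 1, (20, 2))   # N = 20 < P = 28
omp = OrthogonalMatchingPursuit(n_nonzero_coefs=5, fit_intercept=False)
omp.fit(pce_matrix(X), model(X))

# keep only coefficients that are not numerically zero
active = [alpha for alpha, c in zip(indices, omp.coef_) if abs(c) > 1e-6]
print("active multi-indices:", active)   # expected: only terms of the form (d, 0)
```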
Degree and q-norm adaptivity (PCE manual, section 1.3.4) is a further step towards this automatic detection: you want a basis that is large enough to include all important terms, but not unnecessarily large (since sparse regression, too, works better the larger the ratio \frac{N}{P} is). The goal is always to minimize the generalization error (PCE manual, equations (1.16) and (1.17)), but since you normally do not have a separate validation set to compute it, you use the leave-one-out (LOO) cross-validation error (computed on the training set, i.e., the ED) as an approximation to guide the choice of the best degree and q-norm.
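A compact sketch of that selection loop (generic Python, not the PCE module's implementation; for brevity it uses plain OLS, for which the analytic LOO formula err_i = r_i / (1 - h_i) with the leverages h_i applies, whereas in practice you would of course combine this with the sparse solver):

```python
# Degree / q-norm adaptivity guided by the LOO cross-validation error on the ED.
import itertools
import numpy as np
from numpy.polynomial import legendre

rng = np.random.default_rng(2)

def model(X):
    # assumed toy model with 2 inputs on [-1, 1]^2
    return np.cos(X[:, 0] + 2 * X[:, 1]) + 0.2 * X[:, 1] ** 3

def truncated_indices(M, p, q):
    # hyperbolic (q-norm) truncation; q = 1 gives the total-degree basis
    return [a for a in itertools.product(range(p + 1), repeat=M)
            if sum(ai ** q for ai in a) ** (1 / q) <= p + 1e-12]

def pce_matrix(X, indices):
    Psi = np.ones((X.shape[0], len(indices)))
    for j, alpha in enumerate(indices):
        for m, deg in enumerate(alpha):
            c = np.zeros(deg + 1); c[deg] = 1.0
            Psi[:, j] *= legendre.legval(X[:, m], c) * np.sqrt(2 * deg + 1)
    return Psi

M, N = 2, 80
X = rng.uniform(-1, 1, (N, M))
y = model(X)

best = None
for p, q in itertools.product(range(2, 9), (0.75, 1.0)):
    indices = truncated_indices(M, p, q)
    if len(indices) >= N:                      # skip bases too large for OLS
        continue
    Psi = pce_matrix(X, indices)
    c, *_ = np.linalg.lstsq(Psi, y, rcond=None)
    H = Psi @ np.linalg.pinv(Psi)              # hat matrix of the OLS fit
    resid = y - Psi @ c
    loo = np.mean((resid / (1.0 - np.diag(H))) ** 2) / np.var(y)
    if best is None or loo < best[0]:
        best = (loo, p, q)

print("best (relative LOO error, degree, q-norm):", best)
```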
As a side note, if you want to read more about the different methods for computing PCEs, I can recommend our recent review and benchmark work on solvers and sampling schemes for sparse PCE and on basis-adaptive sparse PCE.