Surprising results with PCE for low number of samples

bsanderse · March 16, 2021, 10:08am

Hi all,

I’ve been running some very simple tests of PCE on a 1D sine function.
To my surprise, with only 2 or 3 samples the PCE approximation is not a first- or second-order polynomial. Below is an MWE:

rng default;
uqlab; % start the UQLab framework

%% model definition
Model.mString = 'sin(2*pi*X(:,1))';
myModel       = uq_createModel(Model);

% list with number of samples to test
Samples_list = 2:4;
Nrun         = length(Samples_list);
error_LOO    = zeros(Nrun,1);

% loop over samples
for k=1:Nrun

  %% input
  mu    = 0.;
  sigma = 0.25;  

  Input.Marginals(1).Type = 'Uniform';
  Input.Marginals(1).Parameters = [mu-2*sigma, mu+2*sigma];
  Input.Marginals(1).Bounds = [mu-2*sigma, mu+2*sigma];
  myInput = uq_createInput(Input);

  %% LARS
  metamodelLARS.FullModel = myModel;
  metamodelLARS.Input     = myInput;
  metamodelLARS.Type      = 'Metamodel';
  metamodelLARS.MetaType  = 'PCE';
  metamodelLARS.Method    = 'LARS';
  metamodelLARS.Degree    = 1:8; % this automatically switches on degree adaptive PCE
  metamodelLARS.TruncOptions.qNorm = 0.5:0.1:1.5;

  metamodelLARS.ExpDesign.Sampling = 'LHS'; % alternatives: 'MC', 'Sobol', 'Halton'
  metamodelLARS.ExpDesign.NSamples = Samples_list(k);

  myPCE_LARS  = uq_createModel(metamodelLARS);
    
  %% plot results
  Xsamples  = myPCE_LARS.ExpDesign.X;
  Ysamples  = myPCE_LARS.ExpDesign.Y;

  % evaluate the PCE model at many points
  X_PCE = linspace(Input.Marginals(1).Bounds(1),Input.Marginals(1).Bounds(2),100)';
  Y_PCE = uq_evalModel(myPCE_LARS,X_PCE);

  figure
  plot(Xsamples(:,1),Ysamples,'o');
  hold on
  plot(X_PCE(:,1),Y_PCE);

end

For 2 samples, I get a constant zero solution, whereas I would have expected a linear fit through the two data points. Another surprise is that myPCE_LARS.PCE.Basis.Degree equals 1, while the basis coefficients are identically 0. I would think that’s a zero-degree polynomial, given that the first (zeroth-order) Legendre basis function is 1?
For 3 samples, a similar thing happens. I get a constant (non-zero) solution, whereas I would have expected a linear or quadratic fit through the three data points. Again, UQLab reports that the Degree is equal to 1, while only the first basis coefficient is nonzero.
For 4 samples things get much better, and I get, as expected, a third-order polynomial with a nice fit.

So in summary I have two questions:

1. Why does PCE on 2 or 3 samples not result in linear or quadratic approximations? Is this because LARS is favouring low rank solutions, which are in this case a bit ‘too low’?
2. Why does UQLab indicate that constant functions have degree 1?

I’m probably overlooking something since this is quite basic material. But any help would be appreciated.
Thanks!

Benjamin

nluethen · March 17, 2021, 12:41pm

Hi Benjamin,

Thanks for your question. I agree that at first glance the results of your tests seem surprising, but there are easy explanations for this behavior.

When you specify a range of degrees, UQLab will perform degree adaptivity, meaning that a PCE is computed for each degree in the range, and the one with the lowest leave-one-out (LOO) error is chosen. Now if you have only 2 points and do LOO, only 1 point is left in the training set. A single point can only determine a constant, which means that a constant function will always be the best fit. That’s why you get a constant function for 2 points.
However, since you specified the degrees starting from 1, the lowest degree that UQLab can use for the candidate basis is 1. This is why you get myPCE_LARS.PCE.Basis.Degree = 1. Note that this variable tells you the degree of the best (candidate) basis, not the degree of the active basis (i.e., the set of basis functions with nonzero coefficient).
If you had used OLS instead of LARS, and not specified degree adaptivity but a constant degree of 1, you’d have gotten a straight line as expected.
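
As a rough sketch of that last point, the options could look like the following. Relative to your MWE it only swaps Method and Degree; the name metamodelOLS is just for illustration, and the last lines are commented out because they need a running UQLab session with myModel and myInput defined as in your script.

```matlab
% Fixed-degree OLS variant of the MWE options (illustrative sketch)
metamodelOLS.Type     = 'Metamodel';
metamodelOLS.MetaType = 'PCE';
metamodelOLS.Method   = 'OLS';  % ordinary least squares instead of LARS
metamodelOLS.Degree   = 1;      % a single degree, so no degree adaptivity

metamodelOLS.ExpDesign.Sampling = 'LHS';
metamodelOLS.ExpDesign.NSamples = 2;

% With UQLab started and myModel / myInput defined as in the MWE:
% metamodelOLS.FullModel = myModel;
% metamodelOLS.Input     = myInput;
% myPCE_OLS = uq_createModel(metamodelOLS);
```

With two samples and two basis functions (constant and linear), the least-squares problem is exactly determined, so the fit passes through both points.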

For 3 samples, it is similar: LARS first adds the constant basis element to the model and computes the associated LOO value. Then a second basis element is added and the associated LOO value is computed. Any higher-order terms, computed using only 2 points, will generally predict the third point badly. So the best model is the one containing only the constant term. The degree-1 basis is the first one tried, so it is the one chosen.
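
To get a feeling for this mechanism, here is a small plain-MATLAB sketch (no UQLab needed) that computes a brute-force LOO error for a constant and a linear least-squares fit on three points. Note the assumptions: the points are hand-picked for illustration and are not the LHS points from your MWE, the fit is ordinary least squares on monomials rather than LARS on Legendre polynomials, and the LOO error is computed by explicit refitting, which need not match the estimator UQLab uses internally.

```matlab
% Brute-force leave-one-out (LOO) error for polynomial least-squares fits.
f = @(x) sin(2*pi*x);        % same model as in the MWE
x = [-0.25; 0.10; 0.45];     % hand-picked illustrative points in [-0.5, 0.5]
y = f(x);
N = numel(x);

maxDeg = 1;                  % compare degree 0 (constant) and degree 1 (line)
looErr = zeros(maxDeg + 1, 1);

for d = 0:maxDeg
    for i = 1:N
        train = setdiff(1:N, i);       % indices of the N-1 retained points
        A = x(train).^(0:d);           % regression matrix [1, x, ..., x^d]
                                       % (implicit expansion, R2016b or newer)
        c = A \ y(train);              % least-squares coefficients
        pred = (x(i).^(0:d)) * c;      % prediction at the left-out point
        looErr(d + 1) = looErr(d + 1) + (y(i) - pred)^2 / N;
    end
end

fprintf('LOO error, constant fit: %.4f\n', looErr(1));
fprintf('LOO error, linear fit:   %.4f\n', looErr(2));
```

For these particular points the constant model has the lower LOO error, because each line through two points extrapolates poorly to the third. With other point locations the line can win, so treat this as an illustration of the mechanism only.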

In general, LARS will include at most N-1 basis functions in the model for N samples. As you can see in your MWE (myPCE_LARS.PCE.Coefficients), for 4 points LARS decides to skip the degree-2 basis function and only include degrees 1, 3, and 4. You get a degree-4 polynomial, but the PCE has only 3 active basis functions, as expected.
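
A minimal sketch for checking the active basis yourself is given below. It assumes, to the best of my knowledge of the result structure, that myPCE_LARS.PCE.Basis.Indices holds one multi-index per row, so that the row sums give the total degree of each basis function. If myPCE_LARS is not in the workspace, it falls back to hypothetical placeholder values that only mimic the sparsity pattern described above (nonzero at degrees 1, 3, and 4); these numbers are not output from the MWE.

```matlab
% Compare the candidate-basis degree with the degree of the active basis.
if exist('myPCE_LARS', 'var')
    coeffs  = myPCE_LARS.PCE.Coefficients;
    % Assumed layout: one multi-index per row, one column per input variable
    degrees = full(sum(myPCE_LARS.PCE.Basis.Indices, 2));
else
    % Hypothetical placeholder values, for illustration only
    coeffs  = [0; 0.8; 0; -0.3; 0.05];
    degrees = (0:4)';        % in 1D, basis function k has degree k-1
end

isActive = coeffs ~= 0;      % basis functions with nonzero coefficient
nActive  = nnz(isActive);

if nActive > 0
    activeDegree = max(degrees(isActive));
else
    activeDegree = NaN;      % all coefficients are zero (e.g. the 2-sample case)
end

fprintf('candidate basis size: %d, active basis functions: %d, active degree: %g\n', ...
        numel(coeffs), nActive, activeDegree);
```

The candidate degree reported in myPCE_LARS.PCE.Basis.Degree can then be compared against activeDegree.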

I hope this answers your questions. You can also explore these things yourself by setting breakpoints in uq_lar or uq_PCE_lars, and checking the struct lar_results in uq_PCE_lars or the informative fields in myPCE_LARS.Internal.PCE.LARS. Have fun!

bsanderse · March 17, 2021, 1:17pm

Hi Nora,
Thanks for the quick reply! This makes a lot of sense. I was not aware of the difference between the degree of the best (candidate) basis and the degree of the active basis. That’s a good one to keep in mind!