# PCA based Sobol indices for vector-valued PCE surrogate model

Dear Community

Goal
My name is David and I try to write my own Matlab function to compute first order (S_i) and total (S_tot) Sobol indices for my vector-valued PCE surrogate model. The PCE surrogate model is combined with principal component analysis (PCA) to truncate the output dimension as described in [1,2].

Problem
To validate my function, I’ve checked if the variance decomposition according to S_tot(:,i)-S_i(:,i)+sum(S_i,2) results in the expected value of 1. Unfortunately, this wasn’t the case (for some output parameters, the value is <1). My first idea was that this might be attributed to the approximation by the PCA truncation. Therefore, I tested the entire procedure also with a PCE model, for which all principal components were kept. I got similar results.

Question

• What is the reason for the violation of the variance decomposition conservation, i.e. that S_tot(:,i)-S_i(:,i)+sum(S_i,2) ~=1?

• Could it be that there is a bug in my function or did I misinterpret the conservation formula?

• Or might this unexpected behaviour be related to the fact that for the PCA, I’ve not only performed centering with the mean (Y_mean) of the output variables but also scaling with the standard deviation (Y_std), i.e.:

Y_PCA = Y_mean + Z * Phi * Y_std

with Z being the transformed output variables in PCA space and Phi being the PCA transformation matrix (eigenvectors)? The cited studies [1,2] have used PCA without scaling. Therefore, maybe the adopted equations (eq. 34 in [1] and eq. 18,20,21 in [2]) to compute the first order and total Sobol indices have to be adapted for that case? According to my derivations, the standard deviation Y_std should cancel out, though…

Attachment
Contains the uq_PCA_Sobol function to compute the first order and total Sobol indices for the given input struct myMdl (which contains the PCE and PCE model):

Attachment.zip (2.3 MB)

``````function [S_tot,S_i]=uq_PCA_Sobol(myMdl)
%uq_PCA_Sobol Evaluates total and first order Sobol indices for          %
%             multivariate PCE+PCA model in original data space          %
%                                                                        %
% Input:                                                                 %
%   myMdl:                 Struct containing custom PCA and UQLab PCE    %
%                          model                                         %
%    myMdl.model:          UQLab PCE model                               %
%    myMdl.PCA:            Custom PCA model                              %
%     myMdl.PCA.T:         PCA transformation matrix | K x K*            %
%     myMdl.PCA.Z:         PCA transformated X matrix | N x K*           %
%     myMdl.PCA.mean:      mean X vector | 1 x K                         %
%     myMdl.PCA.std:       sample standard deviation X vector | 1 x K    %
%     myMdl.PCA.threshold: PCA threshold | 1 x 1                         %
%     myMdl.PCA.X:         data matrix to create PCA model | N x K       %
%    myMdl.rMSE:           relative mean squared error (validation)      %
%                          | 1 x 1                                       %
%    myMdl.Test:           test values (validation)                      %
%     myMdl.Test.rRMSE:    relative root mean squared error | 1 x 1      %
%     myMdl.Test.rMAE:     relative mean absolute error | 1 x 1          %
%     myMdl.Test.rMSE:     relative mean squared error | 1 x 1           %
%    myMdl.Type:           Emulator type 'PCE' | 1 x 1                   %
%                                                                        %
% Output:                                                                %
%   S_tot:                 total Sobol indcies                           %
%                          | K x M                                       %
%   S_i:                   first order Sobol indcies                     %
%                          | K x M                                       %
%                                                                        %
%                                                                        %
% Notes:                                                                 %
%  -  N:  # data points in input data matrix X used to create the        %
%           PCA model [1,inf)                                            %
%     K:  # variables in input data matrix X [1,inf)                     %
%     K*: # variables in PCA reduced data space ( K* <= K )              %
%     M:  # number of model parameters                                   %
%                                                                        %
% References:                                                            %
%                                                                        %
% [1]	P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,   %
%       “Bayesian calibration and sensitivity analysis of heat transfer  %
%       models for fire insulation panels,” Eng. Struct., vol. 205,      %
%       p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.      %
% [2]	J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal           %
%       component analysis and sparse polynomial chaos expansions for    %
%       global sensitivity analysis and model calibration: Application   %
%       to urban drainage simulation,” Reliab. Eng. Syst. Saf.,          %
%       vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737. %
%                                                                        %
%                                                                        %
% Written by:                                                            %
%  Paul Scherrer Institut                                                %
%  David Breitenmoser                                                    %
%  OFLD/U06                                                              %
%  Forschungsstrasse 111                                                 %
%  5232 Villigen PSI                                                     %
%  Switzerland                                                           %
%                                                                        %
%  Phone: +41 56 310 44 61                                               %
%  E-Mail: david.breitenmoser@psi.ch                                     %
%                                                                        %                                                                        %
%                                                                        %
% Copyright (c) 2022, David Breitenmoser                                 %
%                                                                        %
% Redistribution and use in source and binary forms, with or without     %
% modification, are permitted provided that the following conditions are %
% met:                                                                   %
%                                                                        %
%     * Redistributions of source code must retain the above copyright   %
%       notice, this list of conditions and the following disclaimer.    %
%     * Redistributions in binary form must reproduce the above copyright%
%       notice, this list of conditions and the following disclaimer in  %
%       the documentation and/or other materials provided with the       %
%       distribution                                                     %
%                                                                        %
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS%
% IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED  %
% TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A        %
% PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT     %
% OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,  %
% SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT       %
% LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,  %
% DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY  %
% THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT    %
% (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  %
% OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.   %
%                                                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

%% 1) Process input parameters

% i) Check number of inputs
if nargin > 1
disp('ERROR: Too many inputs')
return
elseif nargin < 1
disp('ERROR: Too few inputs')
return
end

% ii) Number of output dimensions (data space)
K=size(myMdl.PCA.T,1);

% iii) Truncated number of output dimensions (PCA space)
K_star=size(myMdl.PCA.T,2);

% iv) Number of input dimensions (#variable model parameters)
M=length(myMdl.model.Options.Input.nonConst);

% v) Define PCA transformation matrix | K x K*
T=myMdl.PCA.T;

%% 2) Compute total Sobol indices (cf. eq. 34 in [1])

% i) Define truncated union set for multi indices alpha | N_alpha_star x M
Alpha=cell(K_star,1);
for t=1:K_star
Alpha{t}=full(myMdl.model.PCE(t).Basis.Indices);
end

Alpha_star = unique(cat(1,Alpha{:}),'rows');
N_alpha_star=size(Alpha_star,1);

% ii) Define PCE coefficient matrix | N_alpha_star x K_star
A=zeros(N_alpha_star,K_star);
for t=1:K_star
[~,idx_a2A]=ismember(Alpha{t},Alpha_star,'rows');
A(idx_a2A,t)=myMdl.model.PCE(t).Coefficients;
clear idx_a2A
end

% iii) Preallocate array
S_tot=zeros(K,M);
Var_Y=zeros(K,1);

% iv) Iterate over data space dimension t
for t=1:K

% v) Iterate over input parameter dimension m
for m=1:M

% vi) Compute nominator in equation 34 in [1]
idx_alpha_star0=find(Alpha_star(:,m)==0);
N_alpha_star0=length(idx_alpha_star0);
var_num=0;
for j=1:N_alpha_star0

var_num_tmp=0;
for p=1:K_star
var_num_tmp=var_num_tmp+A(idx_alpha_star0(j),p)*T(t,p);

end
var_num=var_num+var_num_tmp^2;

end
clear var_num_tmp idx_alpha_star0 N_alpha_star0

% vii) Compute denominator in equation 34 in [1]
var_den=0;
for j=1:N_alpha_star

var_den_tmp=0;
for p=1:K_star
var_den_tmp=var_den_tmp+A(j,p)*T(t,p);

end
var_den=var_den+var_den_tmp^2;

end
clear var_den_tmp

% viii) Compute total sobol index according to equation 34 in [1]
S_tot(t,m) = 1 - var_num / var_den;

% ix) Store variance for output dimension t
Var_Y(t,1) = var_den;

clear var_num var_den
end
end

clear t m j

%% 3) Compute first order Sobol indices (cf. eq. 18 in [2])

% i) Preallocate array
S_i=zeros(K,M);

% ii) Iterate over data space dimension t
for t=1:K

% iii) Iterate over input parameter dimension m
for m=1:M

% iv) Preallocate array
S_i_tmp=0;

% v) Iterate over PCA output dimension p
for p=1:K_star

% vi) Define variance of PCA output dimension p
Var_Zp=myMdl.model.PCE(p).Moments.Var;
%             Var_Zi=sum(A_i(2:end).^2);

% vii) Define PCE coefficients for PCA output dimension p
A_p=myMdl.model.PCE(p).Coefficients;

% viii) Define indices of set
%      Alpha_{i} := {alpha element of natural numbers^M :
%                    alpha(i)>0, alpha(j~=i)=0}
idx_alpha=find(Alpha{p}(:,m)>0 & all(Alpha{p}(:,setdiff(1:M,m))==0,2));

% ix) Compute first order Sobol index for PCA output dimension
%     p
S_Zpi=sum(A_p(idx_alpha).^2)/Var_Zp;

% x) Update first order Sobol index for data output dimension
%    t
S_i_tmp=S_i_tmp+S_Zpi*Var_Zp/Var_Y(t,1)*T(t,p)^2;

% xi) Compute covariance term in eq. 18 in [2]

for q=1:K_star

if p < q

Alpha_star_tmp = unique(cat(1,Alpha{[p,q]}),'rows');
N_alpha_star_tmp=size(Alpha_star_tmp,1);

A_tmp=zeros(N_alpha_star_tmp,2);
l=1;

for k=[p,q]
[~,idx_a2A]=ismember(Alpha{k},Alpha_star_tmp,'rows');
A_tmp(idx_a2A,l)=myMdl.model.PCE(k).Coefficients;
l=l+1;
clear idx_a2A
end

idx_alpha_star_tmp=find(Alpha_star_tmp(:,m)>0 & all(Alpha_star_tmp(:,setdiff(1:M,m))==0,2));
cov_pq=sum(A_tmp(idx_alpha_star_tmp,1).*A_tmp(idx_alpha_star_tmp,2));

% xii) Update first order Sobol index for data output dimension
%    t
S_i_tmp = S_i_tmp + 2*cov_pq/Var_Y(t,1)*T(t,p)*T(t,q);
end

end

end

% xiii) Assign first order sobol index for data output dimension t
%    according to equation 18 in [2]
S_i(t,m) = S_i_tmp;

end
end

%% 6) Debugg
color=linspecer_ed(3);
figure;
hold on
for j=1:M
plot(S_tot(:,j),'Color',color(j,:))
plot(S_i(:,j),'--','Color',color(j,:))
end

figure;
hold on
for j=1:M
plot(S_tot(:,j)-S_i(:,j)+sum(S_i,2),'Color',color(j,:))
end

end

``````

References

[1] P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,
“Bayesian calibration and sensitivity analysis of heat transfer
models for fire insulation panels,” Eng. Struct., vol. 205,
p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.

[2] J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal
component analysis and sparse polynomial chaos expansions for
global sensitivity analysis and model calibration: Application
to urban drainage simulation,” Reliab. Eng. Syst. Saf.,
vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737.