Dear Community
Goal
My name is David and I try to write my own Matlab function to compute first order (S_i) and total (S_tot) Sobol indices for my vector-valued PCE surrogate model. The PCE surrogate model is combined with principal component analysis (PCA) to truncate the output dimension as described in [1,2].
Problem
To validate my function, I’ve checked if the variance decomposition according to S_tot(:,i)-S_i(:,i)+sum(S_i,2) results in the expected value of 1. Unfortunately, this wasn’t the case (for some output parameters, the value is <1). My first idea was that this might be attributed to the approximation by the PCA truncation. Therefore, I tested the entire procedure also with a PCE model, for which all principal components were kept. I got similar results.
Question
-
What is the reason for the violation of the variance decomposition conservation, i.e. that S_tot(:,i)-S_i(:,i)+sum(S_i,2) ~=1?
-
Could it be that there is a bug in my function or did I misinterpret the conservation formula?
-
Or might this unexpected behaviour be related to the fact that for the PCA, I’ve not only performed centering with the mean (Y_mean) of the output variables but also scaling with the standard deviation (Y_std), i.e.:
Y_PCA = Y_mean + Z * Phi * Y_std
with Z being the transformed output variables in PCA space and Phi being the PCA transformation matrix (eigenvectors)? The cited studies [1,2] have used PCA without scaling. Therefore, maybe the adopted equations (eq. 34 in [1] and eq. 18,20,21 in [2]) to compute the first order and total Sobol indices have to be adapted for that case? According to my derivations, the standard deviation Y_std should cancel out, though…
Attachment
Contains the uq_PCA_Sobol function to compute the first order and total Sobol indices for the given input struct myMdl (which contains the PCE and PCE model):
Attachment.zip (2.3 MB)
function [S_tot,S_i]=uq_PCA_Sobol(myMdl)
%uq_PCA_Sobol Evaluates total and first order Sobol indices for %
% multivariate PCE+PCA model in original data space %
% %
% Input: %
% myMdl: Struct containing custom PCA and UQLab PCE %
% model %
% myMdl.model: UQLab PCE model %
% myMdl.PCA: Custom PCA model %
% myMdl.PCA.T: PCA transformation matrix | K x K* %
% myMdl.PCA.Z: PCA transformated X matrix | N x K* %
% myMdl.PCA.mean: mean X vector | 1 x K %
% myMdl.PCA.std: sample standard deviation X vector | 1 x K %
% myMdl.PCA.threshold: PCA threshold | 1 x 1 %
% myMdl.PCA.X: data matrix to create PCA model | N x K %
% myMdl.rMSE: relative mean squared error (validation) %
% | 1 x 1 %
% myMdl.Test: test values (validation) %
% myMdl.Test.rRMSE: relative root mean squared error | 1 x 1 %
% myMdl.Test.rMAE: relative mean absolute error | 1 x 1 %
% myMdl.Test.rMSE: relative mean squared error | 1 x 1 %
% myMdl.Type: Emulator type 'PCE' | 1 x 1 %
% %
% Output: %
% S_tot: total Sobol indcies %
% | K x M %
% S_i: first order Sobol indcies %
% | K x M %
% %
% %
% Notes: %
% - N: # data points in input data matrix X used to create the %
% PCA model [1,inf) %
% K: # variables in input data matrix X [1,inf) %
% K*: # variables in PCA reduced data space ( K* <= K ) %
% M: # number of model parameters %
% %
% References: %
% %
% [1] P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret, %
% “Bayesian calibration and sensitivity analysis of heat transfer %
% models for fire insulation panels,” Eng. Struct., vol. 205, %
% p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063. %
% [2] J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal %
% component analysis and sparse polynomial chaos expansions for %
% global sensitivity analysis and model calibration: Application %
% to urban drainage simulation,” Reliab. Eng. Syst. Saf., %
% vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737. %
% %
% %
% Written by: %
% Paul Scherrer Institut %
% David Breitenmoser %
% OFLD/U06 %
% Forschungsstrasse 111 %
% 5232 Villigen PSI %
% Switzerland %
% %
% Phone: +41 56 310 44 61 %
% E-Mail: david.breitenmoser@psi.ch %
% % %
% %
% Copyright (c) 2022, David Breitenmoser %
% All rights reserved. %
% %
% Redistribution and use in source and binary forms, with or without %
% modification, are permitted provided that the following conditions are %
% met: %
% %
% * Redistributions of source code must retain the above copyright %
% notice, this list of conditions and the following disclaimer. %
% * Redistributions in binary form must reproduce the above copyright%
% notice, this list of conditions and the following disclaimer in %
% the documentation and/or other materials provided with the %
% distribution %
% %
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS%
% IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED %
% TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A %
% PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT %
% OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, %
% SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT %
% LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, %
% DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY %
% THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT %
% (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE %
% OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. %
% %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%% 1) Process input parameters
% i) Check number of inputs
if nargin > 1
disp('ERROR: Too many inputs')
return
elseif nargin < 1
disp('ERROR: Too few inputs')
return
end
% ii) Number of output dimensions (data space)
K=size(myMdl.PCA.T,1);
% iii) Truncated number of output dimensions (PCA space)
K_star=size(myMdl.PCA.T,2);
% iv) Number of input dimensions (#variable model parameters)
M=length(myMdl.model.Options.Input.nonConst);
% v) Define PCA transformation matrix | K x K*
T=myMdl.PCA.T;
%% 2) Compute total Sobol indices (cf. eq. 34 in [1])
% i) Define truncated union set for multi indices alpha | N_alpha_star x M
Alpha=cell(K_star,1);
for t=1:K_star
Alpha{t}=full(myMdl.model.PCE(t).Basis.Indices);
end
Alpha_star = unique(cat(1,Alpha{:}),'rows');
N_alpha_star=size(Alpha_star,1);
% ii) Define PCE coefficient matrix | N_alpha_star x K_star
A=zeros(N_alpha_star,K_star);
for t=1:K_star
[~,idx_a2A]=ismember(Alpha{t},Alpha_star,'rows');
A(idx_a2A,t)=myMdl.model.PCE(t).Coefficients;
clear idx_a2A
end
% iii) Preallocate array
S_tot=zeros(K,M);
Var_Y=zeros(K,1);
% iv) Iterate over data space dimension t
for t=1:K
% v) Iterate over input parameter dimension m
for m=1:M
% vi) Compute nominator in equation 34 in [1]
idx_alpha_star0=find(Alpha_star(:,m)==0);
N_alpha_star0=length(idx_alpha_star0);
var_num=0;
for j=1:N_alpha_star0
var_num_tmp=0;
for p=1:K_star
var_num_tmp=var_num_tmp+A(idx_alpha_star0(j),p)*T(t,p);
end
var_num=var_num+var_num_tmp^2;
end
clear var_num_tmp idx_alpha_star0 N_alpha_star0
% vii) Compute denominator in equation 34 in [1]
var_den=0;
for j=1:N_alpha_star
var_den_tmp=0;
for p=1:K_star
var_den_tmp=var_den_tmp+A(j,p)*T(t,p);
end
var_den=var_den+var_den_tmp^2;
end
clear var_den_tmp
% viii) Compute total sobol index according to equation 34 in [1]
S_tot(t,m) = 1 - var_num / var_den;
% ix) Store variance for output dimension t
Var_Y(t,1) = var_den;
clear var_num var_den
end
end
clear t m j
%% 3) Compute first order Sobol indices (cf. eq. 18 in [2])
% i) Preallocate array
S_i=zeros(K,M);
% ii) Iterate over data space dimension t
for t=1:K
% iii) Iterate over input parameter dimension m
for m=1:M
% iv) Preallocate array
S_i_tmp=0;
% v) Iterate over PCA output dimension p
for p=1:K_star
% vi) Define variance of PCA output dimension p
Var_Zp=myMdl.model.PCE(p).Moments.Var;
% Var_Zi=sum(A_i(2:end).^2);
% vii) Define PCE coefficients for PCA output dimension p
A_p=myMdl.model.PCE(p).Coefficients;
% viii) Define indices of set
% Alpha_{i} := {alpha element of natural numbers^M :
% alpha(i)>0, alpha(j~=i)=0}
idx_alpha=find(Alpha{p}(:,m)>0 & all(Alpha{p}(:,setdiff(1:M,m))==0,2));
% ix) Compute first order Sobol index for PCA output dimension
% p
S_Zpi=sum(A_p(idx_alpha).^2)/Var_Zp;
% x) Update first order Sobol index for data output dimension
% t
S_i_tmp=S_i_tmp+S_Zpi*Var_Zp/Var_Y(t,1)*T(t,p)^2;
% xi) Compute covariance term in eq. 18 in [2]
for q=1:K_star
if p < q
Alpha_star_tmp = unique(cat(1,Alpha{[p,q]}),'rows');
N_alpha_star_tmp=size(Alpha_star_tmp,1);
A_tmp=zeros(N_alpha_star_tmp,2);
l=1;
for k=[p,q]
[~,idx_a2A]=ismember(Alpha{k},Alpha_star_tmp,'rows');
A_tmp(idx_a2A,l)=myMdl.model.PCE(k).Coefficients;
l=l+1;
clear idx_a2A
end
idx_alpha_star_tmp=find(Alpha_star_tmp(:,m)>0 & all(Alpha_star_tmp(:,setdiff(1:M,m))==0,2));
cov_pq=sum(A_tmp(idx_alpha_star_tmp,1).*A_tmp(idx_alpha_star_tmp,2));
% xii) Update first order Sobol index for data output dimension
% t
S_i_tmp = S_i_tmp + 2*cov_pq/Var_Y(t,1)*T(t,p)*T(t,q);
end
end
end
% xiii) Assign first order sobol index for data output dimension t
% according to equation 18 in [2]
S_i(t,m) = S_i_tmp;
end
end
%% 6) Debugg
color=linspecer_ed(3);
figure;
hold on
for j=1:M
plot(S_tot(:,j),'Color',color(j,:))
plot(S_i(:,j),'--','Color',color(j,:))
end
figure;
hold on
for j=1:M
plot(S_tot(:,j)-S_i(:,j)+sum(S_i,2),'Color',color(j,:))
end
end
References
[1] P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,
“Bayesian calibration and sensitivity analysis of heat transfer
models for fire insulation panels,” Eng. Struct., vol. 205,
p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.
[2] J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal
component analysis and sparse polynomial chaos expansions for
global sensitivity analysis and model calibration: Application
to urban drainage simulation,” Reliab. Eng. Syst. Saf.,
vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737.