PCA based Sobol indices for vector-valued PCE surrogate model

Dear Community

Goal
My name is David and I try to write my own Matlab function to compute first order (S_i) and total (S_tot) Sobol indices for my vector-valued PCE surrogate model. The PCE surrogate model is combined with principal component analysis (PCA) to truncate the output dimension as described in [1,2].

Problem
To validate my function, I’ve checked if the variance decomposition according to S_tot(:,i)-S_i(:,i)+sum(S_i,2) results in the expected value of 1. Unfortunately, this wasn’t the case (for some output parameters, the value is <1). My first idea was that this might be attributed to the approximation by the PCA truncation. Therefore, I tested the entire procedure also with a PCE model, for which all principal components were kept. I got similar results.

Question

  • What is the reason for the violation of the variance decomposition conservation, i.e. that S_tot(:,i)-S_i(:,i)+sum(S_i,2) ~=1?

  • Could it be that there is a bug in my function or did I misinterpret the conservation formula?

  • Or might this unexpected behaviour be related to the fact that for the PCA, I’ve not only performed centering with the mean (Y_mean) of the output variables but also scaling with the standard deviation (Y_std), i.e.:

    Y_PCA = Y_mean + Z * Phi * Y_std

    with Z being the transformed output variables in PCA space and Phi being the PCA transformation matrix (eigenvectors)? The cited studies [1,2] have used PCA without scaling. Therefore, maybe the adopted equations (eq. 34 in [1] and eq. 18,20,21 in [2]) to compute the first order and total Sobol indices have to be adapted for that case? According to my derivations, the standard deviation Y_std should cancel out, though…

Attachment
Contains the uq_PCA_Sobol function to compute the first order and total Sobol indices for the given input struct myMdl (which contains the PCE and PCE model):

Attachment.zip (2.3 MB)

function [S_tot,S_i]=uq_PCA_Sobol(myMdl)
%uq_PCA_Sobol Evaluates total and first order Sobol indices for          %
%             multivariate PCE+PCA model in original data space          %
%                                                                        %
% Input:                                                                 %
%   myMdl:                 Struct containing custom PCA and UQLab PCE    %
%                          model                                         %
%    myMdl.model:          UQLab PCE model                               %
%    myMdl.PCA:            Custom PCA model                              %
%     myMdl.PCA.T:         PCA transformation matrix | K x K*            %
%     myMdl.PCA.Z:         PCA transformated X matrix | N x K*           %
%     myMdl.PCA.mean:      mean X vector | 1 x K                         %
%     myMdl.PCA.std:       sample standard deviation X vector | 1 x K    %
%     myMdl.PCA.threshold: PCA threshold | 1 x 1                         %
%     myMdl.PCA.X:         data matrix to create PCA model | N x K       %
%    myMdl.rMSE:           relative mean squared error (validation)      %
%                          | 1 x 1                                       %
%    myMdl.Test:           test values (validation)                      %
%     myMdl.Test.rRMSE:    relative root mean squared error | 1 x 1      %
%     myMdl.Test.rMAE:     relative mean absolute error | 1 x 1          %
%     myMdl.Test.rMSE:     relative mean squared error | 1 x 1           %
%    myMdl.Type:           Emulator type 'PCE' | 1 x 1                   %
%                                                                        %
% Output:                                                                %
%   S_tot:                 total Sobol indcies                           %
%                          | K x M                                       %
%   S_i:                   first order Sobol indcies                     %
%                          | K x M                                       %
%                                                                        %
%                                                                        %
% Notes:                                                                 %
%  -  N:  # data points in input data matrix X used to create the        %
%           PCA model [1,inf)                                            %
%     K:  # variables in input data matrix X [1,inf)                     %
%     K*: # variables in PCA reduced data space ( K* <= K )              %
%     M:  # number of model parameters                                   %
%                                                                        %
% References:                                                            %
%                                                                        %
% [1]	P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,   %
%       “Bayesian calibration and sensitivity analysis of heat transfer  %
%       models for fire insulation panels,” Eng. Struct., vol. 205,      %
%       p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.      %
% [2]	J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal           %
%       component analysis and sparse polynomial chaos expansions for    %
%       global sensitivity analysis and model calibration: Application   %
%       to urban drainage simulation,” Reliab. Eng. Syst. Saf.,          %
%       vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737. %
%                                                                        %
%                                                                        %
% Written by:                                                            %
%  Paul Scherrer Institut                                                %
%  David Breitenmoser                                                    %
%  OFLD/U06                                                              %
%  Forschungsstrasse 111                                                 %
%  5232 Villigen PSI                                                     %
%  Switzerland                                                           %
%                                                                        %
%  Phone: +41 56 310 44 61                                               %
%  E-Mail: david.breitenmoser@psi.ch                                     %
%                                                                        %                                                                        %
%                                                                        %
% Copyright (c) 2022, David Breitenmoser                                 %
% All rights reserved.                                                   %
%                                                                        %
% Redistribution and use in source and binary forms, with or without     %
% modification, are permitted provided that the following conditions are %
% met:                                                                   %
%                                                                        %
%     * Redistributions of source code must retain the above copyright   %
%       notice, this list of conditions and the following disclaimer.    %
%     * Redistributions in binary form must reproduce the above copyright%
%       notice, this list of conditions and the following disclaimer in  %
%       the documentation and/or other materials provided with the       %
%       distribution                                                     %
%                                                                        %
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS%
% IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED  %
% TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A        %
% PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT     %
% OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,  %
% SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT       %
% LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,  %
% DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY  %
% THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT    %
% (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  %
% OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.   %
%                                                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% 1) Process input parameters

% i) Check number of inputs
if nargin > 1
    disp('ERROR: Too many inputs')
    return
elseif nargin < 1
    disp('ERROR: Too few inputs')
    return
end

% ii) Number of output dimensions (data space)
K=size(myMdl.PCA.T,1);

% iii) Truncated number of output dimensions (PCA space)
K_star=size(myMdl.PCA.T,2);

% iv) Number of input dimensions (#variable model parameters)
M=length(myMdl.model.Options.Input.nonConst);

% v) Define PCA transformation matrix | K x K*
T=myMdl.PCA.T;


%% 2) Compute total Sobol indices (cf. eq. 34 in [1])

% i) Define truncated union set for multi indices alpha | N_alpha_star x M
Alpha=cell(K_star,1);
for t=1:K_star
    Alpha{t}=full(myMdl.model.PCE(t).Basis.Indices);
end

Alpha_star = unique(cat(1,Alpha{:}),'rows'); 
N_alpha_star=size(Alpha_star,1);

% ii) Define PCE coefficient matrix | N_alpha_star x K_star
A=zeros(N_alpha_star,K_star);
for t=1:K_star
    [~,idx_a2A]=ismember(Alpha{t},Alpha_star,'rows');
    A(idx_a2A,t)=myMdl.model.PCE(t).Coefficients;
    clear idx_a2A
end

% iii) Preallocate array
S_tot=zeros(K,M);
Var_Y=zeros(K,1);

% iv) Iterate over data space dimension t
for t=1:K
    
    % v) Iterate over input parameter dimension m
    for m=1:M
        
        % vi) Compute nominator in equation 34 in [1]
        idx_alpha_star0=find(Alpha_star(:,m)==0);
        N_alpha_star0=length(idx_alpha_star0);
        var_num=0;
        for j=1:N_alpha_star0
            
            var_num_tmp=0;
            for p=1:K_star
                var_num_tmp=var_num_tmp+A(idx_alpha_star0(j),p)*T(t,p);
                
            end
            var_num=var_num+var_num_tmp^2;
            
        end
        clear var_num_tmp idx_alpha_star0 N_alpha_star0
        
        % vii) Compute denominator in equation 34 in [1]
        var_den=0;
        for j=1:N_alpha_star
            
            var_den_tmp=0;
            for p=1:K_star
                var_den_tmp=var_den_tmp+A(j,p)*T(t,p);
                
            end
            var_den=var_den+var_den_tmp^2;
            
        end
        clear var_den_tmp
        
        % viii) Compute total sobol index according to equation 34 in [1]
        S_tot(t,m) = 1 - var_num / var_den;
        
        % ix) Store variance for output dimension t
        Var_Y(t,1) = var_den;

        clear var_num var_den 
    end
end

clear t m j


%% 3) Compute first order Sobol indices (cf. eq. 18 in [2])

% i) Preallocate array
S_i=zeros(K,M);

% ii) Iterate over data space dimension t
for t=1:K
    
    % iii) Iterate over input parameter dimension m
    for m=1:M

        % iv) Preallocate array
        S_i_tmp=0;
        
        % v) Iterate over PCA output dimension p
        for p=1:K_star
            
            % vi) Define variance of PCA output dimension p
            Var_Zp=myMdl.model.PCE(p).Moments.Var;  
%             Var_Zi=sum(A_i(2:end).^2);

            % vii) Define PCE coefficients for PCA output dimension p
            A_p=myMdl.model.PCE(p).Coefficients;

            % viii) Define indices of set 
            %      Alpha_{i} := {alpha element of natural numbers^M : 
            %                    alpha(i)>0, alpha(j~=i)=0}
            idx_alpha=find(Alpha{p}(:,m)>0 & all(Alpha{p}(:,setdiff(1:M,m))==0,2));

            % ix) Compute first order Sobol index for PCA output dimension
            %     p
            S_Zpi=sum(A_p(idx_alpha).^2)/Var_Zp;

            % x) Update first order Sobol index for data output dimension
            %    t         
            S_i_tmp=S_i_tmp+S_Zpi*Var_Zp/Var_Y(t,1)*T(t,p)^2;
            
            % xi) Compute covariance term in eq. 18 in [2] 

            for q=1:K_star
                
                if p < q
                    
                    Alpha_star_tmp = unique(cat(1,Alpha{[p,q]}),'rows');
                    N_alpha_star_tmp=size(Alpha_star_tmp,1);
                    
                    A_tmp=zeros(N_alpha_star_tmp,2);
                    l=1;
                    
                    for k=[p,q]
                        [~,idx_a2A]=ismember(Alpha{k},Alpha_star_tmp,'rows');
                        A_tmp(idx_a2A,l)=myMdl.model.PCE(k).Coefficients;
                        l=l+1;
                        clear idx_a2A
                    end
                    
                    idx_alpha_star_tmp=find(Alpha_star_tmp(:,m)>0 & all(Alpha_star_tmp(:,setdiff(1:M,m))==0,2));
                    cov_pq=sum(A_tmp(idx_alpha_star_tmp,1).*A_tmp(idx_alpha_star_tmp,2));

                    % xii) Update first order Sobol index for data output dimension
                    %    t 
                    S_i_tmp = S_i_tmp + 2*cov_pq/Var_Y(t,1)*T(t,p)*T(t,q);
                end
                
            end
            
        end
        
        % xiii) Assign first order sobol index for data output dimension t 
        %    according to equation 18 in [2]
        S_i(t,m) = S_i_tmp;
        
    end
end

%% 6) Debugg
color=linspecer_ed(3);
figure;
hold on
for j=1:M
    plot(S_tot(:,j),'Color',color(j,:))
    plot(S_i(:,j),'--','Color',color(j,:))
end

figure;
hold on
for j=1:M
    plot(S_tot(:,j)-S_i(:,j)+sum(S_i,2),'Color',color(j,:))
end

end


References

[1] P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,
“Bayesian calibration and sensitivity analysis of heat transfer
models for fire insulation panels,” Eng. Struct., vol. 205,
p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.

[2] J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal
component analysis and sparse polynomial chaos expansions for
global sensitivity analysis and model calibration: Application
to urban drainage simulation,” Reliab. Eng. Syst. Saf.,
vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737.