PCA based Sobol indices for vector-valued PCE surrogate model

Dear Community

Goal
My name is David and I try to write my own Matlab function to compute first order (S_i) and total (S_tot) Sobol indices for my vector-valued PCE surrogate model. The PCE surrogate model is combined with principal component analysis (PCA) to truncate the output dimension as described in [1,2].

Problem
To validate my function, I’ve checked if the variance decomposition according to S_tot(:,i)-S_i(:,i)+sum(S_i,2) results in the expected value of 1. Unfortunately, this wasn’t the case (for some output parameters, the value is <1). My first idea was that this might be attributed to the approximation by the PCA truncation. Therefore, I tested the entire procedure also with a PCE model, for which all principal components were kept. I got similar results.

Question

  • What is the reason for the violation of the variance decomposition conservation, i.e. that S_tot(:,i)-S_i(:,i)+sum(S_i,2) ~=1?

  • Could it be that there is a bug in my function or did I misinterpret the conservation formula?

  • Or might this unexpected behaviour be related to the fact that for the PCA, I’ve not only performed centering with the mean (Y_mean) of the output variables but also scaling with the standard deviation (Y_std), i.e.:

    Y_PCA = Y_mean + Z * Phi * Y_std

    with Z being the transformed output variables in PCA space and Phi being the PCA transformation matrix (eigenvectors)? The cited studies [1,2] have used PCA without scaling. Therefore, maybe the adopted equations (eq. 34 in [1] and eq. 18,20,21 in [2]) to compute the first order and total Sobol indices have to be adapted for that case? According to my derivations, the standard deviation Y_std should cancel out, though…

Attachment
Contains the uq_PCA_Sobol function to compute the first order and total Sobol indices for the given input struct myMdl (which contains the PCE and PCE model):

Attachment.zip (2.3 MB)

function [S_tot,S_i]=uq_PCA_Sobol(myMdl)
%uq_PCA_Sobol Evaluates total and first order Sobol indices for          %
%             multivariate PCE+PCA model in original data space          %
%                                                                        %
% Input:                                                                 %
%   myMdl:                 Struct containing custom PCA and UQLab PCE    %
%                          model                                         %
%    myMdl.model:          UQLab PCE model                               %
%    myMdl.PCA:            Custom PCA model                              %
%     myMdl.PCA.T:         PCA transformation matrix | K x K*            %
%     myMdl.PCA.Z:         PCA transformated X matrix | N x K*           %
%     myMdl.PCA.mean:      mean X vector | 1 x K                         %
%     myMdl.PCA.std:       sample standard deviation X vector | 1 x K    %
%     myMdl.PCA.threshold: PCA threshold | 1 x 1                         %
%     myMdl.PCA.X:         data matrix to create PCA model | N x K       %
%    myMdl.rMSE:           relative mean squared error (validation)      %
%                          | 1 x 1                                       %
%    myMdl.Test:           test values (validation)                      %
%     myMdl.Test.rRMSE:    relative root mean squared error | 1 x 1      %
%     myMdl.Test.rMAE:     relative mean absolute error | 1 x 1          %
%     myMdl.Test.rMSE:     relative mean squared error | 1 x 1           %
%    myMdl.Type:           Emulator type 'PCE' | 1 x 1                   %
%                                                                        %
% Output:                                                                %
%   S_tot:                 total Sobol indcies                           %
%                          | K x M                                       %
%   S_i:                   first order Sobol indcies                     %
%                          | K x M                                       %
%                                                                        %
%                                                                        %
% Notes:                                                                 %
%  -  N:  # data points in input data matrix X used to create the        %
%           PCA model [1,inf)                                            %
%     K:  # variables in input data matrix X [1,inf)                     %
%     K*: # variables in PCA reduced data space ( K* <= K )              %
%     M:  # number of model parameters                                   %
%                                                                        %
% References:                                                            %
%                                                                        %
% [1]	P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,   %
%       “Bayesian calibration and sensitivity analysis of heat transfer  %
%       models for fire insulation panels,” Eng. Struct., vol. 205,      %
%       p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.      %
% [2]	J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal           %
%       component analysis and sparse polynomial chaos expansions for    %
%       global sensitivity analysis and model calibration: Application   %
%       to urban drainage simulation,” Reliab. Eng. Syst. Saf.,          %
%       vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737. %
%                                                                        %
%                                                                        %
% Written by:                                                            %
%  Paul Scherrer Institut                                                %
%  David Breitenmoser                                                    %
%  OFLD/U06                                                              %
%  Forschungsstrasse 111                                                 %
%  5232 Villigen PSI                                                     %
%  Switzerland                                                           %
%                                                                        %
%  Phone: +41 56 310 44 61                                               %
%  E-Mail: david.breitenmoser@psi.ch                                     %
%                                                                        %                                                                        %
%                                                                        %
% Copyright (c) 2022, David Breitenmoser                                 %
% All rights reserved.                                                   %
%                                                                        %
% Redistribution and use in source and binary forms, with or without     %
% modification, are permitted provided that the following conditions are %
% met:                                                                   %
%                                                                        %
%     * Redistributions of source code must retain the above copyright   %
%       notice, this list of conditions and the following disclaimer.    %
%     * Redistributions in binary form must reproduce the above copyright%
%       notice, this list of conditions and the following disclaimer in  %
%       the documentation and/or other materials provided with the       %
%       distribution                                                     %
%                                                                        %
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS%
% IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED  %
% TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A        %
% PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT     %
% OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,  %
% SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT       %
% LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,  %
% DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY  %
% THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT    %
% (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE  %
% OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.   %
%                                                                        %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


%% 1) Process input parameters

% i) Check number of inputs
if nargin > 1
    disp('ERROR: Too many inputs')
    return
elseif nargin < 1
    disp('ERROR: Too few inputs')
    return
end

% ii) Number of output dimensions (data space)
K=size(myMdl.PCA.T,1);

% iii) Truncated number of output dimensions (PCA space)
K_star=size(myMdl.PCA.T,2);

% iv) Number of input dimensions (#variable model parameters)
M=length(myMdl.model.Options.Input.nonConst);

% v) Define PCA transformation matrix | K x K*
T=myMdl.PCA.T;


%% 2) Compute total Sobol indices (cf. eq. 34 in [1])

% i) Define truncated union set for multi indices alpha | N_alpha_star x M
Alpha=cell(K_star,1);
for t=1:K_star
    Alpha{t}=full(myMdl.model.PCE(t).Basis.Indices);
end

Alpha_star = unique(cat(1,Alpha{:}),'rows'); 
N_alpha_star=size(Alpha_star,1);

% ii) Define PCE coefficient matrix | N_alpha_star x K_star
A=zeros(N_alpha_star,K_star);
for t=1:K_star
    [~,idx_a2A]=ismember(Alpha{t},Alpha_star,'rows');
    A(idx_a2A,t)=myMdl.model.PCE(t).Coefficients;
    clear idx_a2A
end

% iii) Preallocate array
S_tot=zeros(K,M);
Var_Y=zeros(K,1);

% iv) Iterate over data space dimension t
for t=1:K
    
    % v) Iterate over input parameter dimension m
    for m=1:M
        
        % vi) Compute nominator in equation 34 in [1]
        idx_alpha_star0=find(Alpha_star(:,m)==0);
        N_alpha_star0=length(idx_alpha_star0);
        var_num=0;
        for j=1:N_alpha_star0
            
            var_num_tmp=0;
            for p=1:K_star
                var_num_tmp=var_num_tmp+A(idx_alpha_star0(j),p)*T(t,p);
                
            end
            var_num=var_num+var_num_tmp^2;
            
        end
        clear var_num_tmp idx_alpha_star0 N_alpha_star0
        
        % vii) Compute denominator in equation 34 in [1]
        var_den=0;
        for j=1:N_alpha_star
            
            var_den_tmp=0;
            for p=1:K_star
                var_den_tmp=var_den_tmp+A(j,p)*T(t,p);
                
            end
            var_den=var_den+var_den_tmp^2;
            
        end
        clear var_den_tmp
        
        % viii) Compute total sobol index according to equation 34 in [1]
        S_tot(t,m) = 1 - var_num / var_den;
        
        % ix) Store variance for output dimension t
        Var_Y(t,1) = var_den;

        clear var_num var_den 
    end
end

clear t m j


%% 3) Compute first order Sobol indices (cf. eq. 18 in [2])

% i) Preallocate array
S_i=zeros(K,M);

% ii) Iterate over data space dimension t
for t=1:K
    
    % iii) Iterate over input parameter dimension m
    for m=1:M

        % iv) Preallocate array
        S_i_tmp=0;
        
        % v) Iterate over PCA output dimension p
        for p=1:K_star
            
            % vi) Define variance of PCA output dimension p
            Var_Zp=myMdl.model.PCE(p).Moments.Var;  
%             Var_Zi=sum(A_i(2:end).^2);

            % vii) Define PCE coefficients for PCA output dimension p
            A_p=myMdl.model.PCE(p).Coefficients;

            % viii) Define indices of set 
            %      Alpha_{i} := {alpha element of natural numbers^M : 
            %                    alpha(i)>0, alpha(j~=i)=0}
            idx_alpha=find(Alpha{p}(:,m)>0 & all(Alpha{p}(:,setdiff(1:M,m))==0,2));

            % ix) Compute first order Sobol index for PCA output dimension
            %     p
            S_Zpi=sum(A_p(idx_alpha).^2)/Var_Zp;

            % x) Update first order Sobol index for data output dimension
            %    t         
            S_i_tmp=S_i_tmp+S_Zpi*Var_Zp/Var_Y(t,1)*T(t,p)^2;
            
            % xi) Compute covariance term in eq. 18 in [2] 

            for q=1:K_star
                
                if p < q
                    
                    Alpha_star_tmp = unique(cat(1,Alpha{[p,q]}),'rows');
                    N_alpha_star_tmp=size(Alpha_star_tmp,1);
                    
                    A_tmp=zeros(N_alpha_star_tmp,2);
                    l=1;
                    
                    for k=[p,q]
                        [~,idx_a2A]=ismember(Alpha{k},Alpha_star_tmp,'rows');
                        A_tmp(idx_a2A,l)=myMdl.model.PCE(k).Coefficients;
                        l=l+1;
                        clear idx_a2A
                    end
                    
                    idx_alpha_star_tmp=find(Alpha_star_tmp(:,m)>0 & all(Alpha_star_tmp(:,setdiff(1:M,m))==0,2));
                    cov_pq=sum(A_tmp(idx_alpha_star_tmp,1).*A_tmp(idx_alpha_star_tmp,2));

                    % xii) Update first order Sobol index for data output dimension
                    %    t 
                    S_i_tmp = S_i_tmp + 2*cov_pq/Var_Y(t,1)*T(t,p)*T(t,q);
                end
                
            end
            
        end
        
        % xiii) Assign first order sobol index for data output dimension t 
        %    according to equation 18 in [2]
        S_i(t,m) = S_i_tmp;
        
    end
end

%% 6) Debugg
color=linspecer_ed(3);
figure;
hold on
for j=1:M
    plot(S_tot(:,j),'Color',color(j,:))
    plot(S_i(:,j),'--','Color',color(j,:))
end

figure;
hold on
for j=1:M
    plot(S_tot(:,j)-S_i(:,j)+sum(S_i,2),'Color',color(j,:))
end

end


References

[1] P. R. Wagner, R. Fahrni, M. Klippel, A. Frangi, and B. Sudret,
“Bayesian calibration and sensitivity analysis of heat transfer
models for fire insulation panels,” Eng. Struct., vol. 205,
p. 110063, Feb. 2020, doi: 10.1016/j.engstruct.2019.110063.

[2] J. B. Nagel, J. Rieckermann, and B. Sudret, “Principal
component analysis and sparse polynomial chaos expansions for
global sensitivity analysis and model calibration: Application
to urban drainage simulation,” Reliab. Eng. Syst. Saf.,
vol. 195, p. 106737, Mar. 2020, doi: 10.1016/j.ress.2019.106737.

Have you fix your code for PCA-PCE sobol indices? I am also interested in this.

Because I am also reading Nagel’s paper recently, which is a good paper. if you have succeed in solving this and dont mind sharing the code, I really would like to have a read at your recent code on this.

Dear Ningxin

Unfortunately, I did not get any reply from either the community or the developers. For my application in [3], I only needed the total Sobol indices. In the supplementary materials of [3], we prove that a scaled PCA approach results in the same equation for the total Sobol indices as for the unscaled PCA featured by Wagner in eq. 34 [1]. Hence, I’m fairly confident that the code for the total Sobol indices is correct. Based on these thoughts, I have to assume that something is wrong with the code for the first order Sobol indices.

Please let me know, if you find some bugs in my code or in general an explanation for the observed violation of the variance decomposition conservation.

Kind regards,
David

[3] D. Breitenmoser, F. Cerutti, G. Butterweck, M. M. Kasprzak, and S. Mayer, “Emulator-based Bayesian inference on non-proportional scintillation models by compton-edge probing,” Nat. Commun., vol. 14, p. 7790, Nov. 2023, doi: 10.1038/s41467-023-42574-y.