# Sobol with UQLink

Hello, I have a UQLink-based model where I am running a Sobol Analysis, but it doesn’t stop at the designated sample size I give it. Instead, it keeps going without any sign of stopping.

For example, when I give it 10,000 samples, it keeps going after the target has been reached. Right now it’s still going way beyond without any indication of stopping.

Has anyone else every experienced this or have any idea why it might be doing this?

Hi @Anarionz, welcome to UQWorld!

Are you using a Monte Carlo simulation to estimate the Sobol’ indices? (this is the default)

If you are, then the total number of actual model runs is not the number of samples you specified (that refers to the number of independent MC sample points); it also depends on the number of input parameters. The total number of model runs to estimate the first-order and total Sobol’ indices is N \times (M+2) where N and M are the numbers of MC sample points and input parameters, respectively.

There’s a bit more explanation here on what kind of sample is generated for that estimation:

BTW, are you monitoring the number of model runs by looking at the files created by UQLink?

Thank you for the reply @damarginal.

I have a case where I have a total of 17 independent manufacturing input parameters to an engineered system, plus 2 parameters that account for large, important trends in operating history. This gives me a total of M = 19.

Each of these parameters are to the best of my knowledge, independent of one another.

I was originally running the SOBOL analysis with N = 10000, so no wonder it was taking SO long… It actually required 210000 “realizations.”

So after reading your feedback, I’ve run the SOBOL analysis again, with N = 200, and N = 500 but I end up getting an error which looks as follows:

something went extremely wrong while trying to evaluate the analysis
You may find additional information in the catched exeption:
Improper assignment with rectangular empty
matrix.

Error in uq_sobol_indices (line 163)
TotalSobolIndices(oo, : ) = tsindex;

Error in uq_sensitivity (line 83)
results =
uq_sobol_indices(CurrentAnalysis);

Error in uq_initialize_uq_sensitivity (line 861)
uq_runAnalysis(current_analysis);

Error in AUTO_EAE_UQ_Improved_SLURM_v3 (line 346)
SobolAnalysis = uq_createAnalysis(SobolOpts);

Before I carry on and try a much larger size of N, could you please let me know what you think regarding whether the above error occurred simply because of the low number of N I’ve tried re-running the problem with (such that the problem didn’t converge)? Or do you think there is a different fundamental issue?

Thank you!

Hi @Anarionz,

One way this can occur is if your model evaluation somehow gives NaN as values for any input points in the experimental design used to estimate the indices. It’s a bit difficult to know more from your current description.

Before you conduct your Sobol’ analysis, have you checked that you have indeed established a proper connection between UQLab and your model with UQLink?

Say you have myModel and myInput for your UQLink-based model and your probabilistic input model, respectively. What would the following code give you?

X = uq_getSample(myInput,200);  % Use smaller sample size if your code takes too long to execute
Y = uq_evalModel(myModel,X); % a typo was fixed here


This would be the first step to indicate that.

How long does it take to run a single simulation of your model, by the way?

Hi @damarginal, I tried running

X = uq_getSample(myInput,200); % Use smaller sample size if your code takes too long to execute
Y = uq_evalSample(myModel,X);

It yields an error as follows:

Undefined function ‘uq_evalSample’ for input
arguments of type ‘uq_model’.

Error in AUTO_EAE_UQ_Improved_SLURM_v3 (line 362)
Y = uq_evalSample(AUTO_EAE_Model,X);

I was normally using the following command to evaluate that the model was computing properly:

X = uq_getSample(Sample_Size,‘LHS’) ;
Y = uq_evalModel(AUTO_EAE_Model, X);

If so, then it runs fine.

A typical single simulation takes about 10 seconds.

Oops, there was a typo on my code, it should have been indeed uq_evalModel. Sorry!

Could you perhaps plot a histogram of the resulting Y (make sure you generate a large enough sample size)? Maybe also compute sum(isnan(Y))?

For a sample size of 100000, when I try sum(isnan(Y)), none of the output variables return a NaN.

By the way, I have a total of 18 output variables that constitute Y, and out of them, only 9 are crucial, so I could definitely downsize it.

So there are essentially 19 inputs, and 18 outputs (although only 9 outputs are crucial)

Great, thanks!

Have you also checked the distribution of your outputs? Either by looking at the histograms and computing some summary statistics (variance, mean, etc.)? Sobol’ indices are based on variance decomposition; so it’s also a good idea to check if this quantity is there.

Also, it seems you were able to evaluate your 10-second model 100’000 times very quickly. Do you by any chance use a special setup?

Hi @Anarionz,

Do you still have a problem running the Sobol’ analysis with your UQLink-based model? If so, and you did the preliminary checks I recommended and they turned out to be okay, maybe we can look at your actual UQLab analysis scripts? As I understand, it would be difficult for the problem to be reproduced due to the use of external codes, but perhaps we can find a way.

Hi Damar, I have checked the distribution of my outputs, and they generally trend toward a normal or log-normal distribution at best, although Rsq values are not the greatest (around 0.85). This is largely due to one of the input parameters, which has the greatest impact on the output parameters, is hugely variable and is significantly influenced by human decisions.

I am still having trouble running the analysis, but I have yet to try a significantly larger sample size than the N = 500. Perhaps 500 sample size is way too small, so I will give a higher sample size a try, but smaller than my previous N = 100,000, because that was just gonna take forever…

I will also post the UQLab script when I get home from work.

Do you need the external program’s output reader as well?

Regards.

Here are the scripts. It says new users cannot upload, so I will copy paste them here:

## The UQLink Script:

fclose all;
clearvars;
clc;

Restart = 0;
%%= 0 is not restart, = 1 is restart
Analysis_Type = 1; % = 1 is model eval, = 2 is SOBOL analysis
Sample_Size = 1000;

%%
%% NAME DIRECTORIES AND VARIABLES, AND DELETE OLD FILES
Current_Folder=pwd;
%%find current dir
files = ls(‘AUTO_EAE_UQLab*.inp’); % (list of files with AUTO_EAE_UQLab*.inp)
for n = 1: size(files, 1)
delete(strtrim(files(n,:)));
end

if isunix
%% Unix
copyfile(strcat(Current_Folder,’/UQInput_Template/AUTO_EAE_UQLab.inp’), Current_Folder, ‘f’)
elseif ispc % Windows
copyfile(strcat(Current_Folder,’\UQInput_Template\AUTO_EAE_UQLab.inp’), Current_Folder, ‘f’)
end

if Restart == 0
%% = 0 is not restart so delete old folders
delete(‘AUTO_EAE_UQLab.zip’)
end

if ispc
Exe_File = ‘AAPU_EAE_UQ_Function_ELM2’;
%%this is the external .exe program
elseif isunix
Exe_File = ‘run_AAPU_EAE_UQ_Function_ELM2_Linux.sh’;
%% this is the external .sh program
end
%/global/software/matlab/R2017a/
Input_File = ‘AUTO_EAE_UQLab.inp’;
Output_File = ‘AUTO_EAE_UQLab.out’;

timer = tic;
%% INITIALIZE UQLAB
if isunix
%%Unix
cd /global/home/hpc4077/UQLab/core
uqlab
cd(Current_Folder);
elseif ispc % Windows
uqlab
end
%%
%%Model type:
Mopts.Type = ‘UQLink’ ;
Mopts.Name = ‘AUTO_EAE_UQLab’;

%%
%%Mandatory options:)
%%Provide the command line, i.e. a sample of the command line that will be)
%%run on the shell
if ispc
%%Windows
Command = strcat([Exe_File, ’ ‘, Input_File]);
elseif isunix
%%Unix
Deliminator = ‘"’;
Executable = [Deliminator fullfile(Current_Folder,Exe_File) Deliminator];
Command = sprintf(’%%s %%s %%s’, Executable, Input_File);
end
Mopts.Command = Command ; % load the file then run the matlab script
%%
%%Provide the template file, i.e. a copy of the original input files)
%%where the inputs of interest are replaced by markers:)
Mopts.Template = strcat(Input_File, ‘.tpl’);
%%
%%Provide the MATLAB file that is used to retrieve the quantity of interest)
%%from the code output file:)
Mopts.Output.FileName = Output_File ;
if isunix
%%Unix
Mopts.Output.Parser = ‘uq_readOutput’ ;
elseif ispc % Windows
Mopts.Output.Parser = ‘uq_readOutput’ ;
end

%%
% Format of the variables written in the Abaqus input file
Mopts.Format = {’%6.5f’};

%%
% Set the display to quiet
ModelOpts.Display = ‘quiet’ ;

%%
% Create the UQLink wrapper:
AUTO_EAE_Model = uq_createModel(Mopts) ;

%% 3 - DEFINE THE PROBABILISTIC MODEL
inputopts.Marginals(1).Name = ‘Pellet_Dia’;
inputopts.Marginals(1).Type = ‘Lognormal’ ; % Lognormal 0.96073 vs Gaussian 0.96070
inputopts.Marginals(1).Moments = [12.210092 0.00328781] ;

inputopts.Marginals(2).Name = ‘Sheath_ID’;
inputopts.Marginals(2).Type = ‘Lognormal’ ; % Lognormal 0.976272368 vs Gaussian 0.976253242
inputopts.Marginals(2).Moments = [12.299666 0.003665] ;

inputopts.Marginals(3).Name = ‘Dish_Dpth’;
inputopts.Marginals(3).Type = ‘Lognormal’ ; % Lognormal 0.913672446 vs Gaussian 0.895779768
inputopts.Marginals(3).Moments = [0.2098888889 0.0146364505] ;

inputopts.Marginals(4).Name = ‘Land_Width’;
inputopts.Marginals(4).Type = ‘Lognormal’ ; % Lognormal 0.967042334 vs Gaussian 0.959241921
inputopts.Marginals(4).Moments = [0.8235563 0.0324305] ;

inputopts.Marginals(5).Name = ‘Pelt_Density’;
inputopts.Marginals(5).Type = ‘Gaussian’ ; % Lognormal 0.992068054 vs Gaussian 0.992175757
inputopts.Marginals(5).Moments = [10.6256873 0.0198114] ;

inputopts.Marginals(6).Name = ‘Stac_Lengt’;
inputopts.Marginals(6).Type = ‘Gaussian’ ; % BASED ON ASSUMPTION OF GAUSSIAN, MEAN CENTERED BETWEEN MANUFACTURER MAX AND MIN TOLERANCE
inputopts.Marginals(6).Moments = [481.42500 0.261666667] ;

inputopts.Marginals(7).Name = ‘Sheat_Lengt’;
inputopts.Marginals(7).Type = ‘Gaussian’ ; % BASED ON ASSUMPTION OF GAUSSIAN, MEAN CENTERED BETWEEN MANUFACTURER MAX AND MIN TOLERANCE
inputopts.Marginals(7).Moments = [486.1300 0.016666667] ;

inputopts.Marginals(8).Name = ‘Weld_Disp_A’;
inputopts.Marginals(8).Type = ‘Gaussian’ ; % Lognormal 0.965769662 vs Gaussian 0.9706230 vs Weibull 0.977885841
inputopts.Marginals(8).Moments = [0.035528 0.000677] ;

inputopts.Marginals(9).Name = ‘Weld_Disp_B’;
inputopts.Marginals(9).Type = ‘Gaussian’ ; % Lognormal 0.965769662 vs Gaussian 0.9706230 vs Weibull 0.977885841
inputopts.Marginals(9).Moments = [0.035528 0.000677] ;

inputopts.Marginals(10).Name = ‘Sheat_Thick’;
inputopts.Marginals(10).Type = ‘Lognormal’ ; % Lognormal 0.973156331 vs Gaussian 0.972522273
inputopts.Marginals(10).Moments = [0.4008627 0.0018982] ;

inputopts.Marginals(11).Name = ‘He_Fract’;
inputopts.Marginals(11).Type = ‘Lognormal’ ; % Lognormal 0.976818174 vs Gaussian 0.973329443
inputopts.Marginals(11).Moments = [0.852541 0.024087] ;

inputopts.Marginals(12).Name = ‘Sheat_Rough’;
inputopts.Marginals(12).Type = ‘Gaussian’ ; % BASED ON ASSUMPTION OF GAUSSIAN, MEAN CENTERED BETWEEN PATENT LITERATURE CONTROL LIMIT OF 0.5 AND 0.3 AND 3 STDEV BETWEEN LIMITS AND MEAN
inputopts.Marginals(12).Moments = [0.4 0.0333] ;

inputopts.Marginals(13).Name = ‘Pell_Rough’;
inputopts.Marginals(13).Type = ‘Gaussian’ ; % BASED ON ASSUMPTION OF GAUSSIAN, MEAN AND STDEV DERIVED FROM LITERATURE
inputopts.Marginals(13).Moments = [0.776666667 0.130000000] ;

inputopts.Marginals(14).Name = ‘Pelt_Grain_Sz’;
inputopts.Marginals(14).Type = ‘Lognormal’ ; % Lognormal 0.968824216 vs Gaussian 0.952122186
inputopts.Marginals(14).Moments = [8.2256354 1.2219068] ;

inputopts.Marginals(15).Name = ‘Chamf_Dept’;
inputopts.Marginals(15).Type = ‘Gaussian’ ;
%% BASED ON ASSUMPTION OF GAUSSIAN, and mean and std
inputopts.Marginals(15).Moments = [0.825 0.1] ;
inputopts.Marginals(15).Bounds = [0, 1.00];
%%Arbitary Limit decided by me)

inputopts.Marginals(16).Name = ‘Chamf_Leng’; % axial chamfer
inputopts.Marginals(16).Type = ‘Gaussian’ ;
%%BASED ON ASSUMPTION OF GAUSSIAN, and mean and std
inputopts.Marginals(16).Moments = [0.183 0.1] ;
inputopts.Marginals(16).Bounds = [0, 0.36]; % if greater than 0.36 the ELESTRES runs keep failing

inputopts.Marginals(17).Type = ‘Gaussian’ ; % Lognormal 0.996343578 vs Gaussian 0.99757754
inputopts.Marginals(17).Moments = [485.2622625 25.48472693] ;

inputopts.Marginals(18).Name = ‘Zone’;
inputopts.Marginals(18).Type = ‘Gaussian’ ; % Gaussian 0.95317
inputopts.Marginals(18).Moments = [5.1619 2.7480] ;
inputopts.Marginals(18).Bounds = [1, 12];

inputopts.Marginals(19).Name = ‘Bundle_Start_Spot’;
inputopts.Marginals(19).Type = ‘Uniform’ ;
inputopts.Marginals(19).Moments = [((8+1)/2) ((8-1)/(2*sqrt(3)))] ; % for Zone 6 … RSQ = 0.87
inputopts.Marginals(19).Bounds = [1, 8];

% Create the INPUT object
myInput = uq_createInput(inputopts);
uq_print(myInput)

%% 4 - VARIOUS APPLICATIONS

if Analysis_Type == 1; % = 1 is model eval,
%% 4.1 Estimation of the response PDF using Monte Carlo simulation
% Compute an experimental design (ED) of size 250 using Latin Hypercube Sampling:
X = uq_getSample(Sample_Size,‘LHS’) ;

%%
% Evaluate the Abaqus model (truss tip deflection) on the ED:
Y = uq_evalModel(AUTO_EAE_Model, X);

save(strcat(‘run_sample_size_’,num2str(Sample_Size)));

elseif Analysis_Type == 2; % = 2 is SOBOL analysis

SobolOpts.Type = ‘Sensitivity’;
SobolOpts.Method = ‘Sobol’;
SobolOpts.Sobol.Order = 1;
SobolOpts.Sobol.SampleSize = Sample_Size;
SobolAnalysis = uq_createAnalysis(SobolOpts);
uq_print(SobolAnalysis)

SobolResults = SobolAnalysis.Results;

save(strcat(‘SOBOL_sample_size_’,num2str(Sample_Size)));

elseif Analysis_Type == 3; % = 2 is PCE Sobol analysis

PCEOpts.Type = ‘Metamodel’;
PCEOpts.MetaType = ‘PCE’;
PCEOpts.FullModel = AUTO_EAE_Model;
PCEOpts.Degree = 7;
PCEOpts.ExpDesign.NSamples = 20000;
myPCE = uq_createModel(PCEOpts);

SobolOpts.Type = ‘Sensitivity’;
SobolOpts.Method = ‘Sobol’;
SobolOpts.Sobol.Order = 1;
SobolOpts.Sobol.SampleSize = 100000;

mySobolAnalysisPCE = uq_createAnalysis(SobolOpts);
uq_print(mySobolAnalysisPCE)

mySobolResultsPCE = mySobolAnalysisPCE.Results;

save(strcat(‘PCE_SOBOL_sample_size_’,num2str(SobolOpts.Sobol.SampleSize)));

end

time = toc(timer)

The Parser Script is simple:

function Y = uq_readOutput(outputfile)
%%Read t h e s i n g l e l i n e o f t h e f i l e , wh ich c o r r e s p o n d s t o t h e)
%%s o u g h t m idspan beam d e f l e c t i o n)
Y = dlmread(outputfile) ;
end

FYI, wherever there is a % sign, I put %% instead, because it was inadvertently triggering some math symbolism.

Thanks!

Hi @Anarionz,

Happy new year!

Thanks for the code. I’ve had a look at it and I see that you can run your analysis either in a Linux or a Windows computer. I assume that for Analysis_Type = 1, running the script works fine on both Linux and Windows and that for Analysis_Type = 2 (also on both Linux and Windows), it throws out the same error as you reported for small sample size. Do I assume rightly?

I also see that you have a section on the applications for PCE-based Sobol’ indices. This is indeed an alternative approach to obtain Sobol’ indices with a potentially lower computational cost. Note that you don’t need to specify the SampleSize because PCE-based Sobol’ indices are obtained by post-processing the coefficients of the expansion. Specifying the SampleSize should not throw an error in this case, but just note that the Sobol’ indices are not obtained by Monte Carlo simulation of 100’000 sample points as indicated, but from PCE constructed from 20’000 sample points (see Section 1.5.3.4 of the Sensitivity Manual for detail).

Have you tried running the script with Analysis_Type = 3? If so, do the results look reasonable to you? Maybe you can lower that sample size to make a test.

A couple of additional comments (though I don’t think any of these can explain the error):

1. I think your code to do the cleanup at the beginning will only work on a Windows machine (it’ll also depend on how many AUTO_EAE_UQLAB*.inp files there are):

files = ls('AUTO_EAE_UQLab*.inp'); % (list of files with AUTO_EAE_UQLab*.inp)
for n = 1:size(files,1)
delete(strtrim(files(n,:)));
end


This is because in Linux, the command ls will give you a character vector of filenames separated by tab and space characters, while in Windows the vector will be one filename per line (See here for detail). Because there might be multiple filenames per line from ls in Linux, looping over multiple filenames as it is currently done, won’t work. It should not throw an error, but you don’t get the desired results. You might want to consider using dir instead.

2. Command = sprintf(’%%s %%s %%s’, Executable, Input_File); Do you need the third %s?

3. About the marginals on the input parameter no. 19; if you want a uniform marginal, the most straightforward way is to specify the bounds of the marginal as the parameters, i.e., inputopts.Marginals(19).Parameters = [1 8];. Note that Bounds is not a valid option for Uniform type marginals.

4. If you really want things to be 'quiet' for the model object, then instead of ModelOpts.Display = 'quiet'; you should change the variable name of the options to Mopts.Display = 'quiet'; to conform with your options variable name.

PS:
Indeed as a new member you won’t be able to attach a file, you need to become a basic member to do that. You need:

• Entering at least 5 topics
• Reading at least 30 posts
• Spend a total of 10 minutes reading posts

looking at your stats you are really close! You just need to enter 1 more topic and browse another 4 posts (post are replies to a topic)

Thank you for the reply @damarginal, hope you had a happy holidays!

I’ve looked at your recommendations and they all make sense. I have implemented them, and although I am not sure if it would resolve the issues I am facing with respect to Sobol analysis, I will try re-running the analysis.

During the break, I did try running Sobol analysis with N = 2000, but it still resulted in “something went extremely wrong” error. I am in the process of trying N = 4000. Do you think it is indeed an issue of convergence?

Also, for trying the PCE Sobol, I did try running it once, but I can’t remember exactly what happened… so I will try re-running it. Regarding this, if I understand correctly, when you say I don’t need to specify the SampleSize, do you mean ‘PCEOpts.ExpDesign.NSamples = 20000’ or do you mean ‘SobolOpts.Sobol.SampleSize = 100000’ ? I am guessing you mean I don’t have to specify the Sobol Sample size, correct?

Once again, thank you so much for helping me troubleshoot this! I hope we figure it out soon.

Yes, correct. If you use a PCE metamodel as a model in a Sobol’ analysis, then by default, the computation of the indices will be based on the post-processing of the coefficients of the PCE; so you don’t need to specify the SobolOpts.Sobol.SampleSize in this case.

You can still use MC-based Sobol’ indices with the PCE metamodel if you want (perhaps for comparison) by specifying SobolOpts.Sobol.PCEBased = false; (by default, it’s true). The PCE metamodel will then be used in lieu of the original model to compute the indices by Monte Carlo simulation. In this case, the SobolOpts.Sobol.SampleSize will be used (by default, it’s 100’000 sample points).

I don’t think it is a convergence issue. By the way, when running your analysis script, you can have access to MATLAB debugger, right?

I haven’t tried using debugger, but I don’t see why not. It appears to be available.

One quick way to diagnose the problem is directly check what happens in the suspect line.

I assume you run the script in the MATLAB desktop application. Before running the script, you can either put a break point on line 163 of ‘uq_sobol_indices.m’ or type the following in command window[1]:

dbstop in uq_sobol_indices at 163


When MATLAB enters the debug mode (indicated by K>> prompt in the command window), you can take a look at these variables:

• tsindex
• M_Sample1
• M_Sample2
• M_Pick2Freeze1

Check if there’s a NaN in any of these variables. For the last three variables, you may also check how they are distributed and see if they make sense. Note that the columns of these three variables correspond to the outputs of your model (if you have 19 outputs there should be 19 columns). You can quit the debugging session with dbquit.

Finally, I would recommend to scale up your problem gradually: start with smaller sample size, smaller set of input variables (not 19, but say, 2) and output variables. If things are working fine with scaled-down problem, then you can increase the size bit by bit and see when the error is thrown.

1. If you type this, don’t forget to type dbclear all after you finish debugging. ↩︎

Thanks @damarginal I will give this a try with a smaller sample size with Sobol!

I’m still waiting for the PCESobol run to finish also.

I was able to check the

• tsindex
• M_Sample1
• M_Sample2
• M_Pick2Freeze1

There were no NaN in any of the variables

Also, their values do seem to make sense. Nothing unusual. Although I didn’t check their statistical distributions.

I currently have a total of 25 outputs being yielded (maybe there are too many outputs perhaps?) which is what is being yielded by the model + MATLAB post-processor for the model.

What I should add is that about half of the 25 outputs (outputs 12-25) are actually re-interpretations of the inputs that are processed by the MATLAB post-processor for the model. Basically, the input variables 18 and 19 is used to randomly choose a set of material properties for the model, and these material properties are printed to be kept as history so that I can go back and look at them as required. The outputs 12-25 are the material properties, so they aren’t exactly “outputs,” however, they do depend on the inputs 18-19.

The real outputs of the model is outputs 1-11, and among them, only 8 outputs are of core interest.

Do you think the problem could be caused by there being too many outputs?

I will try downscaling the sample size and the outputs first.

Thanks!

Hey @damarginal whaddaya know…

I gave it a try with only 9 outputs, and N = 500, and it worked!

I will try with a larger sample, but even with a N=500, the indices do look reasonable.
However, I will need to think about it a bit more, because the total sum of the indices are greater than 1 (although this was expected, because there are 2 inputs that are kind-of related to one another).

Thanks!

Hi @Anarionz,

That’s great news!

Before, I’ve had the impression that your problem throws the error on line 163 of ‘uq_sobol_indices’ consistently, so I’m not sure why you can’t reproduce the error before. It is a good practice to set up the seed for the random number at the beginning before a random simulation so you can reproduce the results exactly.

Note that the suspect line 163 is inside a loop (of all the outputs). The loop computes the Sobol’ indices for each output you set. So checking for one iteration (i.e., one model output) might not be enough you need to check for each; you need to let the loop run for another iteration and another until the error, if it’s there, is thrown. I suspect that the error is thrown for one or more outputs (and most probably based on your recent finding, not the first 9).

I don’t think the problem is because you have too many outputs, but perhaps one or more of those outputs are “troublesome” for the Sobol’ indices computation.

These re-interpretations of the inputs are basically functions, no? Are these functions black box or you know how the inputs are being re-interpreted?

If you have any questions about an interpretation of the Sobol’ indices which depends on the problem setup, I would suggest you open a new topic .