uq_postProcessInversion: Problem, if burnIn is an integer value

olaf.klein · April 19, 2021, 4:58pm

Dear @paulremo, dear all,

I think I have found an inconsistency between the manual for Bayesian inversion and the code in uq_postProcessInversion.m if one is calling this function with an integer value for the burnIn parameter: Denoting this value by burnInIntValue it holds that:

According to the documentation, page 57, burnInIntValue sample should be removed (i.e. not copied to the contents of Results.PostProc) as burn in.
In the lines 346–349 of uq_postProcessInversion.m it holds that the execution is stopped, if burnInIntValue is equal or greater then the size of the Samples vector, and afterwards, by the code line Sample = Sample(burnIn:end,:,:); it ensured that burnInIntValue-1 and not burnInIntValue samples are removed as burnIn.

Hence, one is not able to (miss)use uq_postProcessInversion.m to remove all but the last set of samples for the MCMC AIES sampler and to generate afterwards a scatter plots for point in this set by calling uq_display. I would like to do this since I follow the suggestion in the solution to Restarting a Bayesian Inversion and use these sample points as starting vector for further Bayesian inversions using the MCMC AIES sampler.
To avoid modification that change results of existing running code I suggest to replace in the documentation the text for integer values for burnIn from ``Specifies the number of sample points that are discarded as burn-in.’’ to something like ```Specifies the index of the the first sample points that is not discarded as burn-in, i.e. the number of discarded samples is equal to the given value minus 1. ‘’ I wanted to suggest to replace in the code if burnIn >= size(Sample,1) in line 345 the >= by >., but this does not produce the desired result.

Greetings
Olaf

olaf.klein · April 20, 2021, 7:21am

Hello,

an update to my above post: In contrast to my above assertion I can get the desired plot by replacing >= by > in the line 345 of uq_postProcessInversion.m and calling this function with the value for 'burnIn' being the number of MCMC steps. After I do this it holds that that in the myBayesianAnalysis.Results.PostProc.PostSample there are only the samples of the final step of the MCMC AIES sampler computation. And by calling now uq_display with parameter myBayesianAnalysis I can get the scatter plot showing this samples before the function stops during the plot of the posterior density with the following error message.

Error using ksdensity (line 212)
X must be a non-empty vector or two-column matrix.

Error in uq_violinplot (line 118)
    [f,xi] = ksdensity(currRuns);

Error in uq_display_uq_inversion>plotSeriesPred (line 574)
        postPlot = uq_violinplot(postRuns, 'FaceColor', postColor);

Error in uq_display_uq_inversion (line 359)
                plotSeriesPred(Sample(ii),DataCurr,plotType,pointEstimatePred)

Error in uq_module/Display

Error in uq_display (line 4)
%    uq_createModel, uq_createInput or uq_createAnalysis commands. The

For me this okay, since I got the plot i wanted, see Last-MCMC-AIES-samples
but I do not know, if anyone else would like this kind of behavior.

Greetings
Olaf

paulremo · April 20, 2021, 10:13am

Dear @olaf.klein

Thanks for noticing this. Indeed, I would opt for modifying the code to be consistent with the manual, because the burn-in is widely understood to be the “bad part” of the MCMC simulation that we want to get rid of.

I propose the following minor changes to the live version of the uq_postProcessInversion function to make this work (first line original, second line changed code):

% line 349:
Sample = Sample(burnIn:end,:,:);
Sample = Sample(burnIn+1:end,:,:);
% line 352:
LogLikeliEval = LogLikeliEval(burnIn:end,:);
LogLikeliEval = LogLikeliEval(burnIn+1:end,:);
% line 357:
ForwardModelRuns(mm).evaluation(burnIn:end,:,:);
ForwardModelRuns(mm).evaluation(burnIn+1:end,:,:);

olaf.klein · April 20, 2021, 11:37am

Dear @paulremo

I see the advantages of your suggestion and do not know, if there is any existing UQLab-Code that would be influenced by your suggestions.

Start of outdated paragraph
It seems that your live version of uq_postProcessInversion.m is already slightly changed with respect to the one in the official UQ-Lab distribution. Assuming that the line 345, 368, 371, 377 correspond to the lines 330, 349, 352, 357 I think that the last three changes should do the job and the the change for the line 345 aka 330 is not needed. In the current implementation of burnIn, the number of reaming samples is equal to (nIter-burnIn+1)*nChains, i.e. the check is overprotective in the current version of uq_postProcessInversion.m. With your changes the number is just equal to (nIter-burnIn)*nChains, such that your planned change in line 345 (now 330) is in my opinion not
needed, especially since in line 345+4, i.e. now 334, one has nPostPredSamples = (nIter-burnIn)*nChains;
End of outdated paragraph

I suggest to add the following correction, using your notion for changes:

% line 368 -2? , now line 347:  
        error('Burn in larger than number of sample points per chain')
        error('Burn in larger than or equal to number of sample points per chain')

paulremo · April 20, 2021, 12:03pm

Yes, your are right, I updated the post.

olaf.klein · April 21, 2021, 7:18am

Dear @paulremo,

I think that your modification of the code should be completed by

With your update, users are able to remove all but the last sample in the chain(s) by using BayesOpts.Solver.MCMC.Steps -1 as value for' burnin'. Since this may case problems in the sequel, similar to the one shown in my post above, but allows also to derive desired results, I suggest to add a warning, e.g.

% insertion between line 347 and  line 348
 elseif burnIn  == size(Sample,1)-1
       warning('You are removing all but the last sample in the chain(s) as burn in.');
       warining(' There are some (UQLab-)functions that can not properly deal with the result of this operation.')

In view of your update, it holds that the value for 'burnIn'’ should be smaller then BayesOpts.Solver.MCMC.Steps . Therefore, you should point out in Table 19 that 'burnIn'’ should be an `Integer between 1 and T-1’

Greetings
Olaf
P.S.: Since I already have one suggestion for Section 3.3 of the manual, I suggest to add at the end of this section a reference to the documentation of the contents of myBayesianAnalysis.Results.PostProc on page 38.

olaf.klein · May 19, 2021, 4:42pm

Dear @paulremo,

I think that I found a further small error in the documentation for the Bayesian inversion:

According to the table 10, the default value for NChains should be 100. This seems to be the case for the default sampler, i.e. the Affine invariant ensemble sampler that is acticated if Bayeopt.MCMC.Sampler is not defined or if Bayeopt.MCMC.Sampler ='AIES' , but the used default value seems to depend on the considered solver and this should be pointed out in Table 10.
I have switching to the Metropolis-Hastings algorithm by using Bayeopt.MC.Sampler ='MH' and have not providing any value for Bayeopt.MCMC.Sampler.NChains. The corresponding the result myBayesianAnalysis of uq_createAnalysis contained in my case in myBayesianAnalysis.Results.Sample an 5000 \times 4 \times 10 array, i.e. it seems that the default value for NChains is 10,if the Metropolis-Hastings algorithm is used, which seem to be a reasonable value.

Greetings
Olaf
P.S. Sorry for somehow misplacing this bug report into an other conversation, but I have already created 4 of the 5 Features Requests-topics created in 2021 until today and I think that this remark is not important enough to create a further one.

paulremo · May 20, 2021, 10:34am

Hi @olaf.klein

You are absolutely right, thank you for pointing it out. This is missing in the manual and will be fixed in its next version.

paulremo · October 4, 2021, 12:16pm

This has been fixed for the upcoming next version of the inversion manual. I will close this topic now.