Multiple data groups with custom log-likelihood function

Shihab_Khan · October 28, 2019, 12:55am

Hi UQWorld

I’m working on a problem with two data groups that have different discrepancy models. The first data group has a known Gaussian discrepancy, while the second requires a custom log-likelihood function.

Does UQLab support multiple data groups together with a custom log-likelihood function? I noticed that the file uq_initialize_uq_inversion throws an error when more than one data group is provided:

‘Multiple data groups not supported for custom logLikelihood’

If this is indeed the case, is there a workaround for this issue? In my problem, the observations are assumed to be i.i.d.

Would it be feasible to provide the two data groups as a single data group and then write the custom likelihood function so that, based on the number of observations of each type, it evaluates each observation with the appropriate model?

Shihab_Khan · October 29, 2019, 4:52am

I’m putting forward a working example.

Problem

The instantaneous resistance of a component is given by \ln R_i = \ln (R_0) + \ln (1-kt^\nu), where \ln R_0 is a Gaussian random variable and the term \ln (1-kt^\nu) is a scalar depending upon degradation parameters k and \nu, and time t.

Therefore, the model \mathcal{M}(\mathbf{X}) is \ln R_i with \mathbf{X}\in \mathbb{R}^1. There are two kinds of measurements. Firstly, we have SHM A which gives a continuous measurement, \ln X_m = \mathcal{M}(\mathbf{X}) + \epsilon_A, where \epsilon_A \sim \mathcal{N}(0,\sigma_A). Secondly, we have SHM B measurements, y = (\mathcal{M}(\mathbf{X}) + \epsilon_B) \in [a_m, a_{m+1}], where \epsilon_B \sim \mathcal{N}(0,\sigma_B). As can be deduced, SHM B would require a custom log-likelihood function.
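For reference, one reading of the two measurement models above gives the following per-observation likelihoods. This is a sketch under assumptions not stated in the post: \sigma_A and \sigma_B are taken to be standard deviations, \Phi denotes the standard normal CDF, and N_A, N_B are the numbers of SHM A and SHM B observations. Under the i.i.d. assumption, the combined log-likelihood is the sum of the contributions of the two groups:

```latex
\begin{align}
\mathcal{L}_A(\mathbf{X}) &= \frac{1}{\sigma_A\sqrt{2\pi}}
  \exp\left(-\frac{\left(\ln X_m - \mathcal{M}(\mathbf{X})\right)^2}{2\sigma_A^2}\right), \\
\mathcal{L}_B(\mathbf{X}) &= \Phi\left(\frac{a_{m+1} - \mathcal{M}(\mathbf{X})}{\sigma_B}\right)
  - \Phi\left(\frac{a_m - \mathcal{M}(\mathbf{X})}{\sigma_B}\right), \\
\ln \mathcal{L}(\mathbf{X}) &= \sum_{i=1}^{N_A} \ln \mathcal{L}_A^{(i)}(\mathbf{X})
  + \sum_{j=1}^{N_B} \ln \mathcal{L}_B^{(j)}(\mathbf{X}).
\end{align}
```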

Solution
The way I’m trying to solve this should be apparent from the code (files attached). Instead of creating two different data groups, I’m using a single data group with a custom log-likelihood function, written so that the measurements of each SHM type are passed to the log-likelihood function for that type.

However, my results don’t seem to converge. I’ve tried different algorithms with different scales and numbers of steps. Am I missing something?

The main file is test_UQ_shmAB.m. Any help would be appreciated.

test_UQ_shmAB.m (1.9 KB) log_resistance_model.m (106 Bytes) LL_SHM_B.m (884 Bytes) LL_SHM_A.m (393 Bytes)

paulremo · October 29, 2019, 5:27pm

Hi Shihab

To briefly answer your first question: no, it is not possible to supply multiple data groups with user-defined likelihood functions. If you work with advanced discrepancy models, you are expected to know how to implement the simple additive discrepancy model that UQLab currently provides.

Data groups can then be handled simply by providing a user-defined likelihood function that deals with the provided data and the intended discrepancy model.

Regarding your second question, I found two problems with your code:

The user-defined function has to return the log-likelihood, so I am pretty sure you should change the product in line 31 to a sum.
The convergence problem seems to stem from -Inf being returned frequently by your user-defined likelihood function. There could be multiple reasons for this, but it seems that you are not properly exploiting the advantages of formulating the likelihood in log space. The likelihood is typically implemented in log space to avoid an output of 0, which can result from underflow. It looks like you evaluate everything in the standard space and only then take the logarithm of the result, which defeats the purpose.
What is the purpose of line 30 of LL_SHM_B? It seems like there is a lonely y(j); there.
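The points above can be illustrated with a minimal sketch. It is written in plain Python for illustration and is not the attached MATLAB code or any UQLab API; the function names, the interval representation, and the treatment of sigma as a standard deviation are all assumptions. It adds the log-likelihoods of the two groups instead of multiplying them, writes the Gaussian density directly in log space, and computes the interval probability from the tail that avoids cancellation:

```python
import math

SQRT2 = math.sqrt(2.0)

def log_gauss_pdf(y, mu, sigma):
    """Log of the N(mu, sigma) density, written directly in log space."""
    z = (y - mu) / sigma
    return -0.5 * z * z - math.log(sigma) - 0.5 * math.log(2.0 * math.pi)

def log_interval_prob(a, b, mu, sigma):
    """log P(a <= Y <= b) for Y ~ N(mu, sigma), with a < b (hypothetical helper)."""
    za, zb = (a - mu) / sigma, (b - mu) / sigma
    if za > 0.0:
        # Both bounds lie in the upper tail: difference of survival functions.
        p = 0.5 * (math.erfc(za / SQRT2) - math.erfc(zb / SQRT2))
    else:
        # Otherwise: difference of CDFs, Phi(z) = 0.5 * erfc(-z / sqrt(2)).
        p = 0.5 * (math.erfc(-zb / SQRT2) - math.erfc(-za / SQRT2))
    return math.log(p) if p > 0.0 else -math.inf

def combined_log_likelihood(mu, obs_a, sigma_a, intervals_b, sigma_b):
    """Sum (not product) of the log-likelihoods of the two data groups."""
    ll_a = sum(log_gauss_pdf(y, mu, sigma_a) for y in obs_a)
    ll_b = sum(log_interval_prob(a, b, mu, sigma_b) for a, b in intervals_b)
    return ll_a + ll_b
```

For an interval far in the upper tail, for example [10, 11] under a standard normal, the naive difference of two CDF values rounds to 0 in double precision and its logarithm is -Inf, whereas the tail-aware form stays finite.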

Let me know if any of these solves your issue.

Just a side note: consider vectorizing your custom likelihood function and avoiding loops as much as possible to take advantage of MATLAB’s vectorized operations. This could significantly speed up the MCMC sampler.

Shihab_Khan · October 30, 2019, 5:07am

Hi Paul

Thanks a lot for your reply.

As for your first reply, I can understand why this functionality hasn’t been implemented in UQLab. Thanks for the answer.

As for your replies to the second question, firstly, thanks a lot for pointing out the mistake in line 31. It explains why I was getting reasonable results while updating sequentially but not when I was trying to update with a single custom log-likelihood function.

Secondly, I thought it wouldn’t make too much of a difference as I wasn’t trying to maximize the likelihood here. But I can see your point and I’ll make the change in the log-likelihood function too.

Thirdly, that lonely y(j) was there by mistake. I had put it there while debugging.

Fourthly, I’ll definitely vectorize the LL code. Thanks a lot for the suggestion.

Some further points and queries
Just by changing the product in line 31 to a sum, I’ve started getting reasonable results. No more -Inf in the LL outputs.

I have a query that is more related to Bayesian updating than to UQLab. As I understand it, under the i.i.d. assumption, updating with a single LL function should give the same result as updating sequentially with two different LL functions. With the same priorscale and number of steps, the single LL function gives the posterior \ln R_i''\sim\mathcal{N}(0.933, 0.0187), whereas sequential updating gives \ln R_i''\sim\mathcal{N}(0.918, 0.0291). The assumed prior was \ln R_i'\sim\mathcal{N}(0.911, 0.099). Can I attribute this difference in the estimates to the numerical approximations and methods? The difference in the standard deviations seems rather large. In both cases I get an acceptance rate between 0.4 and 0.5.
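The equivalence assumed in this question can be checked exactly in a toy setting. The sketch below is in plain Python and makes strong assumptions that differ from the actual problem: both groups are direct Gaussian observations of the unknown with known noise standard deviations, so the posterior is available in closed form, and all numbers in the usage note are hypothetical. It is not the SHM A/B model and says nothing about the cause of the difference reported above.

```python
def gaussian_update(prior_mean, prior_sd, observations, noise_sd):
    """Conjugate update of N(prior_mean, prior_sd) with i.i.d. y = x + eps, eps ~ N(0, noise_sd)."""
    prec = 1.0 / prior_sd**2 + len(observations) / noise_sd**2
    mean = (prior_mean / prior_sd**2 + sum(observations) / noise_sd**2) / prec
    return mean, prec ** -0.5

def joint_update(prior_mean, prior_sd, groups):
    """Single update using all groups at once; groups is a list of (observations, noise_sd)."""
    prec = 1.0 / prior_sd**2
    weighted = prior_mean / prior_sd**2
    for obs, noise_sd in groups:
        prec += len(obs) / noise_sd**2
        weighted += sum(obs) / noise_sd**2
    return weighted / prec, prec ** -0.5

def sequential_update(prior_mean, prior_sd, groups):
    """Update group by group, using each exact posterior as the next prior."""
    mean, sd = prior_mean, prior_sd
    for obs, noise_sd in groups:
        mean, sd = gaussian_update(mean, sd, obs, noise_sd)
    return mean, sd
```

Calling both functions with, say, a prior of (0.9, 0.1) and the hypothetical groups [([0.93, 0.95], 0.05), ([0.90], 0.08)] returns the same mean and standard deviation up to floating-point rounding. The identity relies on the intermediate posterior being carried over exactly; with a sampled posterior, Monte Carlo error and any approximation of the intermediate posterior used as the new prior could in principle contribute to a mismatch.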