Bayesian updating: Sequential vs. bulk arrival of data

paulremo · October 30, 2019, 9:09am

Continuing the discussion from Multiple data groups with custom log-likelihood function:

I have a question that relates more to Bayesian updating than to UQLab. As I understand it, under the i.i.d. assumption, updating with a single LL function should give the same result as updating sequentially with two different LL functions. With the same priorscale and number of steps, the single LL function gives the posterior distribution \ln R_i''\sim\mathcal{N}(0.933, 0.0187), whereas updating sequentially gives \ln R_i''\sim\mathcal{N}(0.918, 0.0291). The assumed prior was \ln R_i'\sim\mathcal{N}(0.911, 0.099). Can I attribute this difference to the numerical approximations and methods? The difference in the standard deviations seems rather large. In both cases I get an acceptance rate between 0.4 and 0.5.

In Bayesian updating, everything boils down to evaluating Bayes’ theorem written in its unnormalized form as

\pi(x\vert\mathcal{Y}) \propto \pi(x)\mathcal{L}(x;\mathcal{Y})

with the posterior distribution \pi(x\vert\mathcal{Y}), the prior distribution \pi(x) and the likelihood function \mathcal{L}(x;\mathcal{Y}). The data set is denoted as \mathcal{Y}=\{y_1,\dots,y_N\}. This corresponds to updating in bulk because all data enters into the likelihood function at once.
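As a minimal sketch of what "bulk" means in practice, the unnormalized log-posterior can be written as the log-prior plus a sum of per-observation log-likelihood terms. The Gaussian prior, the Gaussian noise model, and all names and default values below are assumptions for illustration only; they are not taken from the original problem or from UQLab.

```python
import math


def log_prior(x, mean=0.0, std=1.0):
    """Log-density of an assumed Gaussian prior on the parameter x."""
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def log_likelihood(x, data, noise_std=1.0):
    """Log-likelihood of i.i.d. observations y_i ~ N(x, noise_std^2) (assumed model).

    Independence turns the likelihood into a product over observations,
    i.e. a sum of log terms.
    """
    norm = math.log(noise_std * math.sqrt(2.0 * math.pi))
    return sum(-0.5 * ((y - x) / noise_std) ** 2 - norm for y in data)


def log_unnormalized_posterior(x, data):
    """log pi(x) + log L(x; Y), with the full data set entering at once."""
    return log_prior(x) + log_likelihood(x, data)
```

An MCMC sampler only needs this unnormalized quantity, which is why the normalizing constant can be dropped.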

In contrast, you can imagine a scenario where the data arrive sequentially and you need to update the distribution multiple times. For simplicity, assume we do two such updating steps by splitting the data set into two disjoint subsets \mathcal{Y}_1 and \mathcal{Y}_2 with \mathcal{Y}_1 \cup \mathcal{Y}_2 = \mathcal{Y}. You can then update the initial prior distribution in two steps by

\pi(x\vert\mathcal{Y}_1) \propto \pi(x)\mathcal{L}(x;\mathcal{Y}_1)\\ \pi(x\vert\mathcal{Y}_1,\mathcal{Y}_2) \propto \pi(x\vert\mathcal{Y}_1)\mathcal{L}(x;\mathcal{Y}_2)

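where you need to be careful to use the posterior of the first updating step as the prior of the second updating step. If this is the case and the data were collected independently, the likelihood factorizes as \mathcal{L}(x;\mathcal{Y}) = \mathcal{L}(x;\mathcal{Y}_1)\mathcal{L}(x;\mathcal{Y}_2). Substituting the first equation into the second then gives \pi(x\vert\mathcal{Y}_1,\mathcal{Y}_2) \propto \pi(x)\mathcal{L}(x;\mathcal{Y}_1)\mathcal{L}(x;\mathcal{Y}_2), which is identical to the bulk updating equation.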
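The equivalence can be checked numerically in a case where the posterior is available in closed form. The sketch below assumes a Gaussian prior on an unknown mean and Gaussian observation noise with known standard deviation (a conjugate pair), which is not necessarily the model of the original problem. The data values and `noise_std` are made up for illustration; the prior parameters reuse the numbers quoted above, interpreted as mean and standard deviation.

```python
import math


def normal_update(prior_mean, prior_std, data, noise_std):
    """Conjugate update of a Gaussian prior on the mean of Gaussian data.

    Returns the posterior (mean, std). Assumes noise_std is known.
    """
    prior_prec = 1.0 / prior_std ** 2
    post_prec = prior_prec + len(data) / noise_std ** 2
    post_mean = (prior_prec * prior_mean + sum(data) / noise_std ** 2) / post_prec
    return post_mean, math.sqrt(1.0 / post_prec)


# Hypothetical observations, split into two disjoint batches.
y1 = [0.95, 0.90, 0.97]
y2 = [0.92, 0.94, 0.91]
noise_std = 0.05  # assumed known

# Bulk: all data enter the likelihood at once.
bulk = normal_update(0.911, 0.099, y1 + y2, noise_std)

# Sequential: the posterior of step 1 is the prior of step 2.
step1 = normal_update(0.911, 0.099, y1, noise_std)
sequential = normal_update(step1[0], step1[1], y2, noise_std)

print("bulk      :", bulk)
print("sequential:", sequential)
```

Both routes agree up to floating-point rounding here because the intermediate posterior is passed on exactly. With MCMC, the intermediate posterior is only available as samples, so whatever approximation is used to turn those samples into the second prior becomes an additional source of discrepancy.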

So there should be no difference between updating sequentially and in bulk if the conditions mentioned are met. Is this the case?

Shihab_Khan · October 31, 2019, 12:16am

Hi Paul

Thanks a lot for posing my question generically.

Yes, this is indeed the case.

Thanks a lot for your reply.