Bayesian updating: Sequential vs. bulk arrival of data

Continuing the discussion from Multiple data groups with custom log-likelihood function:

In Bayesian updating, everything boils down to evaluating Bayes’ theorem written in its unnormalized form as

\pi(x\vert\mathcal{Y}) \propto \pi(x)\mathcal{L}(x;\mathcal{Y})

with the posterior distribution \pi(x\vert\mathcal{Y}), the prior distribution \pi(x) and the likelihood function \mathcal{L}(x;\mathcal{Y}). The data set is denoted as \mathcal{Y}=\{y_1,\dots,y_N\}. This corresponds to updating in bulk because all data enters into the likelihood function at once.

In contrast, you can imagine a scenario where the data arrive sequentially and the distribution has to be updated several times. For simplicity, assume two such updating steps, obtained by splitting the data set into two disjoint subsets \mathcal{Y}_1 and \mathcal{Y}_2 with \mathcal{Y}_1 \cup \mathcal{Y}_2 = \mathcal{Y}. You can then update the initial prior distribution in two steps:

\pi(x\vert\mathcal{Y}_1) \propto \pi(x)\mathcal{L}(x;\mathcal{Y}_1)\\ \pi(x\vert\mathcal{Y}_1,\mathcal{Y}_2) \propto \pi(x\vert\mathcal{Y}_1)\mathcal{L}(x;\mathcal{Y}_2)

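where you need to be careful to use the posterior of the first updating step as the prior of the second updating step. If this is the case and the data were collected independently, so that the likelihood factorizes as \mathcal{L}(x;\mathcal{Y}) = \mathcal{L}(x;\mathcal{Y}_1)\mathcal{L}(x;\mathcal{Y}_2), then substituting the first updating step into the second recovers the bulk updating equation.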
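To make the equivalence concrete, here is a minimal numerical sketch in Python. It is only an illustration under assumptions that are not part of the original post: a scalar parameter x on a grid, a Gaussian-shaped prior, independent Gaussian observations with known standard deviation, and the hypothetical helper names `gaussian_log_likelihood` and `bayes_update`.

```python
import numpy as np


def gaussian_log_likelihood(x_grid, data, sigma):
    """Log-likelihood of independent Gaussian observations for every grid value of x."""
    residuals = data[None, :] - x_grid[:, None]  # shape (n_grid, n_data)
    return (-0.5 * np.sum((residuals / sigma) ** 2, axis=1)
            - data.size * np.log(sigma * np.sqrt(2.0 * np.pi)))


def bayes_update(log_prior, log_like, dx):
    """Add the log-likelihood to the log-prior and normalise on the grid."""
    log_post = log_prior + log_like
    shift = log_post.max()  # subtract the maximum to avoid underflow in exp
    log_norm = shift + np.log(np.sum(np.exp(log_post - shift)) * dx)
    return log_post - log_norm


# Synthetic data (assumed setup): 50 independent observations, split into two disjoint sets.
rng = np.random.default_rng(0)
sigma = 1.0
data = rng.normal(loc=1.5, scale=sigma, size=50)
y1, y2 = data[:20], data[20:]

# Gaussian-shaped prior on a grid, normalised numerically.
x_grid = np.linspace(-10.0, 10.0, 4001)
dx = x_grid[1] - x_grid[0]
log_prior = bayes_update(-0.5 * (x_grid / 2.0) ** 2, 0.0, dx)

# Bulk updating: all data enter the likelihood at once.
log_post_bulk = bayes_update(
    log_prior, gaussian_log_likelihood(x_grid, data, sigma), dx)

# Sequential updating: the posterior of step 1 is the prior of step 2.
log_post_1 = bayes_update(
    log_prior, gaussian_log_likelihood(x_grid, y1, sigma), dx)
log_post_seq = bayes_update(
    log_post_1, gaussian_log_likelihood(x_grid, y2, sigma), dx)

max_diff = np.max(np.abs(np.exp(log_post_bulk) - np.exp(log_post_seq)))
print(f"largest pointwise difference between the two posteriors: {max_diff:.2e}")
```

The two posteriors should agree up to floating-point round-off. If the second step were started from the original prior instead of `log_post_1`, the information in \mathcal{Y}_1 would be lost and the results would no longer match.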

So there should be no difference between updating sequentially and in bulk if the conditions mentioned are met. Is this the case?

Hi Paul

Thanks a lot for posing my question generically.

Yes, this is indeed the case.

Thanks a lot for your reply.