Dear Alexander,
Thanks for posting this very interesting topic. I hope it is not too late to share some ideas here.
I agree with you that the “information gain” (or formally the mutual information) has similar behaviors to the Sobol’ indices. Moreover, I think we can group them into a more general framework.
Let’s start with the classical UQ setup: we have input random variables \boldsymbol{X} \sim P_{\boldsymbol{X}} of dimension M and a deterministic simulator expressed as
Y = {\mathcal{M}}(\boldsymbol{X}).
For the following analysis, we split the input vector into two subsets \boldsymbol{X} = (\boldsymbol{X}_{{\boldsymbol{\rm u}}},\boldsymbol{X}_{{\boldsymbol{\rm v}}}), where {\boldsymbol{\rm v}} = \{1,\ldots,M\} \setminus {\boldsymbol{\rm u}}. Under this decomposition, we define an auxiliary random variable
Y_{\boldsymbol{\rm u}}(\boldsymbol{x}_{{\boldsymbol{\rm u}}}) = {\mathcal{M}}(\boldsymbol{x}_{\boldsymbol{\rm u}},\boldsymbol{X}_{\boldsymbol{\rm v}}),
which is equivalent to Y_{{\boldsymbol{\rm u}}}(\boldsymbol{x}_{{\boldsymbol{\rm u}}}) = Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\rm u}.
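To make the construction of Y_{{\boldsymbol{\rm u}}}(\boldsymbol{x}_{{\boldsymbol{\rm u}}}) concrete, here is a minimal sketch in Python using a hypothetical toy simulator (an Ishigami-type function, chosen only for illustration): fixing \boldsymbol{X}_{\boldsymbol{\rm u}} at a value \boldsymbol{x}_{\boldsymbol{\rm u}} and resampling the complementary inputs \boldsymbol{X}_{\boldsymbol{\rm v}} produces a sample of the auxiliary random variable.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy simulator with M = 3 independent inputs, uniform on [-pi, pi]
# (any deterministic model would do; this one is only for illustration).
def model(x):
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

# Choose u = {1} (the first input), fix X_u = x_u, and resample X_v = (X_2, X_3):
# the resulting outputs are realizations of the auxiliary variable Y_u(x_u).
x_u = 1.0
n = 10_000
x_v = rng.uniform(-np.pi, np.pi, size=(n, 2))      # samples of X_v
x_full = np.column_stack([np.full(n, x_u), x_v])   # (x_u, X_v)
y_u = model(x_full)                                # samples of Y | X_1 = x_u
```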
Consider an arbitrary contrast measure d(Y_1,Y_2) that quantifies how a random variable Y_2 differs from a given random variable Y_1. As a result, d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{x}_{\boldsymbol{\rm u}})\right) indicates how much the knowledge of \boldsymbol{X}_{{\boldsymbol{\rm u}}} = \boldsymbol{x}_{{\boldsymbol{\rm u}}} makes a difference. Then, we can define the sensitivity index S_{\boldsymbol{\rm u}} by
S_{\boldsymbol{\rm u}} = \mathbb{E}_{\boldsymbol{X}_{{\boldsymbol{\rm u}}}}\left[d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{X}_{{\boldsymbol{\rm u}}})\right)\right].
This index shows how much, in expectation, knowing \boldsymbol{X}_{{\boldsymbol{\rm u}}} changes the properties of Y.
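In code, this definition translates directly into a nested (double-loop) Monte Carlo estimator: the outer loop samples \boldsymbol{x}_{\boldsymbol{\rm u}}, the inner loop resamples \boldsymbol{X}_{\boldsymbol{\rm v}} to characterize Y_{\boldsymbol{\rm u}}(\boldsymbol{x}_{\boldsymbol{\rm u}}), and the contrast against a reference sample of Y is averaged. The sketch below is a brute-force illustration with the same hypothetical toy simulator as above; the function name `sensitivity_index` and the interface of the `contrast` argument are my own choices, not established conventions, and practical estimators are far more efficient.

```python
import numpy as np

rng = np.random.default_rng(1)

def model(x):
    # Same hypothetical toy simulator as in the previous sketch.
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

def sample_inputs(n, M=3):
    return rng.uniform(-np.pi, np.pi, size=(n, M))

def sensitivity_index(contrast, u, n_outer=200, n_inner=2_000, M=3):
    """Brute-force double-loop estimate of S_u = E_{X_u}[ d(Y, Y_u(X_u)) ].

    `contrast(y_ref, y_cond)` receives samples of Y and of Y | X_u = x_u and
    returns the contrast d between the two random variables.
    """
    y_ref = model(sample_inputs(n_inner, M))               # reference sample of Y
    vals = []
    for _ in range(n_outer):                               # outer loop: x_u ~ X_u
        x = sample_inputs(n_inner, M)
        x[:, u] = rng.uniform(-np.pi, np.pi, size=len(u))  # freeze X_u = x_u
        vals.append(contrast(y_ref, model(x)))             # d(Y, Y_u(x_u))
    return float(np.mean(vals))                            # average over x_u

# Example: the variance contrast of the next subsection yields a Sobol'-type index.
var_contrast = lambda y_ref, y_cond: y_ref.var() - y_cond.var()
print(sensitivity_index(var_contrast, u=[0]))
```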
Sobol’ indices, entropy-based indices (or information gain), and Borgonovo indices can all be interpreted within this framework with a proper choice of the contrast measure.
Sobol’ indices
We can define d(Y_1,Y_2) as the variance difference between the two random variables
d(Y_1,Y_2) = {\rm Var}[Y_1] - {\rm Var}[Y_2].
Then, we obtain
d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{x}_{\boldsymbol{\rm u}})\right) = {\rm Var}[Y] - {\rm Var}[Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}],
which shows how we can reduce the variance of Y by knowing \boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}. Therefore, the associated index S_{\boldsymbol{\rm u}} becomes
\begin{aligned}
S_{\boldsymbol{\rm u}} &= \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{X}_{\boldsymbol{\rm u}})\right)\right] = \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[{\rm Var}[Y] - {\rm Var}[Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}}]\right] \\
&= {\rm Var}[Y] - \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[{\rm Var}[Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}}]\right] = {\rm Var}\left[\mathbb{E}\left[Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}}\right]\right].
\end{aligned}
The last equality follows from the law of total variance. Dividing S_{\boldsymbol{\rm u}} by the total variance of Y yields the first-order Sobol’ index of the group \boldsymbol{X}_{\boldsymbol{\rm u}}.
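As a sanity check, here is a minimal double-loop sketch of this quantity, again with the hypothetical toy simulator from the earlier sketches: the conditional mean \mathbb{E}[Y \mid \boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}] is estimated for each outer sample of \boldsymbol{X}_{\boldsymbol{\rm u}}, and the variance of these conditional means is divided by {\rm Var}[Y].

```python
import numpy as np

rng = np.random.default_rng(2)

def model(x):
    # Hypothetical toy simulator from the earlier sketches.
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

def first_order_sobol(u, n_outer=500, n_inner=2_000, M=3):
    """Double-loop estimate of Var[ E[Y | X_u] ] / Var[Y] for the group u."""
    cond_means = []
    for _ in range(n_outer):
        x = rng.uniform(-np.pi, np.pi, size=(n_inner, M))
        x[:, u] = rng.uniform(-np.pi, np.pi, size=len(u))   # freeze X_u = x_u
        cond_means.append(model(x).mean())                  # E[Y | X_u = x_u]
    var_y = model(rng.uniform(-np.pi, np.pi, size=(100_000, M))).var()
    return np.var(cond_means) / var_y

print(first_order_sobol(u=[0]))   # first-order Sobol' index of X_1
```

In practice one would of course use dedicated sampling schemes rather than this brute-force nested loop; the sketch is only meant to mirror the structure of the definition.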
Entropy-based indices
We can define d(Y_1,Y_2) as the difference in differential entropy between the two random variables
d(Y_1,Y_2) = h(Y_1) - h(Y_2).
As a result, we have
d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{x}_{\boldsymbol{\rm u}})\right) = h(Y) - h\left(Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}\right),
which quantifies how we can reduce the differential entropy of Y by knowing \boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}. The associated index can be further derived as
\begin{aligned}
S_{\boldsymbol{\rm u}} &= \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{X}_{\boldsymbol{\rm u}})\right)\right] = \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[h(Y) - h\left(Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{X}_{\boldsymbol{\rm u}}\right)\right] \\
&= h(Y) - h\left(Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}}\right) = I(Y;\boldsymbol{X}_{\boldsymbol{\rm u}}).
\end{aligned}
It is worth remarking that I write h\left(Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}}= \boldsymbol{X}_{\boldsymbol{\rm u}}\right) instead of h\left(Y \mid\boldsymbol{X}_{\boldsymbol{\rm u}} \right) inside the expectation above, because the latter has a specific meaning in information theory: it is the conditional entropy, which already includes the expectation over \boldsymbol{X}_{\boldsymbol{\rm u}}. The last equality corresponds to the definition of the mutual information I(Y;\boldsymbol{X}_{\boldsymbol{\rm u}}). The index S_{\boldsymbol{\rm u}} is consistent with the “information gain” used in decision trees, see e.g. https://www3.nd.edu/~rjohns15/cse40647.sp14/www/content/lectures/23%20-%20Decision%20Trees%202.pdf. Similar to the Sobol’ indices, we can normalize S_{\boldsymbol{\rm u}} by h(Y) to obtain a value in [0,1].
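The same double loop gives a crude estimate of this entropy-based index; only the summary statistic changes. In the sketch below (same hypothetical toy simulator as before), the differential entropies are estimated with a simple histogram plug-in estimator, which is only meant to illustrate the structure of the computation, not to be a recommended entropy estimator.

```python
import numpy as np

rng = np.random.default_rng(3)

def model(x):
    # Hypothetical toy simulator from the earlier sketches.
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

def hist_entropy(y, bins=64):
    """Crude plug-in estimate of the differential entropy from a histogram."""
    p, edges = np.histogram(y, bins=bins, density=True)
    widths, mask = np.diff(edges), p > 0
    return -np.sum(p[mask] * np.log(p[mask]) * widths[mask])

def entropy_index(u, n_outer=300, n_inner=5_000, M=3):
    """Estimate S_u = h(Y) - E_{X_u}[ h(Y | X_u = x_u) ] = I(Y; X_u)."""
    h_y = hist_entropy(model(rng.uniform(-np.pi, np.pi, size=(100_000, M))))
    h_cond = []
    for _ in range(n_outer):
        x = rng.uniform(-np.pi, np.pi, size=(n_inner, M))
        x[:, u] = rng.uniform(-np.pi, np.pi, size=len(u))   # freeze X_u = x_u
        h_cond.append(hist_entropy(model(x)))               # h(Y | X_u = x_u)
    return h_y - np.mean(h_cond)                            # mutual information

print(entropy_index(u=[1]))   # information gain of X_2
```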
Borgonovo indices
Consider a contrast measure
d(Y_1,Y_2) = \frac{1}{2}\lVert f_1- f_2 \rVert_1,
where f_1 and f_2 are the probability density functions of Y_1 and Y_2, respectively, and \lVert\cdot\rVert_1 denotes the L^1 norm. It is easy to derive the associated index S_{\boldsymbol{\rm u}}:
\begin{aligned}
S_{\boldsymbol{\rm u}} &= \mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[d\left(Y,Y_{\boldsymbol{\rm u}}(\boldsymbol{X}_{\boldsymbol{\rm u}})\right)\right] = \frac{1}{2}\mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[ \lVert f_Y - f_{Y\mid \boldsymbol{X}_{\boldsymbol{\rm u}}} \rVert_1\right] \\
&= \frac{1}{2}\mathbb{E}_{\boldsymbol{X}_{\boldsymbol{\rm u}}}\left[\int_{\mathcal{D}_{Y}} \lvert f_Y(y) - f_{Y\mid \boldsymbol{X}_{\boldsymbol{\rm u}}}(y)\rvert {\rm d}y\right],
\end{aligned}
which is the definition of the Borgonovo indices.
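Here is the corresponding sketch for this case (same hypothetical toy simulator as before): the densities f_Y and f_{Y\mid \boldsymbol{X}_{\boldsymbol{\rm u}} = \boldsymbol{x}_{\boldsymbol{\rm u}}} are approximated with a Gaussian kernel density estimate and the L^1 distance is integrated numerically on a grid. This is again only a brute-force illustration; dedicated estimators for the Borgonovo indices exist.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(4)

def model(x):
    # Hypothetical toy simulator from the earlier sketches.
    x1, x2, x3 = x[..., 0], x[..., 1], x[..., 2]
    return np.sin(x1) + 7.0 * np.sin(x2) ** 2 + 0.1 * x3 ** 4 * np.sin(x1)

def borgonovo_index(u, n_outer=100, n_inner=5_000, M=3, n_grid=512):
    """KDE-based double-loop estimate of (1/2) E_{X_u}[ || f_Y - f_{Y|X_u} ||_1 ]."""
    y = model(rng.uniform(-np.pi, np.pi, size=(50_000, M)))
    grid = np.linspace(y.min(), y.max(), n_grid)             # integration grid on D_Y
    dy = grid[1] - grid[0]
    f_y = gaussian_kde(y)(grid)                              # unconditional density f_Y
    deltas = []
    for _ in range(n_outer):
        x = rng.uniform(-np.pi, np.pi, size=(n_inner, M))
        x[:, u] = rng.uniform(-np.pi, np.pi, size=len(u))    # freeze X_u = x_u
        f_cond = gaussian_kde(model(x))(grid)                # density of Y | X_u = x_u
        deltas.append(0.5 * np.sum(np.abs(f_y - f_cond)) * dy)  # (1/2) L1 distance
    return float(np.mean(deltas))

print(borgonovo_index(u=[1]))   # Borgonovo index of X_2
```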
In summary, I think that the Sobol’ indices and the information gain are defined in a similar way but with a different focus: the Sobol’ indices use the variance as the uncertainty measure, whereas the information gain relies on the entropy.