Hi @HZhang,

I assume that the target statistic you are interested in is the correlation coefficient, and that there is a reason why the outputs of the two metamodels would be correlated with each other (e.g., they share the same inputs). In this situation, I think we can compute the standard error of the correlation coefficient estimate for a given sample size to get an idea of whether the sample size is *large* enough.

Analytical formulas for the standard error of the correlation coefficient estimate exist, but they rest on distributional assumptions (e.g., bivariate normality) that we would have to be sure of. What I would do instead is derive the standard error empirically by simulation, either via direct replications or via bootstrapping.
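For reference, here are the textbook large-sample approximations I have in mind, valid under the assumption that (Y1, Y2) is bivariate normal. Here $r$ is the sample correlation, $\rho$ the true correlation, and $n$ the sample size (my notation, not from the question). Treat this as a rough guide only, since metamodel outputs need not be normal:

$$
\operatorname{SE}(r) \approx \frac{1 - \rho^2}{\sqrt{n}},
\qquad
z = \operatorname{artanh}(r) = \frac{1}{2} \ln\frac{1 + r}{1 - r},
\qquad
\operatorname{SE}(z) \approx \frac{1}{\sqrt{n - 3}}
$$

In practice $\rho$ is unknown and is replaced by $r$. The second form (the Fisher z-transformation) is usually preferred for building confidence intervals.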

With direct replications: given a fixed sample size,

- generate an independent sample of (Y1, Y2) pairs of that size, and repeat (i.e., replicate) this multiple times (hundreds or thousands)
- compute the correlation for each replication
- compute the standard deviation of the correlation estimates (from the replications)

You can use this number (the standard error) to assess whether the estimate is *good* enough for the given sample size, e.g., by comparing it with the mean of the correlation estimates. If you deem it’s not, increase the sample size and repeat the procedure.
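The direct-replication procedure could be sketched roughly as follows. This is only an illustration in plain Python: `metamodel_1`, `metamodel_2`, the standard normal inputs, and the helper names are all made-up placeholders, so you would swap in your own metamodels and input distributions.

```python
import math
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def metamodel_1(x):
    # Hypothetical stand-in for the first metamodel.
    return x[0] + 0.5 * x[1]


def metamodel_2(x):
    # Hypothetical stand-in for the second metamodel (shares the same inputs).
    return x[0] - 0.5 * x[1] ** 2


def correlation_std_error(sample_size, n_replications=500, seed=0):
    """Return (mean, standard deviation) of the correlation estimates
    obtained from independent replications of a sample of the given size."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_replications):
        # Assumed input distribution: two independent standard normals.
        inputs = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(sample_size)]
        y1 = [metamodel_1(x) for x in inputs]
        y2 = [metamodel_2(x) for x in inputs]
        estimates.append(pearson(y1, y2))
    return statistics.fmean(estimates), statistics.stdev(estimates)


# Repeat for increasing sample sizes and watch the standard error shrink.
for n in (50, 200, 800):
    mean_r, se_r = correlation_std_error(n)
    print(f"N = {n:4d}: mean r = {mean_r:.3f}, standard error = {se_r:.3f}")
```

Comparing the printed standard error against the mean for each `N` is the "is it good enough?" check described above.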

With bootstrapping: given a dataset of pairs of Y1 and Y2,

- resample this dataset (keeping each pair together) again and again *with replacement* (say, a thousand times), each time drawing as many pairs as the dataset contains
- compute the correlation coefficient for each resample
- compute the standard deviation of the coefficients from the resamples

You can use the standard error the same way as before. If Y1 and Y2 are hard (expensive) to get, then bootstrapping is a good alternative to the direct replications above.
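A minimal sketch of the bootstrap variant, again in plain Python. The synthetic `y1` and `y2` below only stand in for an existing dataset of metamodel outputs, and `bootstrap_std_error` is a name I made up for the illustration.

```python
import math
import random
import statistics


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def bootstrap_std_error(y1, y2, n_resamples=1000, seed=0):
    """Bootstrap standard error of the correlation between paired data."""
    rng = random.Random(seed)
    n = len(y1)
    estimates = []
    for _ in range(n_resamples):
        # Draw n indices with replacement; using the same index for both
        # outputs keeps each (Y1, Y2) pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        estimates.append(pearson([y1[i] for i in idx], [y2[i] for i in idx]))
    return statistics.stdev(estimates)


# Synthetic paired data standing in for an existing dataset (assumption).
data_rng = random.Random(42)
x = [data_rng.gauss(0, 1) for _ in range(200)]
y1 = [xi + 0.5 * data_rng.gauss(0, 1) for xi in x]
y2 = [xi + 0.5 * data_rng.gauss(0, 1) for xi in x]

r_hat = pearson(y1, y2)
se_boot = bootstrap_std_error(y1, y2)
print(f"r = {r_hat:.3f}, bootstrap standard error = {se_boot:.3f}")
```

Note that no new model evaluations are needed here: everything is computed from the one dataset you already have.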

Because you already use metamodels (and assuming that they are accurate representations of the full computational models), I assume the model evaluations are cheap enough that even the direct replications should be no problem.

Maybe there are other (standard) ways to directly know how large the sample size needs to be for any given problem (anyone?) but I think these empirical approaches are worth trying :). I hope this helps!