Sufficient number of simulations using metamodels

Hi everyone,

I have trained two metamodels to estimate the outputs of two systems. My goal is to find the correlation coefficient of the two outputs Y1 and Y2. I was wondering if there is a standard way to determine how many simulations with the metamodels are sufficient to obtain an accurate estimate of the target statistic.

Thanks a lot in advance!

Best,

Hongzhou

Hi @HZhang,

I assume that the target statistic you are interested in is the correlation coefficient and that there is a reason why the outputs of the two metamodels would be correlated with each other (e.g., they share the same inputs). I think what we can do in this situation is to compute the standard error of the correlation coefficient estimate for a given sample size, to get an idea of whether the sample size is large enough.

There are analytical formulas for the standard error of the correlation coefficient estimate, but they rest on assumptions we would have to verify first. What I would do instead is derive the standard error empirically by simulation, either via direct replications or via bootstrapping.
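For reference, one such formula (valid only under a bivariate normality assumption, which metamodel outputs may well violate): the Fisher z-transformation of the sample correlation $r$ from $n$ pairs is approximately normal,

$$ z = \operatorname{arctanh}(r), \qquad \mathrm{SE}(z) \approx \frac{1}{\sqrt{n - 3}}. $$

Since we usually cannot guarantee that assumption for surrogate outputs, I would treat this only as a sanity check and rely on the empirical routes below.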

With direct replications: given a fixed sample size,

  • generate (i.e., replicate) independent pairs of Y1 and Y2 multiple times (hundreds or thousands)
  • compute the correlation for each replication
  • compute the standard deviation of the correlation estimates (from the replications)

You can use this number (the standard error) to assess whether the estimate is accurate enough for the given sample size, e.g., by comparing it with the mean of the correlation estimates: a standard error that is large relative to the mean indicates a noisy estimate. If you deem it's not good enough, increase the sample size and repeat the procedure. A minimal sketch of this loop is given just below.
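Here is a minimal sketch of the direct-replication approach in Python, assuming the outputs can be computed cheaply. The names `sample_inputs`, `metamodel_1`, and `metamodel_2` are placeholders for your own input sampler and trained surrogates; replace them with whatever your setup provides.

```python
import numpy as np

def sample_inputs(n, rng):
    # Placeholder: draw n points from your input distribution.
    return rng.standard_normal((n, 3))

def metamodel_1(x):
    # Placeholder surrogate; replace with your trained metamodel.
    return x.sum(axis=1)

def metamodel_2(x):
    # Placeholder surrogate sharing the same inputs, hence correlated with Y1.
    return x[:, 0] + 0.5 * x[:, 1] ** 2

def replicated_correlation(n_sample, n_replications=1000, seed=42):
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_replications)
    for i in range(n_replications):
        x = sample_inputs(n_sample, rng)            # fresh inputs each replication
        y1, y2 = metamodel_1(x), metamodel_2(x)
        estimates[i] = np.corrcoef(y1, y2)[0, 1]    # Pearson correlation
    # Mean estimate and its standard error (std. dev. over replications)
    return estimates.mean(), estimates.std(ddof=1)

mean_r, se_r = replicated_correlation(n_sample=500)
print(f"correlation ~ {mean_r:.3f} +/- {se_r:.3f}")
```

If the standard error is too large for your purposes, rerun with a larger `n_sample` until it stabilizes at an acceptable level.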

With bootstrapping: given a dataset of pairs of Y1 and Y2,

  • resample this dataset (keeping each pair together) with replacement, again and again (say, a thousand times)
  • compute the correlation coefficient for each replication
  • compute the standard deviation of the coefficients from the replications

You can use the standard error the same way as before. If Y1 and Y2 are hard (expensive) to get, then bootstrapping is a good alternative to the direct replications above, since it reuses a single dataset; see the sketch below.
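A minimal bootstrap sketch, assuming `y1` and `y2` are NumPy arrays of paired outputs you already have (the synthetic data at the bottom is only for illustration):

```python
import numpy as np

def bootstrap_correlation_se(y1, y2, n_boot=1000, seed=42):
    rng = np.random.default_rng(seed)
    n = len(y1)
    estimates = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)             # resample pair indices with replacement
        estimates[i] = np.corrcoef(y1[idx], y2[idx])[0, 1]
    return estimates.std(ddof=1)                     # bootstrap standard error

# Example usage with synthetic correlated data:
rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y1 = x + 0.1 * rng.standard_normal(500)
y2 = x + 0.5 * rng.standard_normal(500)
print("bootstrap SE of r:", bootstrap_correlation_se(y1, y2))
```

Note that the pairs are resampled jointly (one index array for both y1 and y2), which preserves the dependence structure between the two outputs.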

Because you already use metamodels (and assuming they are accurate representations of the full computational models), the model evaluations should be cheap enough that even the direct replications pose no problem.

Maybe there are other (standard) ways to directly determine how large the sample size needs to be for any given problem (anyone?), but I think these empirical approaches are worth trying :). I hope this helps!


Hi @damarginal,

Thank you so much for the detailed reply! The procedures you described are very clear and helpful. I will try them.

Thanks,

Hongzhou