Hi @Soraya,
Thanks for sharing the data. I think there are a few issues:
- Among these 200 training points and 1000 validation points, there are many duplicated input values. More precisely, there are only 60 distinct points in the training set and 64 distinct points in the validation set; e.g., the input value $(2.78\times 10^{9}, 0.342, 1470)$ is repeated 23 times. However, the associated output values are NOT the same: across these 23 model runs, the output ranges from 80.29 to 85.18, with a variance of 1.82, which is close to the overall variance of 2.26 of the output $Y$. Is your simulator stochastic, meaning that the model contains additional randomness (please read this post)? If so, we CANNOT predict the exact output value for a given input without access to the latent variables. If the simulator is not stochastic, meaning that a given set of inputs should always produce the same output value, please check and correct the computational model. Also, please do not round the values.
- The experimental design was created on a regular grid, which does not reflect the input distribution. To increase the accuracy of the surrogate, I would strongly recommend using a more advanced sampling strategy such as Latin hypercube sampling, e.g., `X = uq_getSample(myInput, 200, 'LHS');`.
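In case it helps, the duplicate check from the first point can be reproduced with a few lines of plain MATLAB. This is only a minimal sketch: the variable names `X` (N-by-3 input matrix) and `Y` (N-by-1 output vector) are my assumptions, and the numbers below are toy placeholders, not your data.

```matlab
% Toy data standing in for the real design (placeholder values only)
X = [1 0.1 10; 1 0.1 10; 2 0.2 20; 1 0.1 10; 2 0.2 20];
Y = [80; 82; 90; 84; 90];

% Group identical input rows; idx maps each run to its distinct point
[Xu, ~, idx] = unique(X, 'rows');

nRep   = accumarray(idx, 1);            % number of runs per distinct point
varRep = accumarray(idx, Y, [], @var);  % output variance within each group

fprintf('%d distinct points out of %d runs\n', size(Xu, 1), size(X, 1));

% Pooled within-replicate variance relative to the total output variance:
% near 0 for a deterministic model, near 1 if replication noise dominates
ratio = sum((nRep - 1) .* varRep) / sum(nRep - 1) / var(Y);
```

Note that `unique(..., 'rows')` compares values exactly, so it only groups inputs that are bit-for-bit identical, and `ratio` is `NaN` if no point is repeated at all.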
Best,
Xujia
P.S. If you do not mind, please share your dataset on UQWorld so that other experienced users can also contribute ideas here.