A sensitivity index of 0.99 suggests that the random variable is the only one affecting the output. However, it does not say anything about linearity.
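To illustrate that a high index alone does not imply linearity, here is a minimal sketch with a made-up toy model (not your model). It assumes two independent inputs, uniform on [-1, 1], and `Y = X1^3 + 0.05 * X2`. The first-order Sobol index of `X1` is estimated with a pick-freeze Monte Carlo estimator, and the squared correlation between `X1` and `Y` shows how much of the variance a straight line in `X1` can explain.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def model(x):
    # Hypothetical toy model: strongly nonlinear in x1, very weak in x2
    return x[:, 0] ** 3 + 0.05 * x[:, 1]

# Two independent sample matrices, inputs assumed uniform on [-1, 1]
a = rng.uniform(-1.0, 1.0, size=(n, 2))
b = rng.uniform(-1.0, 1.0, size=(n, 2))

# Matrix A with its first column replaced by the first column of B
ab = a.copy()
ab[:, 0] = b[:, 0]

ya, yb, yab = model(a), model(b), model(ab)
var_y = np.var(np.concatenate([ya, yb]))

# Pick-freeze estimate of the first-order Sobol index of x1
s1 = np.mean(yb * (yab - ya)) / var_y

# Share of output variance explained by a straight line in x1
r2_linear = np.corrcoef(a[:, 0], ya)[0, 1] ** 2

print(f"S1 = {s1:.3f}, R^2 of linear fit in x1 = {r2_linear:.3f}")
```

For this toy model the exact values are roughly 0.99 for the Sobol index and roughly 0.84 for the linear R^2, so the index can be close to one while the relationship is clearly nonlinear.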
Can you paste all sensitivity indices and the PCE coefficients?
To check that I understood your question correctly: You plot the input variable with the highest sensitivity index of 0.99 against the model output (axis labels would help to understand your plot). You observe an almost perfectly linear relationship between the two. Your surrogate has a LOO error close to zero.
If this is all correct, you are in luck. The model you are trying to surrogate is essentially a linear model with only one relevant input variable and you have obtained a perfect surrogate for it.
In my opinion, for sensitivity analysis, the variability of the input variable is small.
The relationship appears linear, but this holds only over a small range, not over the whole domain of the input variable. The variability of the input variable affects the output variability, and the output is sensitive to this input variable.
From the viewpoint of principal component analysis, all input variables are important. It is a multivariable problem.
I do not think the statement “The model you are trying to surrogate is essentially a linear model with only one relevant input variable” is true.
Sorry for the delay. Thank you, @styfen.schaer, for jumping in.
I agree with Styfen about the linearity of the model. Looking at the PCE coefficients may clear up any doubt. I expect the coefficient of the highly sensitive variable to be the largest, while all the others are close to zero.
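The link between coefficients and sensitivity can be sketched as follows. This is a minimal example, assuming an orthonormal polynomial basis, in which case the output variance is the sum of the squared non-constant coefficients and the first-order index of input `i` collects the terms that depend on input `i` only. The multi-indices and coefficient values below are made up for illustration, and `first_order_sobol` is a hypothetical helper, not a function of any particular toolbox.

```python
import numpy as np

# Hypothetical multi-indices (one row per basis term, one column per input)
alphas = np.array([
    [0, 0, 0],  # constant term
    [1, 0, 0],  # linear in x1
    [0, 1, 0],  # linear in x2
    [0, 0, 1],  # linear in x3
    [2, 0, 0],  # quadratic in x1
    [1, 1, 0],  # interaction x1 * x2
])
# Made-up coefficients: the x1 term dominates, the rest are close to zero
coeffs = np.array([5.0, 2.0, 0.1, 0.05, 0.02, 0.01])

def first_order_sobol(alphas, coeffs):
    """First-order Sobol indices from PCE coefficients (orthonormal basis assumed)."""
    nonconst = alphas.sum(axis=1) > 0
    total_var = np.sum(coeffs[nonconst] ** 2)
    indices = []
    for i in range(alphas.shape[1]):
        others = np.delete(alphas, i, axis=1)
        # Terms that involve input i and no other input
        only_i = (alphas[:, i] > 0) & (others.sum(axis=1) == 0)
        indices.append(np.sum(coeffs[only_i] ** 2) / total_var)
    return np.array(indices)

sobol = first_order_sobol(alphas, coeffs)
print(sobol)
```

With these made-up numbers the first index is above 0.99 and the other two are well below 0.01, which is the pattern I would expect to see in your coefficients.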
Another test you could perform is to plot the input data points you feed to the PCE evaluation against the model output, and then overlay the PCE predictions at the same points. If they match, the surrogate model is accurate. However, there is little doubt here, since you already have very good LOO and validation errors.
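If you prefer a number to a visual check, the same comparison can be summarized as a relative validation error. The snippet below is only a sketch: the two arrays are made-up placeholders for the true model outputs and the PCE predictions at the same validation points, and `relative_validation_error` is a hypothetical helper name.

```python
import numpy as np

def relative_validation_error(y_true, y_surrogate):
    """Sum of squared errors normalized by the spread of the true outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_surrogate = np.asarray(y_surrogate, dtype=float)
    return np.sum((y_true - y_surrogate) ** 2) / np.sum((y_true - y_true.mean()) ** 2)

# Made-up placeholder values, replace with your own validation data
y_model = np.array([1.0, 2.1, 2.9, 4.2, 5.0])
y_pce = np.array([1.1, 2.0, 3.0, 4.1, 5.1])

err = relative_validation_error(y_model, y_pce)
print(f"relative validation error = {err:.4f}")
```

A value close to zero means the overlaid points would nearly coincide on the plot.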
Why would you expect a variable with a Sobol index of zero to be linearly correlated with the output?
To me, this looks as expected. Your linear model still has a relatively small error, suggesting that your true model is indeed close to linear. You can even try to fit a PCE with degree 1 and use only the input variable with the highest Sobol index. I still expect the model to be reasonably accurate.
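A minimal sketch of that experiment, assuming synthetic data in place of your model: the output below is dominated by `x1`, with a weak `x2` contribution and a mild nonlinearity. The fit uses a constant plus a linear term in `x1` only (a plain monomial basis, which spans the same space as a degree-1 PCE in one variable), and the LOO error is computed in closed form from the diagonal of the hat matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200

# Hypothetical data standing in for the real model evaluations
x = rng.uniform(-1.0, 1.0, size=(n, 2))
y = 3.0 * x[:, 0] + 0.05 * x[:, 1] + 0.1 * x[:, 0] ** 2

# Degree-1 design matrix that uses only the most sensitive input x1
psi = np.column_stack([np.ones(n), x[:, 0]])
coef, *_ = np.linalg.lstsq(psi, y, rcond=None)
residual = y - psi @ coef

# Diagonal of the hat matrix psi @ pinv(psi), without forming the full matrix
h = np.einsum("ij,ji->i", psi, np.linalg.pinv(psi))

# Closed-form leave-one-out error for least squares, normalized by Var(y)
loo = np.mean((residual / (1.0 - h)) ** 2) / np.var(y)

print(f"slope = {coef[1]:.3f}, relative LOO error = {loo:.5f}")
```

If the relative LOO error of such a reduced fit stays small on your data, that supports the view that the model is close to linear in that one variable.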
I am afraid I have to disagree that the other variables should be constants. They are still random variables. The point is that their coefficients are small, which makes them unimportant to the problem. This is also why their Sobol indices are small. Because they are insignificant to the problem, they will not significantly affect your output, regardless of what their scatterplots look like. For instance, the shape of the scatterplot can be linear, constant or anything else. It does not matter because the variable is unimportant. I hope this answers your question. If not, just let us know!