Dear UQLab team,
I am trying to run a PCE-based sensitivity analysis of a hydrogeophysical model (the model output is a time series of geophysical signals at Nt = 66 time steps).
Here is a quick description of the problem and of the experimental design:
- Problem dimensionality: 22 parameters
- Experimental design: Nsamples = 9977 (split into calibration and validation sets of 75 % and 25 %, respectively)
Despite the large number of samples, I still get large errors (maximum LOO error \approx 0.17 over the 66 outputs), which seems too high for a reliable computation of the Sobol indices (I have read in another post that it should be around 10^{-2}).
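(If I understand the PCE user manual correctly, the LOO error reported by UQLab is the relative leave-one-out error, normalized by the output variance, i.e. roughly

$$\epsilon_{LOO} = \frac{\sum_{i=1}^{N} \left( \dfrac{\mathcal{M}(\boldsymbol{x}^{(i)}) - \mathcal{M}^{PC}(\boldsymbol{x}^{(i)})}{1 - h_i} \right)^{2}}{\sum_{i=1}^{N} \left( \mathcal{M}(\boldsymbol{x}^{(i)}) - \hat{\mu}_Y \right)^{2}},$$

with $h_i$ the $i$-th diagonal entry of $\boldsymbol{\Psi}(\boldsymbol{\Psi}^{\mathrm{T}}\boldsymbol{\Psi})^{-1}\boldsymbol{\Psi}^{\mathrm{T}}$, so it is dimensionless and directly comparable to the $10^{-2}$ threshold. Please correct me if this is not the definition used internally.)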
Do you have any suggestions on how I could reduce the PCE LOO error so that the metamodel becomes suitable for a reliable computation of the Sobol indices?
Thank you in advance for your help.
The code I am using to build the PCE is included below, in case you want to have a look.
Best regards,
Guillaume
The sample files are too large for me to upload here, but here is the full script I use to build the PCE:
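(STEP 1, which I left out above, is simply the imports and the start of the UQ[py]Lab session; it looks roughly like this, with my credentials read from the local configuration:)

# STEP 1: imports and UQ[py]Lab session start
from uqpylab import sessions
import numpy as np

mySession = sessions.cloud()  # cloud session; host and token come from my stored configuration
uq = mySession.cli            # interface used by all uq.* calls below
mySession.reset()             # start from a clean session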
# STEP 2: retrieve simulation data
simfiles_dir = 'Res_10ksim/'
# first set of simulation results
Y_E2_1 = np.loadtxt(simfiles_dir+'GSA31_E2.csv', delimiter = ',')
X1 = np.loadtxt(simfiles_dir+'GSA31_X.csv', delimiter = ',')
# second set of simulation results
Y_E2_2 = np.loadtxt(simfiles_dir+'GSA32_E2.csv', delimiter = ',')
X2 = np.loadtxt(simfiles_dir+'GSA32_X.csv', delimiter = ',')
Y_E2_3 = np.loadtxt(simfiles_dir+'GSA33_E2.csv', delimiter = ',')
X3 = np.loadtxt(simfiles_dir+'GSA33_X.csv', delimiter = ',')
Y_E2_4 = np.loadtxt(simfiles_dir+'GSA34_E2.csv', delimiter = ',')
X4 = np.loadtxt(simfiles_dir+'GSA34_X.csv', delimiter = ',')
X = np.concatenate((X1,X2,X3,X4))
Y_E2 = np.concatenate((Y_E2_1,Y_E2_2,Y_E2_3,Y_E2_4))
# remove failed simulations (rows whose first output column is NaN)
nanidx = np.argwhere(np.isnan(Y_E2[:,0]))
Y_E2 = np.delete(Y_E2, nanidx, 0)
X = np.delete(X, nanidx, 0)
# split into calibration/validation sets (75 % / 25 %)
Ncal = 3 * (X.shape[0] // 4)
X_cal = X[:Ncal]
Y_E2_cal = Y_E2[:Ncal]
X_val = X[Ncal:]
Y_E2_val = Y_E2[Ncal:]
print(X_cal.shape)
print(Y_E2_cal.shape)
print(X_val.shape)
print(Y_E2_val.shape)
Output:
(7482, 22)
(7482, 66)
(2495, 22)
(2495, 66)
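(As a quick sanity check, not part of my original script, I also verify that the loaded design spans the parameter bounds I declare in STEP 3 below:)

# per-parameter min/max of the calibration design, to compare with the marginals of STEP 3
print(X_cal.min(axis=0))
print(X_cal.max(axis=0))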
# STEP 3: probabilistic model input
InputOpts = {
    "Marginals": [
        # Layer 1
        {"Type": "Uniform", "Parameters": [0, 3]},  # log10Ks1
        {"Type": "Uniform", "Parameters": [0.01, 0.04]},  # tr1
        {"Type": "Uniform", "Parameters": [0.30, 0.50]},  # ts1
        {"Type": "Uniform", "Parameters": [0.01, 0.15]},  # alpha1
        {"Type": "Uniform", "Parameters": [1, 1.8]},  # n1
        # Layer 2
        {"Type": "Uniform", "Parameters": [0, 3]},  # log10Ks2
        {"Type": "Uniform", "Parameters": [0.01, 0.04]},  # tr2
        {"Type": "Uniform", "Parameters": [0.30, 0.50]},  # ts2
        {"Type": "Uniform", "Parameters": [0.01, 0.15]},  # alpha2
        {"Type": "Uniform", "Parameters": [1, 1.8]},  # n2
        # Layer 3
        {"Type": "Uniform", "Parameters": [0, 3]},  # log10Ks3
        {"Type": "Uniform", "Parameters": [0.01, 0.04]},  # tr3
        {"Type": "Uniform", "Parameters": [0.30, 0.50]},  # ts3
        {"Type": "Uniform", "Parameters": [0.01, 0.15]},  # alpha3
        {"Type": "Uniform", "Parameters": [1, 1.8]},  # n3
        # Non-simulated zone parameters
        {"Type": "Uniform", "Parameters": [0.01, 0.05]},  # tRMP1
        {"Type": "Uniform", "Parameters": [0, 1e-8]},  # tRMP2
        {"Type": "Uniform", "Parameters": [0, 1e-8]},  # tRMP3
        # Field capacity parameters
        {"Type": "Uniform", "Parameters": [0.20, 0.30]},  # theta_c1
        {"Type": "Uniform", "Parameters": [0.20, 0.30]},  # theta_c2
        {"Type": "Uniform", "Parameters": [0.20, 0.30]},  # theta_c3
        # Piezometric head offset
        {"Type": "Uniform", "Parameters": [-20.0, 20.0]},  # delta_h piezo
    ]
}
myInput = uq.createInput(InputOpts)
MetaOpts = {
    'Type': 'Metamodel',
    'MetaType': 'PCE'
}
# Use the experimental design loaded from the data files:
MetaOpts['ExpDesign'] = {
    'X': X_cal.tolist(),
    'Y': Y_E2_cal.tolist()
}
# Degree-adaptive PCE with candidate degrees 1 to 4
# (np.arange(1, 5) yields [1, 2, 3, 4]):
MetaOpts['Degree'] = np.arange(1, 5).tolist()
# Parameters of the hyperbolic (q-norm) truncation scheme:
MetaOpts['TruncOptions'] = {
    'qNorm': 0.7,
    'MaxInteraction': 2
}
# Provide the validation data set to get the validation error:
MetaOpts['ValidationSet'] = {
    'X': X_val.tolist(),
    'Y': Y_E2_val.tolist()
}
# Create the PCE metamodel:
myPCE_E2 = uq.createModel(MetaOpts)
# Print a summary of the resulting PCE metamodel:
uq.print(myPCE_E2)
Output:
Warning: The selected PCE has more than 1 output. Only the 1st output will be
printed
In uq_PCE_print (line 12)
In uq_print_uq_metamodel (line 16)
In uq_module/Print (line 52)
In uq_print (line 29)
In uq_web_cli_filebased (line 301)
You can specify the outputs you want to be displayed with the syntax:
uq_display(PCEModule, OUTARRAY)
where OUTARRAY is the index of desired outputs, e.g. 1:3 for the first three
------------ Polynomial chaos output ------------
Number of input variables: 22
Maximal degree: 4
q-norm: 0.70
Size of full basis: 782
Size of sparse basis: 141
Full model evaluations: 7482
Leave-one-out error: 3.4710601e-02
Modified leave-one-out error: 3.6061287e-02
Validation error: 3.6288626e-02
Mean value: 72.7910
Standard deviation: 43.7980
Coef. of variation: 60.169%
--------------------------------------------------
Retrieving the maximum LOO error over the 66 outputs of the trained PCE:
# LOO error of each output time step
LOO_errors_2 = [err['LOO'] for err in myPCE_E2['Error']]
max(LOO_errors_2)
Output: 0.16775668945295738
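For context, once the PCE is accurate enough, the Sobol analysis I plan to run looks roughly like this (adapted from the UQ[py]Lab sensitivity examples, so the option names may need double-checking):

# STEP 5 (intended): Sobol indices computed from the PCE coefficients
SobolOpts = {
    'Type': 'Sensitivity',
    'Method': 'Sobol',
    'Sobol': {
        'Order': 1  # maximum order of the Sobol indices to compute
    }
}
mySobolAnalysis = uq.createAnalysis(SobolOpts)
uq.print(mySobolAnalysis)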