@xujia thank you very much for your detailed answer and introduction to many new exciting aspects of metamodelling! I will try to answer as accurately as I can despite my limited knowledge:
You understood my goals well. The first one (output matrix instead of vector) seems easy to implement, so I will attempt this in UQLab.
> If some patterns in the output are helpful (or the dimensionality of the output vector is extremely high), you can post-process the output to calculate the associated values (on a basis) and then build a PCE for these quantities.
By this, I suppose you mean that I should reduce the output dimensionality as much as I can (for computational efficiency?). I am analysing a tall building in vertical collapse, so from the stupidly large deformation vector I obtain from the FE model I could narrow things down to a few hundred outputs (say, the z-axis deformations at all the beam midspans for each bay of the building). Do you think this is still too many? I plan to analyse multiple damage scenarios and multiple buildings, so I am interested in the fastest surrogate possible.
However, my ultimate goal is the second point, so yes, ideally I would create a surrogate that directly predicts the collapse scenario(s). I have thought of one way to classify deformations into collapse scenarios: a 3D matrix Aijk representing the 3D geometry, where i = x-bay column, j = y-bay column and k = storey number. So A111 would be the ground floor column at grid position (x1,y1), A121 would be the ground floor column at grid position (x1,y2), and so on. A value of zero would mean there is no collapse there (determined by the amount of z-axis deformation), while a value of one would mean there is collapse. In this simplified, coarse way, I end up with matrices of zeros and ones, each unique matrix corresponding to a different collapse scenario. The area of collapse is simply calculated by multiplying this matrix by the bay size and taking the sum of all elements.
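To make this bookkeeping concrete, here is a minimal Python sketch of the classification as I picture it. The array `dz`, the deformation threshold, the bay size and the helper names are all hypothetical placeholders, not values from my model; the real collapse criterion may well need to be more refined than a single threshold.

```python
import numpy as np

def collapse_matrix(dz, threshold):
    """Return A[i, j, k] = 1 where |z-deformation| exceeds the threshold, else 0."""
    return (np.abs(dz) > threshold).astype(int)

def collapse_area(A, bay_size):
    """Multiply the 0/1 matrix by the bay size and sum all elements."""
    return float((A * bay_size).sum())

def scenario_key(A):
    """Hashable label: identical 0/1 matrices map to the same collapse scenario."""
    return (A.shape, A.astype(np.uint8).tobytes())

# Hypothetical building: 2 x 2 grid positions, 3 storeys (assumed numbers).
# numpy indices start at 0, so A[0, 1, 0] plays the role of "A121".
dz = np.zeros((2, 2, 3))
dz[0, 1, :] = -1.5                      # assumed: every storey at (x1, y2) drops 1.5 m
A = collapse_matrix(dz, threshold=0.5)  # assumed threshold of 0.5 m
area = collapse_area(A, bay_size=36.0)  # assumed bay size of 36 m^2
key = scenario_key(A)                   # usable as a dictionary key or class label
```

Collecting `scenario_key(A)` over all model runs would give the list of distinct collapse scenarios, which is the categorical output I would like the surrogate to predict.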
So yes, there can be a direct relationship between the output deformation matrix (the selected few hundred values) and the unique collapse mechanisms. The problem, however, is that the raw output (the deformation matrix) will be highly nonlinear: imagine following the deflection of the tip of a cantilever for different (probabilistic) load and material strength inputs. As long as the cantilever doesn’t yield or break, the deflection is a nice continuous variable. But if, for certain combinations of the inputs, the cantilever yields or breaks, I will get a sudden step change in the output. The same applies to my tall building, only in a far more complex way: minute fluctuations of the inputs may give rise to wildly different collapse scenarios.
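The cantilever analogy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the elastic tip deflection P·L³/(3·EI) is the standard result for a point load at the tip, while the moment capacity `M_cap`, the stiffness `EI`, the length `L` and the fixed "collapsed" deflection value are numbers I made up to show the jump.

```python
import numpy as np

def tip_deflection(P, M_cap, L=3.0, EI=2.0e7, collapsed_value=1.0):
    """Elastic tip deflection P*L^3/(3*EI) while the root moment P*L stays
    below the capacity M_cap; otherwise a fixed (assumed) 'collapsed' value."""
    P = np.asarray(P, dtype=float)
    elastic = P * L**3 / (3.0 * EI)
    return np.where(P * L < M_cap, elastic, collapsed_value)

# Assumed loads in N; with M_cap = 60 kNm and L = 3 m the capacity is reached at P = 20 kN.
loads = np.array([10e3, 19.9e3, 20.1e3])
d = tip_deflection(loads, M_cap=60e3)
# The first two entries are millimetre-scale elastic deflections,
# the last one jumps to the collapsed value although the load barely changed.
```

A smooth polynomial surrogate fitted to `d` would have to bridge that jump, which is the difficulty I expect in the building model as well.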
> For the second goal, you can directly use the output vectors represented by a series of PCEs to run Monte Carlo simulations. However, may I ask how do you classify an output vector to a collapse scenario (and the collapse area seems to be continuous rather than discrete)? If this is determined by checking a few continuous values as functions of the output vectors, I would suggest directly build a surrogate on these values instead of emulating the nodal displacements as an intermediate stage.
So while I understand you here, I am stuck on the fact that I am indeed checking a few values, but they are not exactly continuous. I looked into the new methods you mentioned, and multinomial logistic regression seems the most appropriate, although my knowledge of it beyond simple statistics programs is very limited. Do I understand correctly that I need many model runs to do such a regression (or to train a program to do it)? Unfortunately, my model is expensive (on the order of minutes per run).
I hope the detailed info clarifies things a bit more. Do you have any further advice? Should I be looking at SVMs, since they are implemented in UQLab? (The binary result would be {type x collapse; no collapse}, and I suppose I can find resources for a one-vs-all procedure in Matlab or Python.)
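For reference, this is how I understand the label handling in a one-vs-all scheme, written as a small Python sketch. It only prepares the binary {+1, -1} targets (one per collapse scenario) and combines the scores afterwards; the binary classifier itself (for example an SVM) is deliberately left out, and the scenario names and score values are invented for illustration.

```python
import numpy as np

def one_vs_rest_labels(scenarios):
    """Return the sorted scenario names and a matrix Y with Y[n, c] = +1 if
    sample n belongs to scenario c and -1 otherwise (one column per binary problem)."""
    classes, idx = np.unique(np.asarray(scenarios), return_inverse=True)
    idx = idx.reshape(-1)
    Y = -np.ones((len(idx), len(classes)), dtype=int)
    Y[np.arange(len(idx)), idx] = 1
    return classes, Y

def combine_scores(classes, scores):
    """For each sample, pick the scenario whose binary classifier gives the largest score."""
    return classes[np.argmax(np.asarray(scores), axis=1)]

# Hypothetical scenario labels from four model runs.
runs = ["none", "corner", "none", "edge"]
classes, Y = one_vs_rest_labels(runs)   # columns follow sorted order: corner, edge, none

# Hypothetical decision scores from three trained binary classifiers on two new samples.
scores = [[0.2, -0.4, -0.9],
          [-1.0, -0.3, 0.8]]
predicted = combine_scores(classes, scores)
```

Each column of `Y` would be the target for one "type x collapse vs. everything else" classifier, which seems to match the binary setting available in UQLab.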
Thank you again,
Konstantinos