Active learning reliability concerns


I have a few questions and would appreciate your thoughts on them.

  1. For the adaptive technique, how do you select the initial experimental design for a given problem? Is there a rule of thumb or reference you could point me to? For example, if my limit-state function has only three variables, how many initial sample points should I use before enrichment?

  2. For the k-means clustering approach, how does one determine the number of clusters K? Is it user-defined, or is there a way to determine it when several points are added to the experimental design at each enrichment step? In other words, is there a rule of thumb for K, or is it simply user-defined?

  3. Also, is the active learning method quite time-consuming because of the size of the candidate sample pool, especially for problems with small failure probabilities?

  4. When using a PC-Kriging metamodel, does training take longer than with ordinary Kriging? Does PC-Kriging have any advantage over ordinary Kriging in terms of training time and accuracy?

  5. How do we handle dependent input variables during PC-Kriging training?

I would appreciate any thoughts on this.

Dear @aokoro

Please find some hints for your various questions:

  1. The size of the initial experimental design is often chosen as n_{init} = \max(2\cdot M,10), where M is the number of input parameters, i.e., twice the dimensionality of the problem but at least 10 points. For your three-variable example, this gives \max(6,10) = 10 points.

  2. For the number of clusters, see the other post here

  3. Active learning involves several ingredients: the type of surrogate, the reliability solver, the enrichment criterion and the stopping criterion, so there is no single answer. That said, the goal is to obtain results within a few hundred runs of the computational model, whereas any simulation method (even advanced ones such as (sequential) importance sampling, line sampling or subset simulation) usually requires \mathcal{O}(10^{4}) runs or more.
    If your problem is likely to have a small failure probability (< 10^{-5}), you should definitely use subset simulation together with the surrogate.

  4. PC-Kriging generally shows better accuracy than (ordinary) Kriging when you have small experimental designs or high dimensions (M > 50).
    We used PC-Kriging with subset simulation to solve the TNO blind benchmark (details here); the results are discussed here. According to the benchmark authors at TNO, this approach gave the most efficient/accurate results among the 9 participants on 24 of the 27 component- and system-reliability problems.
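For points 1 and 3, the sample-size arithmetic can be sketched in a few lines of NumPy. This is an illustrative sketch, not the implementation used by the people answering: the helper names are hypothetical, and drawing the initial design by Latin hypercube sampling is a common choice but an assumption here. The last helper uses the standard crude Monte Carlo result CoV[\hat{P}_f] = \sqrt{(1 - P_f)/(N P_f)} to show how large a Monte Carlo candidate pool must be.

```python
import math

import numpy as np


def initial_ed_size(m: int) -> int:
    """Rule of thumb from the answer: twice the input dimension M, but at least 10."""
    return max(2 * m, 10)


def latin_hypercube(n: int, m: int, seed: int = 0) -> np.ndarray:
    """Plain Latin hypercube sample of n points on [0, 1)^m (assumed design type).

    Each column is split into n equal strata, and every stratum holds exactly one point.
    """
    rng = np.random.default_rng(seed)
    # independent random permutation of the strata indices 0..n-1 for each dimension
    strata = np.column_stack([rng.permutation(n) for _ in range(m)])
    # jitter each point uniformly inside its stratum
    return (strata + rng.random((n, m))) / n


def mcs_sample_size(pf: float, target_cov: float = 0.1) -> int:
    """Crude Monte Carlo sample size reaching a target coefficient of variation on Pf.

    Solves CoV = sqrt((1 - pf) / (n * pf)) for n.
    """
    return math.ceil((1.0 - pf) / (pf * target_cov**2))


M = 3                        # e.g. a limit-state function with three inputs
n_init = initial_ed_size(M)  # -> 10
ed_unit = latin_hypercube(n_init, M)
# ed_unit lives on [0, 1)^3: map it through the inverse CDFs of your input
# marginals before evaluating the computational model.

for pf in (1e-3, 1e-5):
    print(f"Pf = {pf:.0e}: ~{mcs_sample_size(pf):.1e} MCS samples for a 10% CoV")
```

With M = 3 the rule gives 10 initial points. The sample-size helper shows why the candidate pool becomes the bottleneck for small failure probabilities: at P_f = 10^{-5}, a 10% coefficient of variation needs about 10^7 Monte Carlo samples, and in AK-MCS-type schemes the surrogate and learning function are evaluated on that whole pool at every enrichment step. This is the situation where pairing the surrogate with subset simulation pays off.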

We’re currently finalizing a paper with @moustapha and @ste in which another benchmark of 20 problems and 41 methods is carried out. More soon!

  5. Dependent variables are handled as usual through an isoprobabilistic transform, which maps them to independent variables so that orthogonal polynomials can be constructed.
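For point 5, the isoprobabilistic transform can be sketched for the common case of a Gaussian copula (the Nataf transform), using SciPy. This is an illustrative sketch under stated assumptions, not the implementation used by the people answering: the dependence is assumed to be a Gaussian copula with correlation matrix R defined in standard-normal space (which in general differs from the linear correlation of X), other copulas need other transforms (e.g. Rosenblatt), and the function names and example marginals are hypothetical.

```python
import numpy as np
from scipy import stats


def to_standard_normal(x, marginals, R):
    """Gaussian-copula (Nataf) isoprobabilistic transform X -> U.

    x         : (n, M) samples of the dependent physical inputs
    marginals : list of M frozen scipy.stats distributions (the marginals of X)
    R         : (M, M) correlation matrix of the assumed Gaussian copula
    Returns (n, M) samples of independent standard normal variables U.
    """
    # 1) marginal step: z_i = Phi^{-1}(F_i(x_i)) gives correlated standard normals
    z = np.column_stack([stats.norm.ppf(d.cdf(x[:, i])) for i, d in enumerate(marginals)])
    # 2) decorrelation step: u = L^{-1} z, where R = L L^T (Cholesky factor)
    L = np.linalg.cholesky(R)
    return np.linalg.solve(L, z.T).T


def from_standard_normal(u, marginals, R):
    """Inverse transform U -> X, e.g. to run the model at points chosen in U-space."""
    L = np.linalg.cholesky(R)
    z = u @ L.T  # row-wise z = L u: reintroduces the copula correlation
    return np.column_stack([d.ppf(stats.norm.cdf(z[:, i])) for i, d in enumerate(marginals)])


# Hypothetical 3-variable example: X1 and X2 dependent, X3 independent
marginals = [
    stats.lognorm(s=0.25, scale=1.0),
    stats.norm(loc=0.0, scale=1.0),
    stats.gumbel_r(loc=1.0, scale=0.5),
]
R = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.0],
              [0.0, 0.0, 1.0]])

rng = np.random.default_rng(1)
x = from_standard_normal(rng.standard_normal((20_000, 3)), marginals, R)  # dependent X samples
u = to_standard_normal(x, marginals, R)  # independent inputs for the polynomial basis
print(np.round(np.corrcoef(u, rowvar=False), 2))  # close to the identity matrix
```

The polynomial chaos part of the surrogate can then be built on the independent variables U (Hermite polynomials are orthogonal with respect to the standard normal density), and any point proposed in U-space is mapped back with the inverse transform before the computational model is run.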

I hope this helps.
Best regards


Thank you so much, Prof, for your time and the clarification. This makes it very clear now.