I am using PCE to build a metamodel from an experimental design in which X, the input vector, is a time vector and Y, the output vector, is the quantity of interest, which has an arbitrary distribution. My questions are:
How do I build a PCE for a time-dependent model using UQLab?
Which polynomial basis should I define? Is there a way to establish a probability density function?
I am not sure I understand your model correctly. You say that the input \mathbf{X} is a vector of times. Is it really a random vector of times, so that each entry X_i is a random point in time? If yes, can you specify the distribution of each X_i?
Or does the model always take a fixed vector of times, plus some additional parameters, and return a corresponding QoI (a time series of outputs)?
And regarding the output of your model: for fixed \mathbf{x} \sim \mathbf{X}, is \mathcal{M}(\mathbf{x}) a scalar or a vector?
If you give us some more details, I’m sure we can help you with this. It might be easiest to describe your specific application in more detail.
I have an input \mathbf{X} \in \mathbb{Z}_{+}^{n} that is a vector of times, and for each X_i there is a Y_i, which is a random scalar with an arbitrary distribution.
For instance:
\mathbf{X} is a vector of days and, for each day X_i, Y_i is the cumulative number of deaths caused by a disease.
I would like to build a PCE metamodel from an experimental design, but I am not sure how to define the polynomial basis.
Is the vector of times always (or usually) the same, or does it change randomly? Are there other parameters in your model that you vary? Does your model perhaps have the form f(t_1, t_2, ..., t_n, \mathbf{p}) = (y_1, y_2, ..., y_n),
where t_1 < t_2 < ... < t_n are usually the same points and the output is a time series in which each y_i corresponds to the point in time t_i, but is also influenced by additional input parameters \mathbf{p} and by what happened before?
If so, time is not an input in the UQ sense (it is not random) but a parameter (in the sense of the Model user manual, Section 2.1.4). Have a look at @damarginal’s answer to this question; it sounds similar to your description.
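To make the distinction concrete, here is a minimal sketch (in Python/NumPy rather than UQLab's MATLAB, purely for illustration) of a model of the form f(t_1, ..., t_n, \mathbf{p}). The logistic growth curve, the parameter names K, r, t0 and their uniform distributions are hypothetical assumptions; the only point is that the days are a fixed array handed to the model, while all the randomness lives in \mathbf{p}.

```python
import numpy as np

# Hypothetical model: logistic growth curve for cumulative counts.
# The days are a fixed, deterministic parameter of the model, not a random input.
T_DAYS = np.arange(0, 60)

def model(p, t=T_DAYS):
    """Map random parameters p = (K, r, t0) to a time series y(t_1), ..., y(t_n)."""
    K, r, t0 = p
    return K / (1.0 + np.exp(-r * (t - t0)))

rng = np.random.default_rng(0)
# Assumed input distributions, for illustration only
samples = np.column_stack([
    rng.uniform(900, 1100, size=200),   # K: final epidemic size
    rng.uniform(0.1, 0.3, size=200),    # r: growth rate
    rng.uniform(20, 40, size=200),      # t0: day of the inflection point
])
# One full time series per sample of p: shape (200, 60)
Y = np.array([model(p) for p in samples])
print(Y.shape)
```

In this setting the PCE input would be \mathbf{p} (three random variables), and each output day, or a reduced representation of the whole series, is a QoI.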
If this is not the case, could you tell us how exactly the randomness enters your problem? Can you maybe write your model in functional form as I did above, or describe the equations that your model solves, and tell us what kind of information goes into the model and what comes out?
Overall, my goal is to use a time series of the daily number of people infected with a disease to predict it a few days ahead. Thus, all I have is a time series y(t) with random amplitude and a uniformly spaced, deterministic vector t.
To this end, I want to use a regressor of the form r(y_1,\dots,y_n,t) = y that receives the past and current values as input and predicts the future. The process is stochastic with short-term memory: the distant past has little influence, while the recent past matters more.
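One way to quantify how short that memory is (and hence how many past values are worth feeding to the regressor) is the empirical autocorrelation function. The sketch below assumes a synthetic AR(1) series with coefficient phi = 0.7 as a stand-in for the real data; for cumulative or strongly trending counts one would typically difference the series first, which is also an assumption here. The helper name autocorrelation is hypothetical.

```python
import numpy as np

def autocorrelation(y, max_lag):
    """Empirical autocorrelation of a 1-D series for lags 0..max_lag."""
    y = np.asarray(y, dtype=float)
    y = y - y.mean()
    var = np.dot(y, y)
    return np.array([np.dot(y[:len(y) - k], y[k:]) / var for k in range(max_lag + 1)])

# Synthetic AR(1) series standing in for the (unknown) daily counts
rng = np.random.default_rng(1)
phi = 0.7
y = np.zeros(2000)
for i in range(1, len(y)):
    y[i] = phi * y[i - 1] + rng.normal()

acf = autocorrelation(y, max_lag=10)
# For an AR(1) process the autocorrelation decays roughly like phi**lag
print(np.round(acf[:4], 2))
```

The lag at which the autocorrelation becomes negligible gives a rough upper bound for a useful input window length n.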
I am not sure I understand your question correctly, but can you describe your “model” with an equation, for instance an ordinary differential equation (ODE)? If so, I might be able to help you. It would be great if you could post the equation here.
That’s an interesting problem. I think your formulation already gives a hint on how to approach it:
So, the input to your model would be the time series y_1, ..., y_n, not the (deterministic) points in time t_1, ..., t_n.
What about the future point in time t? It is probably not random, but fixed or taken from a discrete set (I would guess k days into the future: t_{n+1}, ..., t_{n+k}).
Then we can rewrite the surrogate model that you are looking for as a vector-valued model: r(y_1, ..., y_n) = (y_{n+1}, ..., y_{n+k}).
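Training pairs for such a vector-valued model can be cut from a single observed series with a sliding window. A minimal NumPy sketch, where the helper name make_windows and the window sizes are hypothetical:

```python
import numpy as np

def make_windows(y, n, k):
    """Build (inputs, outputs) pairs from one time series.

    Row j of X holds the n consecutive values y[j], ..., y[j+n-1];
    row j of Y holds the next k values y[j+n], ..., y[j+n+k-1].
    """
    y = np.asarray(y, dtype=float)
    m = len(y) - n - k + 1          # number of complete windows
    if m <= 0:
        raise ValueError("series too short for the chosen n and k")
    X = np.stack([y[j:j + n] for j in range(m)])
    Y = np.stack([y[j + n:j + n + k] for j in range(m)])
    return X, Y

# Toy series 0, 1, ..., 9 with a 4-day input window and a 2-day horizon
X, Y = make_windows(np.arange(10), n=4, k=2)
print(X[0], Y[0])        # [0. 1. 2. 3.] [4. 5.]
print(X.shape, Y.shape)  # (5, 4) (5, 2)
```

Note that overlapping windows are not independent samples, which is one more reason the inputs are strongly correlated.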
Your n is probably quite large, and the values y_i are correlated. As you say, y_t is a stochastic process with a short memory (short correlation length). It probably won’t make much sense to use the time series directly as input to your PCE, since the input is so high-dimensional and highly correlated. You might consider reducing the dimensionality of the input space using, e.g., PCA/KLE. Since you want to do both dimensionality reduction and surrogate modelling, this paper could be of interest to you: Lataniotis, Marelli, Sudret (2020): Extending classical surrogate modelling to high dimensions through supervised dimensionality reduction: a data-driven approach.
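As a simple starting point for the input reduction, the windows can be compressed with plain (unsupervised) PCA computed via an SVD. This is not the supervised method of the cited paper, just a sketch of the PCA/KLE step; the function name pca_reduce, the 95% variance threshold and the synthetic AR(1) data are assumptions.

```python
import numpy as np

def pca_reduce(X, variance_kept=0.95):
    """Project the rows of X onto the leading principal components.

    Returns the reduced coordinates Z, the column mean and the retained
    components; a new window x is reduced with (x - mean) @ components.T.
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Economy-size SVD: the rows of Vt are the principal directions
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    # Smallest number of components whose cumulative share reaches the threshold
    r = min(int(np.searchsorted(np.cumsum(explained), variance_kept)) + 1, len(s))
    components = Vt[:r]
    return Xc @ components.T, mean, components

# Synthetic, strongly correlated series (AR(1), phi = 0.95) cut into 30-day windows
rng = np.random.default_rng(2)
y = np.zeros(3000)
for i in range(1, len(y)):
    y[i] = 0.95 * y[i - 1] + rng.normal()
n = 30
X = np.stack([y[j:j + n] for j in range(len(y) - n + 1)])
Z, mean, comps = pca_reduce(X, variance_kept=0.95)
# The reduced dimension is smaller than the window length of 30
print(X.shape[1], "->", Z.shape[1])
```

The reduced coordinates Z (rather than the raw y_1, ..., y_n) would then serve as the PCE inputs.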
Regarding the output, depending on how many days into the future you want to predict, you might create one PCE for each day t_{n+1}, ..., t_{n+k}, or again apply dimension reduction to the output.
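A sketch of the "one PCE per day" option, written in NumPy rather than with UQLab: fit a least-squares PCE with a total-degree basis on the reduced inputs, with one coefficient vector per forecast day. It assumes the reduced inputs are standardized and roughly Gaussian, so a probabilists' Hermite basis is used; in general the polynomial family is chosen to be orthogonal with respect to the input distribution (e.g. Hermite for Gaussian, Legendre for uniform inputs). All function names are hypothetical.

```python
import itertools
import numpy as np

def hermite_1d(x, degree):
    """Probabilists' Hermite polynomials He_0..He_degree at x, one per column."""
    H = np.ones((len(x), degree + 1))
    if degree >= 1:
        H[:, 1] = x
    for j in range(1, degree):
        # Recurrence: He_{j+1}(x) = x He_j(x) - j He_{j-1}(x)
        H[:, j + 1] = x * H[:, j] - j * H[:, j - 1]
    return H

def design_matrix(Z, degree):
    """Tensor-product Hermite basis truncated to total degree <= degree."""
    d = Z.shape[1]
    alphas = [a for a in itertools.product(range(degree + 1), repeat=d) if sum(a) <= degree]
    H = [hermite_1d(Z[:, i], degree) for i in range(d)]
    Psi = np.ones((Z.shape[0], len(alphas)))
    for col, alpha in enumerate(alphas):
        for i, a in enumerate(alpha):
            Psi[:, col] *= H[i][:, a]
    return Psi, alphas

def fit_pce_per_output(Z, Y, degree=2):
    """Least-squares PCE coefficients, one column per output day."""
    Psi, alphas = design_matrix(Z, degree)
    coeffs, *_ = np.linalg.lstsq(Psi, Y, rcond=None)
    return coeffs, alphas

rng = np.random.default_rng(3)
Z = rng.standard_normal((500, 2))
# Two hypothetical "days ahead", each an exact quadratic in Z
Y = np.column_stack([1 + 2 * Z[:, 0] + 0.5 * (Z[:, 1]**2 - 1),
                     -1 + Z[:, 0] * Z[:, 1]])
coeffs, alphas = fit_pce_per_output(Z, Y, degree=2)
Psi, _ = design_matrix(Z, 2)
# Residual is at round-off level because the targets lie in the span of the basis
print(np.max(np.abs(Psi @ coeffs - Y)))
```

Because every day shares the same design matrix, a single lstsq call with a matrix right-hand side gives the same coefficients as k separate fits.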
I hope this helps! Let us know what you choose to do.