UQ[py]Lab v0.95 - trouble connecting to a UQ cloud session

I’m having some trouble connecting to a UQ cloud session. I get a timeout error:

Traceback (most recent call last):

  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
    exec(code, globals, locals)

  File ~/Documents/Postdoc_IQF/python/uqlab/uq_sens_analysis_Hgmodel_loc_iodine.py:516
    mySession = sessions.cloud()

  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:23 in __call__
    cls._instances[cls] = super(_Singleton, cls).__call__(*args, **kwargs)

  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:421 in __init__
    self.new()

  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:435 in new
    raise RuntimeError(resp['Message'])

RuntimeError: Timeout reached

Could the issue be related to a user limit imposed on cloud usage? Also, are there plans to release UQ[py]Lab with local computation capabilities in the future?

Hello everyone,
I have the same problem as Aryeh. I was able to connect to the cloud this morning and run an analysis, but now I get the “Timeout reached” error when I try to create a UQ cloud session.
Is there a server problem?

Best regards,
Guillaume Gru

Hello everyone,

Thanks a lot for letting us know! We discovered that one process was consuming almost all of the resources. We have terminated it, and everything should be working properly again.

Best,
Adela

Dear @ari_f and @GuillaumeGru,
thank you for reporting this.

I checked the main server, and indeed one worker was using over 100 GB of RAM, which was causing the slowdowns. I have manually killed the process, and I will investigate why this was not caught by the automatic resource control (we are updating the backend continuously).

@ari_f: yes, we plan to provide a completely offline version of UQPyLab towards the end of this year or early next year, which will replace the current centrally hosted system.
While the client won’t be affected (so current scripts will continue to work), the responsibility for hosting the server will move to users or their institutions, and we will stop hosting free instances.
Details about this process are still being finalized.

Best regards,
Stefano

Dear @Adela and @ste,

Thank you for your reply and the fix.
It still seems quite unstable: I managed to connect for a minute and run an analysis, but now I’m back to the “Timeout reached” error.

Best regards,
Guillaume Gru

Dear @GuillaumeGru ,

To minimize dangling sessions, a session is closed automatically if an analysis takes more than 3 minutes. If you need longer wait times, you can set the timeout yourself from the client; the value is given in seconds. For example, to set it to 1000 s, use the following line:

mySession.timeout = 1000

Sometimes, the calculations cannot be gracefully closed even after the timeout is reached, and the worker keeps running until the operation ends.

In these cases, you can use the following command to force a worker to restart:

mySession = sessions.cloud(host=myInstance,token=myToken,force_restart=True)
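The two settings above can be combined in a small helper. This is only a sketch: the function name `connect_with_timeout` and its retry-once policy are hypothetical and not part of UQ[py]Lab. It relies solely on what appears in this thread, namely that `sessions.cloud(...)` raises a `RuntimeError` on failure, accepts `host`, `token` and `force_restart`, and returns an object with a `timeout` attribute in seconds. The `cloud` callable is passed in as an argument so that the helper itself does not depend on a live server.

```python
def connect_with_timeout(cloud, host, token, timeout_s=1000):
    """Open a cloud session and set its client-side timeout (in seconds).

    `cloud` is expected to behave like `uqpylab.sessions.cloud` as used in
    this thread. The retry logic is an assumption made for illustration.
    """
    try:
        session = cloud(host=host, token=token)
    except RuntimeError:
        # For example "Timeout reached": retry once, asking the server
        # to restart the worker.
        session = cloud(host=host, token=token, force_restart=True)
    # Allow long-running analyses before the session is closed.
    session.timeout = timeout_s
    return session


# Usage with the real client (requires valid credentials):
#   from uqpylab import sessions
#   mySession = connect_with_timeout(sessions.cloud, myInstance, myToken)
```

Catching every `RuntimeError` is deliberately coarse here; in practice you may prefer to inspect the error message before forcing a restart.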

Let me know if this helps!

Best regards
Adela

Dear @Adela,

thank you for your reply.

To give you a little bit of context on the analysis I am trying to run:

I am trying to do a PCE-based sensitivity analysis (first-order Sobol indices).
The analysis worked for a first, simpler model with 8 parameters, and now I am trying to run it with a more complex model (18 parameters).
I am training the PCE on samples that were previously computed (I did that for the first analysis as well).

The mySession.timeout trick worked and I no longer get the timeout error, but when I run the command
myPCE = uq.createModel(MetaOpts)
the analysis doesn’t seem to end (it has currently been running for 13 min). I am surprised, because there is an example of a PCE-based Sobol analysis with more than 100 parameters on the UQ[py]Lab website.

Do you have any idea where this issue might come from?

Best regards,
Guillaume Gru

Dear @GuillaumeGru,

Could you please provide me with a minimal working example so that I can pinpoint the issue?

Thank you very much!

Best regards,
Adela

Actually, there was a detail that I forgot to take from the multi-dimensional example: the part where the PCE truncation scheme is provided.

PCEOpts['TruncOptions'] = {
    'qNorm': 0.7,
    'MaxInteraction': 2
}

With this fix, the PCE training works.
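To see why the truncation options matter so much with 18 inputs, the sketch below counts the candidate polynomial basis terms with and without them. It assumes the usual definitions: a multi-index is kept if its q-norm, (sum of alpha_i^q)^(1/q), does not exceed the degree p, and if at most `MaxInteraction` of its entries are nonzero. The degree p = 5 is an assumption for illustration only, since the degree actually used is not stated in this thread, and the function names are hypothetical.

```python
from math import comb


def count_positive_tuples(j, budget, q, max_degree):
    """Count j-tuples of positive integers a with sum(a_i**q) <= budget."""
    if j == 0:
        return 1
    total = 0
    for a in range(1, max_degree + 1):
        cost = a ** q
        if cost > budget + 1e-12:  # small tolerance for float round-off
            break
        total += count_positive_tuples(j - 1, budget - cost, q, max_degree)
    return total


def count_pce_terms(n_inputs, degree, q_norm=1.0, max_interaction=None):
    """Number of multi-indices with q-norm <= degree and a limited number
    of nonzero entries (interaction order)."""
    if max_interaction is None:
        max_interaction = n_inputs
    budget = degree ** q_norm  # ||alpha||_q <= p  <=>  sum(alpha_i**q) <= p**q
    return sum(
        # choose which j inputs are active, then count their positive degrees
        comb(n_inputs, j) * count_positive_tuples(j, budget, q_norm, degree)
        for j in range(min(max_interaction, n_inputs) + 1)
    )


# 18 inputs, assumed degree 5
print(count_pce_terms(18, 5))                                  # 33649
print(count_pce_terms(18, 5, q_norm=0.7, max_interaction=2))   # 550
```

Under these assumptions the candidate basis shrinks from 33649 terms to 550, which is likely why the training finished once the truncation scheme was specified.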

Thank you for your help, best regards,
Guillaume

Thanks @Adela and @ste! Good to know about these plans. I’m interested in running sensitivity analyses for many cases, so I wasn’t sure whether this is appropriate for the cloud system.

Dear @GuillaumeGru ,

I am glad that it works for you now!

Best regards,
Adela