I’m having some trouble connecting to a UQ cloud session. I get a timeout error:
Traceback (most recent call last):
  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/spyder_kernels/py3compat.py:356 in compat_exec
    exec(code, globals, locals)
  File ~/Documents/Postdoc_IQF/python/uqlab/uq_sens_analysis_Hgmodel_loc_iodine.py:516
    mySession = sessions.cloud()
  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:23 in __call__
    cls._instances[cls] = super(_Singleton, cls).__call__(*args, **kwargs)
  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:421 in __init__
    self.new()
  File /opt/anaconda3/envs/uqlab/lib/python3.10/site-packages/uqpylab/sessions.py:435 in new
    raise RuntimeError(resp['Message'])
RuntimeError: Timeout reached
I’m not sure whether the issue is related to a user limit imposed on cloud usage. Also, are there plans to release UQ[py]Lab with local computation capabilities in the future?
Hello everyone,
I have the same problem as Aryeh, I was able to connect to the cloud this morning and perform analysis but now I get the “Timeout reached” error when I try to create a UQ cloud session.
Is there a server problem?
Thanks a lot for letting us know! We discovered that one process was consuming almost all of the resources. We have terminated it, and everything should be working properly again.
I checked the main server, and indeed one worker was using over 100GB of RAM, hence causing slowdowns. I have manually killed the process, and I will investigate why this was not caught by the automatic resource control (we are updating the backend continuously).
@ari_f: yes, we plan to provide a completely offline version of UQPyLab toward the end of this year or early next year, which will replace the current centrally hosted system.
While the client won’t be affected (so current scripts will continue to work), the responsibility for hosting the server will move to users or their institutions, and we will stop hosting free instances.
Details about this process are still being finalized.
Thank you for your reply and the fix.
It still seems quite unstable: I managed to connect for a minute and run an analysis, but now I’m back to the “Timeout reached” error.
To minimize dangling sessions, the session is closed automatically if your analysis takes more than 3 minutes. You can set the timeout yourself from the client if you need longer wait times; it is specified in seconds. For example, to set it to 1000 s, use the following line:
mySession.timeout = 1000
Sometimes, the calculations cannot be gracefully closed even after the timeout is reached, and the worker keeps running until the operation ends.
In these cases, you can use the following command to force a worker to restart:
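The command itself appears to be missing from this post. Judging from the `force_restart` flag described later in the thread, it is presumably a forced session restart along these lines (a sketch, not verified against the uqpylab API):

```
mySession = sessions.cloud(force_restart=True)
```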
To give you a bit of context on the analysis I am trying to run:
I am trying to do a PCE-based sensitivity analysis (first-order Sobol’ indices).
The analysis worked for a first, simpler model with 8 parameters, and now I am trying to run it with a more complex model (18 parameters).
I am training the PCE on samples that were previously computed (I did that for the first analysis as well).
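For reference, a PCE metamodel trained on precomputed samples is typically configured along these lines in UQ[py]Lab. This is a sketch, not the poster's actual script: `X`, `Y`, the solver choice, and the degree range are placeholders (here tiny dummy data), and the exact option names should be checked against the PCE manual.

```python
# Sketch: configuring a PCE metamodel from a precomputed experimental design.
# X and Y stand in for the previously computed samples; uqpylab serializes
# options to JSON, so plain lists are used.
X = [[0.1 * j for j in range(18)] for _ in range(5)]   # 5 dummy samples, 18 inputs
Y = [sum(row) for row in X]                            # dummy model outputs

MetaOpts = {
    "Type": "Metamodel",
    "MetaType": "PCE",
    "Method": "LARS",              # sparse regression solver (assumption)
    "Degree": list(range(1, 6)),   # candidate degrees for adaptive selection
    "ExpDesign": {"X": X, "Y": Y},
}
# With an active cloud session:
# myPCE = uq.createModel(MetaOpts)
```

With 18 inputs, the candidate polynomial basis grows quickly with the maximum degree, so a modest degree range with a sparse solver keeps the remote computation tractable.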
The mySession.timeout trick worked and I don’t get the timeout error anymore, but when I run the
myPCE = uq.createModel(MetaOpts)
command, the analysis doesn’t seem to end (it has currently been running for 13 minutes). I am surprised, because there is an example of a PCE-based Sobol analysis with more than 100 parameters on the UQ[py]Lab website.
Do you have any idea where this issue might come from?
Thanks @Adela and @ste! Good to know about these plans. I’m interested in running sensitivity analyses for many cases so I wasn’t sure if this is appropriate for the cloud system.
Hi, I didn’t want to start a new topic, so I’m writing here.
I’m encountering the same issue with UQ[py]Lab 1.0.2: RuntimeError: Timeout reached. I don’t think it’s a problem with the session time limit, because I get the error message before the session timeout value is reached. Also, the error appears when I try to start the uqpylab session.
I would be very grateful for help in solving this problem,
Jacek.
Everything seems to be fine on our side. If the timeout is not a problem, the session might still be active and not properly terminated. In this case, you can use the following command to force a worker to restart:
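The command did not make it into this post either; given the `force_restart` flag described in the next reply, it is presumably a session restart of this form (an assumption, not verified against the uqpylab API):

```
mySession = sessions.cloud(force_restart=True)
```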
To add to @Adela’s comment: when you kill a job on the client, e.g. through Ctrl-C, the remote job is not killed immediately, and for some specific jobs it is not killed until completion (e.g. long Bayesian analyses). In these cases, connecting to the server does not provide a response until your worker is free, resulting in a timeout even during the connection.
The “force_restart” flag instructs the remote API to force-kill the worker and restart a fresh session, without waiting for worker availability.
The error was actually on my side due to manual interruption of the python script. A force restart allowed the session to be restarted. Thank you very much for such quick help and comprehensive explanation of the problem!