Issue with HPC dispatcher module: module loading and MPI

Hello everyone,

I am trying to dispatch the “uq_Example_Dispatcher_01_BasicUsage.m” example to our HPC cluster, using “profile_file_template_basic.m” as a template for my profile file. The problem I am facing is twofold:

  1. I want to pass the following commands to PrevCommands, so that they appear as separate lines in my slurm submission script (EnvSetup does not get written to the latter, I believe):
#SBATCH -p cclake
. /etc/profile.d/modules.sh
module purge
module load rhel8/default-icl
module load anaconda/3.2019-10
module load matlab/r2021b
source ~/.bashrc

Based on page 25 of the dispatcher user manual, PrevCommands is meant to be a cell array, so I tried the following two options:

Option 1:

PrevCommands = reshape({'#SBATCH -p cclake', 'module load rhel7/default-ccl', 'module load anaconda/3.2019-10', 'module load matlab/R2021b', 'source ~/.bashrc'}, [5,1]);

Option 2:

PrevCommands = {'#SBATCH -p cclake', 'module load rhel7/default-ccl', 'module load anaconda/3.2019-10', 'module load matlab/R2021b', 'source ~/.bashrc'};

Both returned the following error:

Error using horzcat
Inconsistent concatenation dimensions because a 1-by-7 'char' array was converted to a 1-by-1 'cell' array. Consider creating arrays of the same type before concatenating.
 
Error in uq_Dispatcher_util_checkCommand (line 40)
        cmdName = [envCommands 'command'];

Passing a single command works fine, but that obviously does not let me load the matlab, anaconda, etc. modules.

My first question is therefore: how do I specify multiple commands in PrevCommands so that they all appear in my slurm submission script?
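In the meantime, a workaround I have been considering (but have not tested) is to join all the commands into a single char array, since a single PrevCommands entry is accepted. A rough sketch, assuming the dispatcher writes embedded newlines out verbatim:

% Untested sketch: collapse the individual commands into one char array,
% joined by newline characters, so that a single PrevCommands entry still
% yields one line per command in the generated submission script
% (this assumes the embedded newlines survive into the script).
cmds = {'module purge', ...
        'module load rhel8/default-icl', ...
        'module load anaconda/3.2019-10', ...
        'module load matlab/r2021b', ...
        'source ~/.bashrc'};
PrevCommands = {strjoin(cmds, newline)};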

  2. Our HPC cluster uses Intel MPI instead of OpenMPI. As a result, the command ‘mpirun --report-pid mpirun.pid -np 1 ./mpifile.sh’ in ‘qfile.sh’ is not recognised (the .stderr output file reports “unrecognized argument report-pid”). If I instead run the dispatched job manually on the remote host (the login node of the HPC cluster) by typing ‘mpirun -np 1 ./mpifile.sh’, I get the following error message:
Error in uq_remote_script (line 42)
matOutObj.Y = Y;

Error in run (line 91)
evalin('caller', strcat(script, ';'));

My second question is therefore: is it possible to execute ‘uq_remote_script.m’ on the login node (for debugging), and can I change the following line in ‘qfile.sh’ to be compatible with Intel MPI (for the actual simulations)?

mpirun --report-pid mpirun.pid -np 1 ./mpifile.sh

Best wishes and many thanks in advance,
Nils

Hello again,

I found out that some of our older compute nodes use OpenMPI, so my second question has been resolved.

However, if anyone knows how to pass multiple commands to PrevCommands so that they each appear on their own line in my slurm submission script, please let me know, as this step is currently preventing me from using UQLab on the HPC cluster.

Best wishes and many thanks for your time,
Nils

Dear @nmb29

Can you please provide a self-contained, minimal, reproducible example of this issue?

Best regards
Styfen

Dear @styfen.schaer,

Thank you for your reply. Unfortunately, I am unable to upload files to UQWorld (‘Sorry, new users can not upload attachments.’), so I have sent them to your ETH email address. If we manage to resolve the issue, I will summarise the fix here as a reference for others.

Best wishes,
Nils

I don’t need your actual files, just a script that is as small as possible, with some dummy data/functions, that reproduces the error. You can post your code here.

Dear Styfen,

Please create a dispatcher object and submit a job using the following code:

%% 1 - INITIALIZE UQLAB
clearvars
rng(100,'twister')
uqlab

%% 2 - COMPUTATIONAL MODEL
ModelOpts.mString = 'X.*sin(X)';
ModelOpts.isVectorized = true;
myModel = uq_createModel(ModelOpts);

%% 3 - PROBABILISTIC INPUT MODEL
InputOpts.Marginals.Type = 'Uniform';
InputOpts.Marginals.Parameters = [0 15];
myInput = uq_createInput(InputOpts);

%% 4 - DISPATCHER CONFIGURATION
DispatcherOpts.Profile = 'myHPCProfile.m';
myDispatcher = uq_createDispatcher(DispatcherOpts);
uq_print(myDispatcher)

%% 5 - DISPATCHED MODEL EVALUATION USING A DISPATCHER OBJECT
X = uq_getSample(10);
Ydispatched = uq_evalModel(X,'HPC')

Also, create the HPC profile file (‘myHPCProfile.m’) from the following code:

%% Authentication
Hostname = 'login-p-1.hpc.cam.ac.uk';
Username = 'nmb48';
PrivateKey = '/home/nmb48/Documents/GitHub/PhD_Code/For_HPC/sensitivity/hpc/key_22042024';

%% Remote workspace
RemoteFolder = '/home/nmb48/rds/UQLab_workspace';

%% Remote computing environment
%% MATLAB
MATLABCommand = '/usr/local/Cluster-Apps/matlab/R2021b/bin/matlab';
%% UQLab
RemoteUQLabPath = '/home/nmb48/UQLab_Rel2.0.0/';

%% Remote environment
%EnvSetup = reshape({'. /etc/profile.d/modules.sh', 'module load rhel7/default-ccl', 'module load openmpi/4.1.5/intel/b42idtrx'}, [3,1]); % @Styfen Schaer: these are the commands I would like to pass to my slurm submission script, but I get an error
EnvSetup = {'. /etc/profile.d/modules.sh'};{'module load rhel7/default-ccl'};{'module load openmpi/4.1.5/intel/b42idtrx'}; % @Styfen Schaer: this does not error, but only the first command gets passed
%PrevCommands = reshape({'#SBATCH -p cclake', 'module load rhel7/default-ccl', 'module load anaconda/3.2019-10', 'module load matlab/R2021b', 'source ~/.bashrc'}, [5,1]); % @Styfen Schaer: these are the commands I would like to pass to my slurm submission script, but I get an error
PrevCommands = {'#SBATCH -p cclake'};{'module load rhel7/default-ccl'};{'module load anaconda/3.2019-10'};{'module load matlab/R2021b'};{'source ~/.bashrc'}; % @Styfen Schaer: this does not error, but only the first command gets passed (see the note after this profile)

%% Job scheduler
Scheduler = 'slurm';
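
As an aside, I suspect the non-erroring form above only passes the first command because, in MATLAB, the semicolon-separated cell literals are parsed as independent statements rather than as one assignment; roughly:

% The non-erroring form is equivalent to the following separate statements,
% so only the first cell array is actually assigned to PrevCommands; the
% remaining cell literals are evaluated and their results discarded.
PrevCommands = {'#SBATCH -p cclake'};   % assignment
{'module load rhel7/default-ccl'};      % standalone expression, result discarded
{'module load anaconda/3.2019-10'};     % standalone expression, result discarded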

The slurm submission script I get from running the above code reads as follows:

#!/bin/bash
#SBATCH --job-name=02May2024_at_15583588
#SBATCH --output=02May2024_at_15583588.stdout
#SBATCH --error=02May2024_at_15583588.stderr
#SBATCH --time=60
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1

echo Running on host `hostname`
echo Time is `date`
echo Directory is `pwd`

#SBATCH -p cclake

cd /home/nmb48/rds/UQLab_workspace/02May2024_at_15583588
mkdir logs

touch .uq_job_started
mpirun --report-pid mpirun.pid -np 1  ./mpifile.sh

The .stderr output file reads as follows:

[mpiexec@cpu-p-65] match_arg (…/…/…/…/…/src/pm/i_hydra/libhydra/arg/hydra_arg.c:91): unrecognized argument report-pid
[mpiexec@cpu-p-65] HYD_arg_parse_array (…/…/…/…/…/src/pm/i_hydra/libhydra/arg/hydra_arg.c:128): argument matching returned error
[mpiexec@cpu-p-65] mpiexec_get_parameters (…/…/…/…/…/src/pm/i_hydra/mpiexec/mpiexec_params.c:1313): error parsing input array
[mpiexec@cpu-p-65] main (…/…/…/…/…/src/pm/i_hydra/mpiexec/mpiexec.c:1738): error parsing parameters

I hope this helps.

Best wishes,
Nils