4. Running a Gromacs/LSDMap Workload
This section describes the execution phase. The tool takes two inputs: a resource configuration file and a workload configuration file. Execution starts based on the parameters set in these two files. Section 4.1 covers execution on Stampede and section 4.2 covers execution on Archer.
4.1. Running on Stampede
4.1.1. Running using Example Workload Config and Resource Config
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Stampede.
- The workload configuration file defines the GROMACS/LSDMap workload itself. The configuration file given in this example is meant strictly for the gromacs-lsdmap use case.
Step 1 : Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2 : Download the config files and the input files directly using the following link.
curl -k -O https://raw.githubusercontent.com/radical-cybertools/ExTASY/extasy_0.1/tarballs/grlsd-on-stampede.tar.gz
tar xvfz grlsd-on-stampede.tar.gz
Step 3 : The grlsd-on-stampede folder contains a resource configuration file, stampede.rcfg. Details and required modifications are as follows:
Note
For the purposes of this example, you need to change only:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
REMOTE_HOST = 'xsede.stampede'  # Label/Name of the Remote Machine
UNAME       = 'username'        # Username on the Remote Machine
ALLOCATION  = 'TG-MCB090174'    # Allocation to be charged
WALLTIME    = 60                # Walltime to be requested for the pilot
PILOTSIZE   = 16                # Number of cores to be reserved
WORKDIR     = None              # Working directory on the remote machine
QUEUE       = 'normal'          # Name of the queue in the remote machine
DBURL       = 'mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot'
Step 4 : The grlsd-on-stampede folder contains a workload configuration file, gromacslsdmap.wcfg. Details and required modifications are as follows:
##-------------------------Applications----------------------
simulator = 'Gromacs' # Simulator to be loaded
analyzer = 'LSDMap' # Analyzer to be loaded
#--------------------------General--------------------------------
num_CUs = 16 # Number of tasks or Compute Units
num_iterations = 3 # Number of iterations of Simulation-Analysis
start_iter = 0 # Iteration number with which to start
nsave = 2 # # Iterations after which output is transfered to local machine
checkfiles = 4 # Iterations after which to test if the expected files are present on remote/ does not download to local
#--------------------------Simulation--------------------------------
num_cores_per_sim_cu = 1 # Number of cores per Simulation Compute Units
md_input_file = './input.gro' # Entire path to the MD Input file - Do not use $HOME or the likes
mdp_file = './grompp.mdp' # Entire path to the MD Parameters file - Do not use $HOME or the likes
top_file = './topol.top' # Entire path to the Topology file - Do not use $HOME or the likes
ndx_file = None # Entire path to the Index file - Do not use $HOME or the likes
grompp_options = None # Command line options for when grompp is used
mdrun_options = None # Command line options for when mdrun is used
itp_file_loc = None # Entire path to the location of .itp files - Do not use $HOME or the likes
md_output_file = 'tmp.gro' # Filename to be used for the simulation output
#--------------------------Analysis----------------------------------
lsdm_config_file = './config.ini' # Entire path to the LSDMap configuration file - Do not use $HOME or the likes
num_runs = 1000 # Number of runs to be performed in the Selection step in Analysis
w_file = 'weight.w' # Filename to be used for the weight file
max_alive_neighbors = '10' # Maximum alive neighbors to be considered while reweighting
max_dead_neighbors = '1' # Maximum dead neighbors to be considered while reweighting
Note
All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndx_file, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None, but they still have to be mentioned in the configuration file. No other parameters are currently supported.
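Because every parameter is mandatory, it can be useful to check a workload configuration before submitting it. The following is a minimal sketch and not part of ExTASY: it assumes the file consists of plain Python-style `name = literal` assignments, as in the example above, and the helper names `read_assignments` and `missing_parameters` are hypothetical.

```python
import ast

# Parameter names copied from the example gromacslsdmap.wcfg above.
MANDATORY = [
    "simulator", "analyzer",
    "num_CUs", "num_iterations", "start_iter", "nsave", "checkfiles",
    "num_cores_per_sim_cu", "md_input_file", "mdp_file", "top_file",
    "ndx_file", "grompp_options", "mdrun_options", "itp_file_loc",
    "md_output_file",
    "lsdm_config_file", "num_runs", "w_file",
    "max_alive_neighbors", "max_dead_neighbors",
]


def read_assignments(text):
    """Collect top-level `name = literal` assignments without executing the text."""
    values = {}
    for node in ast.parse(text).body:
        if (isinstance(node, ast.Assign) and len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)):
            values[node.targets[0].id] = ast.literal_eval(node.value)
    return values


def missing_parameters(text):
    """Return the mandatory parameter names that the config text does not define."""
    defined = read_assignments(text)
    return [name for name in MANDATORY if name not in defined]


# Small illustrative config: a parameter set to None still counts as mentioned.
sample = """
simulator = 'Gromacs'
analyzer = 'LSDMap'
ndx_file = None   # not needed, but still listed
"""
print(missing_parameters(sample)[:3])  # ['num_CUs', 'num_iterations', 'start_iter']
```

To check a real file, pass its contents to `missing_parameters`, for example the text read from gromacslsdmap.wcfg. An empty list means every parameter from the example is present.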
Now you can run the workload using:
If your shell is BASH,
EXTASY_DEBUG=True RADICAL_PILOT_VERBOSE='debug' SAGA_VERBOSE='debug' extasy --RPconfig stampede.rcfg --Kconfig gromacslsdmap.wcfg 2> extasy.log
If your shell is CSH,
setenv EXTASY_DEBUG True
setenv RADICAL_PILOT_VERBOSE 'debug'
setenv SAGA_VERBOSE 'debug'
extasy --RPconfig stampede.rcfg --Kconfig gromacslsdmap.wcfg |& tee extasy.log
A sample output with expected callbacks and simulation/analysis can be found here.
Stage | Simulation | Analysis |
---|---|---|
Expected TTC/iteration | 50-100 s | ~30 s |
There are two stages in the execution phase: Simulation and Analysis. Execution starts with any preprocessing that might be required on the input data and then moves to the Simulation stage. In the Simulation stage, num_CUs tasks are launched to execute on the target machine. How many of them run at once depends on PILOTSIZE and num_cores_per_sim_cu: the number of tasks in the execution state simultaneously is PILOTSIZE/num_cores_per_sim_cu. As each task reaches the ‘Done’ (completed) state, the remaining tasks are scheduled until all num_CUs tasks are completed.
This is followed by the Analysis stage, in which one task is scheduled on the target machine. This task uses all PILOTSIZE cores to perform the analysis and returns the data required for the next iteration of the Simulation stage. Each iteration therefore executes (num_CUs+1) tasks.
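The arithmetic in this description can be written out as a short sketch. The function name `simulation_schedule` is hypothetical. The "batches" figure is a simplification that assumes all simulation tasks take roughly the same time, whereas in practice tasks are scheduled individually as cores become free.

```python
import math


def simulation_schedule(pilotsize, num_cus, num_cores_per_sim_cu):
    """Return (concurrent tasks, minimum batches, tasks per iteration)."""
    # Number of simulation tasks that fit on the pilot at the same time.
    concurrent = pilotsize // num_cores_per_sim_cu
    if concurrent < 1:
        raise ValueError("PILOTSIZE is smaller than num_cores_per_sim_cu")
    # Minimum number of back-to-back batches needed to run all simulation tasks.
    batches = math.ceil(num_cus / concurrent)
    # num_CUs simulation tasks plus one analysis task.
    tasks_per_iteration = num_cus + 1
    return concurrent, batches, tasks_per_iteration


# Values from the Stampede example: PILOTSIZE = 16, num_CUs = 16, 1 core per task.
print(simulation_schedule(16, 16, 1))  # (16, 1, 17)
```

With the example settings all 16 simulation tasks run at once, so each iteration is one batch of simulations followed by a single analysis task.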
4.2. Running on Archer
4.2.1. Running using Example Workload Config and Resource Config
This section is to be done entirely on your laptop. The ExTASY tool expects two input files:
- The resource configuration file sets the parameters of the HPC resource we want to run the workload on, in this case Archer.
- The workload configuration file defines the GROMACS/LSDMap workload itself. The configuration file given in this example is meant strictly for the gromacs-lsdmap use case.
Step 1 : Create a new directory for the example,
mkdir $HOME/extasy-tutorial/
cd $HOME/extasy-tutorial/
Step 2 : Download the config files and the input files directly using the following link.
curl -k -O https://raw.githubusercontent.com/radical-cybertools/ExTASY/extasy_0.1/tarballs/grlsd-on-archer.tar.gz
tar xvfz grlsd-on-archer.tar.gz
Step 3 : The grlsd-on-archer folder contains a resource configuration file, archer.rcfg. Details and required modifications are as follows:
Note
For the purposes of this example, you need to change only:
- UNAME
- ALLOCATION
The other parameters in the resource configuration are already set up to successfully execute the workload in this example.
REMOTE_HOST = 'epsrc.archer'  # Label/Name of the Remote Machine
UNAME       = 'username'      # Username on the Remote Machine
ALLOCATION  = 'e290'          # Allocation to be charged
WALLTIME    = 60              # Walltime to be requested for the pilot
PILOTSIZE   = 24              # Number of cores to be reserved
WORKDIR     = None            # Working directory on the remote machine
QUEUE       = 'standard'      # Name of the queue in the remote machine
DBURL       = 'mongodb://extasy:extasyproject@extasy-db.epcc.ed.ac.uk/radicalpilot'
Step 4 : The grlsd-on-archer folder contains a workload configuration file, gromacslsdmap.wcfg. Details and required modifications are as follows:
#-------------------------Applications----------------------
simulator = 'Gromacs' # Simulator to be loaded
analyzer = 'LSDMap' # Analyzer to be loaded
#--------------------------General--------------------------------
num_CUs = 24 # Number of tasks or Compute Units
num_iterations = 2 # Number of iterations of Simulation-Analysis
start_iter = 0 # Iteration number with which to start
nsave = 1 # # Iterations after which output is transfered to local machine
checkfiles = 4 # Iterations after which to test if the expected files are present on remote/ does not download to local
#--------------------------Simulation--------------------------------
num_cores_per_sim_cu = 1 # Number of cores per Simulation Compute Units
md_input_file = './input.gro' # Entire path to the MD Input file - Do not use $HOME or the likes
mdp_file = './grompp.mdp' # Entire path to the MD Parameters file - Do not use $HOME or the likes
top_file = './topol.top' # Entire path to the Topology file - Do not use $HOME or the likes
ndx_file = None # Entire path to the Index file - Do not use $HOME or the likes
grompp_options = None # Command line options for when grompp is used
mdrun_options = None # Command line options for when mdrun is used
itp_file_loc = None # Entire path to the location of .itp files - Do not use $HOME or the likes
md_output_file = 'tmp.gro' # Filename to be used for the simulation output
#--------------------------Analysis----------------------------------
lsdm_config_file = './config.ini' # Entire path to the LSDMap configuration file - Do not use $HOME or the likes
num_runs = 100 # Number of runs to be performed in the Selection step in Analysis
w_file = 'weight.w' # Filename to be used for the weight file
max_alive_neighbors = '10' # Maximum alive neighbors to be considered while reweighting
max_dead_neighbors = '1' # Maximum dead neighbors to be considered while reweighting
Note
All the parameters in the above example file are mandatory for gromacs-lsdmap. If ndx_file, grompp_options, mdrun_options and itp_file_loc are not required, they should be set to None, but they still have to be mentioned in the configuration file. No other parameters are currently supported.
Now you can run the workload using:
If your shell is BASH,
EXTASY_DEBUG=True RADICAL_PILOT_VERBOSE='debug' SAGA_VERBOSE='debug' extasy --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg 2> extasy.log
If your shell is CSH,
setenv EXTASY_DEBUG True
setenv RADICAL_PILOT_VERBOSE 'debug'
setenv SAGA_VERBOSE 'debug'
extasy --RPconfig archer.rcfg --Kconfig gromacslsdmap.wcfg |& tee extasy.log
A sample output with expected callbacks and simulation/analysis can be found here.
Stage | Simulation | Analysis |
---|---|---|
Expected TTC/iteration | 200-350 s | ~30 s |
There are two stages in the execution phase: Simulation and Analysis. Execution starts with any preprocessing that might be required on the input data and then moves to the Simulation stage. In the Simulation stage, num_CUs tasks are launched to execute on the target machine. How many of them run at once depends on PILOTSIZE and num_cores_per_sim_cu: the number of tasks in the execution state simultaneously is PILOTSIZE/num_cores_per_sim_cu. As each task reaches the ‘Done’ (completed) state, the remaining tasks are scheduled until all num_CUs tasks are completed.
This is followed by the Analysis stage, in which one task is scheduled on the target machine. This task uses all PILOTSIZE cores to perform the analysis and returns the data required for the next iteration of the Simulation stage. Each iteration therefore executes (num_CUs+1) tasks.
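As a rough sanity check, the expected times in the table above can be combined with num_iterations to estimate how long the Archer example runs. This is only a sketch: the function name is hypothetical, queue waiting time and pilot start-up are not included, and reading WALLTIME = 60 as minutes is an assumption, since the unit is not stated here.

```python
def estimated_runtime_seconds(num_iterations, sim_ttc, analysis_ttc):
    """Sum the per-iteration Simulation and Analysis times over all iterations."""
    return num_iterations * (sim_ttc + analysis_ttc)


# Archer example: num_iterations = 2, Simulation 200-350 s, Analysis ~30 s.
low = estimated_runtime_seconds(2, 200, 30)
high = estimated_runtime_seconds(2, 350, 30)
print(low, high)  # 460 760

# ASSUMPTION: WALLTIME = 60 is interpreted as minutes; the unit is not stated.
walltime_seconds = 60 * 60
print(high < walltime_seconds)  # True
```

Under these assumptions the example should finish well within the requested walltime. A larger num_iterations or num_CUs would call for repeating the estimate.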
4.3. Understanding the Output
On the local machine, a “backup” folder is created, and at the end of every checkpoint interval (=nsave) an “iter*” folder is created inside it, containing the files necessary to start the next iteration.
For example, in the case of gromacs-lsdmap on Stampede, for 4 iterations with nsave=2:
grlsd-on-stampede$ ls
backup/ config.ini gromacslsdmap.wcfg grompp.mdp input.gro stampede.rcfg topol.top
grlsd-on-stampede/backup$ ls
iter1/ iter3/
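The folder names in this listing follow a simple pattern, sketched below. The function name is hypothetical, and the numbering rule is an inference from the example rather than a documented guarantee: iterations are counted from 0 and a folder is written after every nsave-th iteration.

```python
def expected_backup_folders(num_iterations, nsave):
    """Names of the backup folders for a run that starts at iteration 0.

    ASSUMPTION: iterations are numbered from 0 and a folder is written after
    every nsave-th iteration, as the iter1/ and iter3/ listing suggests.
    """
    return [f"iter{i}" for i in range(num_iterations) if (i + 1) % nsave == 0]


# 4 iterations with nsave = 2, as in the listing above.
print(expected_backup_folders(4, 2))  # ['iter1', 'iter3']
```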
The “iter*” folder will not contain any of the initial files such as the topology file, minimization file, etc., since they already exist on the local machine. In gromacs-lsdmap, the “iter*” folder contains the coordinate file and weight file required in the next iteration. It also contains a log file for the LSDMap stage of the current iteration.
grlsd-on-stampede/backup/iter1$ ls
2_input.gro lsdmap.log weight.w
On the remote machine, inside the pilot-* folder you can find a folder called “staging_area”. This location is used to exchange/link/move intermediate data. The shared data is kept in “staging_area/” and the iteration specific inputs/outputs can be found in their specific folders (=”staging_area/iter*”).
$ cd staging_area/
$ ls
config.ini gro.py input.gro iter1/ iter3/ post_analyze.py reweighting.py run.py spliter.py
grompp.mdp gro.pyc iter0/ iter2/ lsdm.py pre_analyze.py run_analyzer.sh select.py topol.top
4.4. Gromacs/LSDMap Restart Mechanism
If the above examples were successful, you can go ahead and try the restart mechanism. The restart mechanism is designed to resume the experiment from one of the checkpoints made in a previous experiment.
Therefore, for a valid restart, data from a previous experiment must exist in the backup/ folder on the local machine. A restart can only be done from a checkpoint (defined by nsave in the kernel config file) made in the previous experiment.
Example,
Experiment 1 : num_iterations = 4, start_iter = 0, nsave = 2
Backups created : iter1/ (after 2 iterations) , iter3/ (after 4 iterations)
Experiment 2 (restart) : num_iterations = 2, start_iter = 4 (=start from 5th iter), nsave = 2
Note : start_iter should match one of the previous checkpoints and start_iter should be a multiple of nsave.
If, in the first experiment, you ran 4 iterations with nsave set to 2, backups are created after the 2nd and 4th iterations. Once this is successful, the second experiment can resume from either of these checkpoints. In the above example, the experiment resumes from the checkpoint made after the 4th iteration.
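The two restart conditions can be checked before launching a restart. The sketch below is not part of ExTASY and `check_restart` is a hypothetical helper. It assumes that the checkpoint for start_iter = N is stored in backup/iter(N-1)/, which matches the example where start_iter = 4 resumes from iter3/.

```python
import os
import tempfile


def check_restart(start_iter, nsave, backup_dir="backup"):
    """Return a list of problems with a restart request (empty if none found)."""
    problems = []
    if start_iter <= 0 or start_iter % nsave != 0:
        problems.append(
            f"start_iter={start_iter} is not a positive multiple of nsave={nsave}")
    # ASSUMPTION: the checkpoint for start_iter = N is stored in iter(N-1)/,
    # matching the example where start_iter = 4 resumes from iter3/.
    checkpoint = os.path.join(backup_dir, f"iter{start_iter - 1}")
    if not os.path.isdir(checkpoint):
        problems.append(f"checkpoint folder {checkpoint} not found")
    return problems


# Demonstration in a temporary directory that mimics Experiment 1's backups.
with tempfile.TemporaryDirectory() as tmp:
    backup = os.path.join(tmp, "backup")
    for name in ("iter1", "iter3"):
        os.makedirs(os.path.join(backup, name))
    print(check_restart(4, 2, backup))       # []
    print(len(check_restart(3, 2, backup)))  # 2
```

In the demonstration, start_iter = 4 passes both checks, while start_iter = 3 fails both: it is not a multiple of nsave and there is no iter2/ folder.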