libEnsemble on Aurora
libEnsemble is a Python toolkit for running dynamic ensembles of calculations.
Users provide generator and simulator functions to express their ensembles, where the generator can steer the ensemble based on previous results. These functions can portably submit user executables at any scale.
System details are detected, and dynamic resource management is provided. This includes automatically detecting, assigning, and reassigning GPUs for ensemble members.
libEnsemble can be used in a consistent manner on laptops, clusters, and supercomputers with minimal required dependencies.
Configuring Python and Installation
To obtain Python and create a virtual environment:
module load frameworks
python -m venv /path/to-venv --system-site-packages
. /path/to-venv/bin/activate
where /path/to-venv can be anywhere you have write access. For future sessions, just load the frameworks module and run the activate line.
To obtain libEnsemble:
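For example, with the virtual environment activated, a standard pip install works (see the libEnsemble documentation for other installation options):
pip install libensemble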
See the ALCF docs for more details on using Python on Aurora.
Example
This example shows how to run the forces_gpu tutorial on Aurora.
To obtain the example, you can clone the libEnsemble repository, although only the forces sub-directory is needed:
git clone https://github.com/Libensemble/libensemble
cd libensemble/libensemble/tests/scaling_tests/forces/forces_app
To compile forces (a C application with OpenMP target):
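A typical build uses the MPI compiler wrapper with OpenMP offload flags; the exact flags below are an assumption and may differ from those in the forces build scripts:
mpicc -DGPU -O3 -fiopenmp -fopenmp-targets=spir64 -o forces.x forces.c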
Now go to the forces_gpu directory:
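Assuming you are still in the forces_app directory from the previous step:
cd ../forces_gpu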
To use all available GPUs, open run_libe_forces.py and adjust the exit criteria to perform more simulations. The following will run two simulations per worker:
# Instruct libEnsemble to exit after this many simulations
ensemble.exit_criteria = ExitCriteria(sim_max=nsim_workers*2)
Now grab an interactive session on two nodes (or use the batch script at ../submission_scripts/submit_pbs_aurora.sh):
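A minimal interactive request might look like the following; the project name, queue, walltime, and filesystems values are placeholders to adjust for your allocation:
qsub -A <myproject> -q debug -l select=2 -l walltime=15:00 -l filesystems=home:flare -I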
Once in the interactive session, you may need to reactivate your virtual environment:
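For example, using the same paths as the setup above:
module load frameworks
. /path/to-venv/bin/activate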
Then, run:
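An invocation consistent with the description that follows (local comms, 13 workers) would be:
python run_libe_forces.py --comms local --nworkers 13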
This provides twelve workers for running simulations (one for each GPU across two nodes). An extra worker runs the persistent generator. GPU settings for each worker simulation are printed.
Looking at libE_stats.txt will provide a summary of the runs.
Try running with fewer workers, and you will see that two cores (MPI ranks) and two GPUs are used per worker simulation.
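For example, with roughly half the simulation workers (the count below is an assumption based on two nodes with six GPUs each, plus one worker for the generator):
python run_libe_forces.py --comms local --nworkers 7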
Live viewing GPU usage
To see GPU usage, SSH into one of your compute nodes from another window and run:
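One option is Intel's xpu-smi tool; the device and metric flags below are illustrative and may need adjusting:
xpu-smi dump -d 0,1,2,3,4,5 -m 0    # continuously dump GPU utilization for the six devices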
Using Tiles as GPUs
To treat each tile as its own GPU, add the use_tiles_as_gpus=True option to the libE_specs block in run_libe_forces.py:
ensemble.libE_specs = LibeSpecs(
    num_resource_sets=nsim_workers,
    sim_dirs_make=True,
    use_tiles_as_gpus=True,
)
Now, rerun with twice the workers:
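With two tiles per GPU, that is 24 simulation workers across the two nodes plus one for the generator (an assumed count):
python run_libe_forces.py --comms local --nworkers 25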
The forces example will automatically use the GPUs available to each worker (one MPI rank per GPU). If fewer workers are provided, multiple GPUs will be used per simulation.
Also, see the forces_gpu_var_resources and forces_multi_app examples for cases that use varying processor/GPU counts per simulation.
Running generator on the manager
An alternative is to run the generator on a thread on the manager. The number of workers can then be set to the number of simulation workers.
Change the libE_specs in run_libe_forces.py as follows:
nsim_workers = ensemble.nworkers
# Persistent gen does not need resources
ensemble.libE_specs = LibeSpecs(
    gen_on_manager=True,
)
The first run we did will then use 12 workers instead of 13:
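For example (the same run as before, with the generator now on the manager):
python run_libe_forces.py --comms local --nworkers 12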
Dynamic resource assignment
In the forces directory you will also find:
- forces_gpu_var_resources uses varying processor/GPU counts per simulation.
- forces_multi_app uses varying processor/GPU counts per simulation and also uses two different user executables, one CPU-only and one that uses GPUs. This allows highly efficient use of nodes for multi-application ensembles.
Demonstration
A video demonstration of the forces_gpu example on Frontier is available. The workflow is identical when running on Aurora, except for different compiler options and numbers of workers due to the differing GPU counts per node.
More details:
- libEnsemble Documentation
- libEnsemble GitHub page
- libEnsemble Documentation Aurora page