# Dragon
DragonHPC is a composable distributed runtime for managing processes, memory, and data at scale through high-performance communication. Dragon is an open source project developed by HPE.
Dragon has a Python API and a C/C++ API. The Python API is an extension of Python's multiprocessing API, so DragonHPC can be used to scale Python multiprocessing code across multi-node jobs. It can be installed with pip in conda or Python virtual environments.
DragonHPC allows parallel process launching, including PMI-enabled processes for MPI applications, with fine-grained control of CPU/GPU affinity. DragonHPC also provides a distributed data layer, the distributed dictionary, that allows in-memory data sharing between processes on different nodes.
Please see DragonHPC's Introduction in their documentation for examples of how to use Dragon.
## Installation
To install in a Python virtual environment on Polaris:
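```bash
# Load the conda module to get a recent Python, then build a virtual
# environment on top of it. Module names may change; check `module avail`.
module use /soft/modulefiles
module load conda

python -m venv _dragon_env --system-site-packages
source _dragon_env/bin/activate

# Install DragonHPC from PyPI
pip install dragonhpc

# Point Dragon at the Cray libfabric library so it can use RDMA over
# Slingshot. The library path and dragon-config syntax below are
# assumptions; confirm them with `dragon-config --help` and the libfabric
# installation available on Polaris.
dragon-config -a "ofi-runtime-lib=/opt/cray/libfabric/1.15.2.0/lib64"
```

The module names, libfabric path, and dragon-config syntax above are a sketch and may differ on current Polaris images; check `dragon-config --help` and the DragonHPC installation documentation.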
The last installation step, which calls dragon-config, is necessary to enable Dragon to use high-speed RDMA transfers across Polaris's Slingshot network. Skipping this step will cause Dragon to fall back to slower TCP transfers for cross-node communication and data transfer.
## Execution of Dragon Driver Scripts
Dragon driver scripts written with the Python API should be executed with the dragon application:
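```bash
# my_dragon_script.py is a placeholder name for your driver script
dragon my_dragon_script.py
```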
Alternatively, a driver script can be launched with the python binary by passing dragon to the -m flag:
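```bash
# same placeholder script name as above
python3 -m dragon my_dragon_script.py
```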
Dragon needs access to the PBS qstat command in order to run (Dragon uses it to determine which nodes have been allocated by PBS, as opposed to Slurm or LSF, and therefore how it will discover nodes and launch the runtime). In some interactive jobs on Polaris, you may need to modify PATH to ensure qstat is visible to Dragon:
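```bash
# /opt/pbs/bin is the usual PBS location; adjust if qstat lives elsewhere
export PATH=/opt/pbs/bin:$PATH
which qstat
```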
Currently, we also recommend unloading the xalt module on Polaris when running Dragon:
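```bash
module unload xalt
```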
## Policies for Polaris Nodes
Dragon's Policy object is used to set CPU, GPU, and node affinities for processes.
A common Policy setting on Polaris is to run one process per GPU (four per node).
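For example, a single Policy that pins work to one GPU of one node might look like the following sketch (the hostname is hypothetical, and the Policy fields shown should be checked against your installed Dragon version):

```python
from dragon.infrastructure.policy import Policy

# Pin a process to GPU 0 of a specific node; the hostname is a placeholder.
gpu0_policy = Policy(
    placement=Policy.Placement.HOST_NAME,
    host_name="x3006c0s13b0n0",
    gpu_affinity=[0],
)
```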
Dragon's native Pool can use Policies to run a pool of processes in which each process is bound to specific GPUs and CPUs on specific nodes. (Note that the multiprocessing Pool with dragon selected as the start method can only distribute processes across multiple nodes; it cannot set explicit GPU and CPU binding affinities.) Here is an example of how to run a pool across nodes on Polaris with the native Dragon Pool, binding one pool process per GPU.
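The sketch below assumes the native Pool accepts policy and processes_per_policy keyword arguments and that System and Node expose nodes, hostname, and num_gpus; these names may vary between Dragon releases, so check the DragonHPC documentation for the version you have installed.

```python
import socket

import dragon
import multiprocessing as mp
from dragon.native.pool import Pool
from dragon.native.machine import System, Node
from dragon.infrastructure.policy import Policy


def report_placement(task_id):
    # Runs inside a pool worker; reports which node handled the task.
    return f"task {task_id} ran on {socket.gethostname()}"


if __name__ == "__main__":
    mp.set_start_method("dragon")

    # Build one Policy per GPU on every node in the allocation.
    my_system = System()
    policies = []
    for huid in my_system.nodes:
        node = Node(huid)
        for gpu_id in range(node.num_gpus):
            policies.append(
                Policy(
                    placement=Policy.Placement.HOST_NAME,
                    host_name=node.hostname,
                    gpu_affinity=[gpu_id],
                )
            )

    # One pool worker per Policy, i.e. one worker per GPU across all nodes.
    pool = Pool(policy=policies, processes_per_policy=1)
    results = pool.map_async(report_placement, range(len(policies))).get()
    for line in results:
        print(line)

    pool.close()
    pool.join()
```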
Here is an example of how to do something similar with a ProcessGroup.
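As above, this is a sketch; the ProcessGroup and ProcessTemplate method names and keywords (add_process, init, start, join, close) should be verified against the installed Dragon version.

```python
import socket

import dragon
import multiprocessing as mp
from dragon.native.process_group import ProcessGroup
from dragon.native.process import ProcessTemplate
from dragon.native.machine import System, Node
from dragon.infrastructure.policy import Policy


def hello_affinity():
    # Runs as a group member; prints the node it landed on.
    print(f"hello from {socket.gethostname()}", flush=True)


if __name__ == "__main__":
    mp.set_start_method("dragon")

    my_system = System()
    pg = ProcessGroup()

    # Add one process per GPU on every node, each pinned by its own Policy.
    for huid in my_system.nodes:
        node = Node(huid)
        for gpu_id in range(node.num_gpus):
            policy = Policy(
                placement=Policy.Placement.HOST_NAME,
                host_name=node.hostname,
                gpu_affinity=[gpu_id],
            )
            pg.add_process(
                nproc=1,
                template=ProcessTemplate(target=hello_affinity, policy=policy),
            )

    pg.init()
    pg.start()
    pg.join()
    pg.close()
```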
## Distributed Dictionaries
Dragon offers a distributed data layer called the Dragon Dictionary (DDict). A DDict can span all nodes or a subset of nodes in your runtime and can also use Policies for optimal node placement. Regardless of whether a DDict spans all nodes or a subset, any process in the runtime on any node can access the dictionary.
To create a DDict that spans all nodes in the runtime:
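```python
import dragon
import multiprocessing as mp
from dragon.data.ddict import DDict
from dragon.native.machine import System

if __name__ == "__main__":
    mp.set_start_method("dragon")

    my_system = System()

    # Sketch: the DDict constructor is assumed to take the number of manager
    # processes per node, the number of nodes, and the total memory in bytes.
    # Here: 1 manager per node, spanning every node, with 4 GB in total.
    dd = DDict(1, my_system.nnodes, 4 * 1024**3)

    dd["hello"] = "world"   # any process on any node can now read this key
    assert dd["hello"] == "world"

    dd.destroy()            # free the dictionary's memory when done
```

The constructor arguments and the destroy call above follow common DDict examples but may differ between Dragon releases; consult the DragonHPC DDict documentation for your version.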
## Running MPI applications
MPI applications can be run with the Dragon ProcessGroup. To enable message passing between processes in a Dragon ProcessGroup on Polaris, set the pmi flag when creating the process group, like this:
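```python
import dragon
import multiprocessing as mp
from dragon.native.process_group import ProcessGroup
from dragon.native.process import ProcessTemplate
from dragon.native.machine import System

if __name__ == "__main__":
    mp.set_start_method("dragon")

    my_system = System()
    ranks_per_node = 4

    # The pmi flag tells Dragon to set up PMI so MPI ranks can wire up.
    # Depending on the Dragon release this may instead be a pmi= argument
    # selecting a PMI backend; check the version you have installed.
    pg = ProcessGroup(pmi_enabled=True)

    # ./mpi_hello is a placeholder for an MPI executable built for Polaris.
    pg.add_process(
        nproc=ranks_per_node * my_system.nnodes,
        template=ProcessTemplate(target="./mpi_hello"),
    )

    pg.init()
    pg.start()
    pg.join()
    pg.close()
```

Treat the keyword and method names above as a version-dependent sketch rather than a fixed interface.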
Configure the number of processes and other ProcessGroup settings according to your application's needs. See the DragonHPC documentation on orchestrating MPI applications.