Dask is a Python library for parallel and distributed computing. A Dask cluster is composed of one scheduler that coordinates the work of many workers, which can have access to CPU or GPU resources.
RAPIDS is a suite of software libraries by NVIDIA for "building end-to-end data science and analytics pipelines on GPUs". For example, RAPIDS' cuDF, CuPy, and cuML libraries implement common pandas, NumPy, and scikit-learn APIs, respectively, allowing these workloads to run at scale on a GPU cluster using Dask.
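As a small illustration of that drop-in API (assuming a GPU node and a RAPIDS environment such as the one installed below), a cuDF DataFrame can be used much like a pandas one:

```python
# requires a GPU and a RAPIDS environment like the one created below
import cudf

df = cudf.DataFrame({"a": [1, 2, 3], "b": [0.1, 0.2, 0.3]})
print(df.describe())  # same call as on a pandas DataFrame, executed on the GPU
```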
Here we show how to install RAPIDS and Dask in a conda environment on Sophia and how to start a cluster with GPU workers.
Start an interactive session: follow the instructions here to start an interactive job on Sophia. In the example command below, we request 2 GPUs:
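The command below is illustrative only; the queue, filesystems, walltime, and project name are placeholders, and the exact options for Sophia are given in the ALCF job-scheduling documentation:

```bash
# illustrative interactive job requesting 2 GPUs; adjust queue, filesystems,
# walltime, and project for your allocation
qsub -I -l select=1:ngpus=2 -l walltime=1:00:00 -l filesystems=home:eagle -q by-gpu -A <your_project>
```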
Follow the installation instructions on the RAPIDS docs, select the appropriate CUDA version (you can find it in the output of `nvidia-smi`), and copy the installation command, which should be similar to the one below (replace `/path/to/env/rapids-25.06_sophia` with your preferred path and name for the environment):
```bash
conda create -y -p /path/to/env/rapids-25.06_sophia \
    -c rapidsai -c conda-forge -c nvidia \
    rapids=25.06 python=3.11 'cuda-version>=12.0,<=12.8'

# activate the environment
conda activate /path/to/env/rapids-25.06_sophia
```
Optional: Install JupyterLab and create an ipykernel for the environment.
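A minimal sketch of this optional step is below; the kernel name is just an example and can be anything you like:

```bash
# install JupyterLab and ipykernel into the activated RAPIDS environment
conda install -y -c conda-forge jupyterlab ipykernel

# register the environment as a Jupyter kernel (name is an example)
python -m ipykernel install --user --name rapids-25.06_sophia --display-name "rapids-25.06_sophia"
```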
The following script, `dask_start.sh`, starts the Dask scheduler on the current node and then uses `mpiexec` to launch one `dask cuda worker` per rank, pinning each worker to a different GPU through `CUDA_VISIBLE_DEVICES`:

```bash
#!/bin/bash
# $1 : number of ranks per node (default: 8)

TMP_EXE=dask_start_worker.sh

# generate the per-rank worker launch script
cat > ${TMP_EXE} << EOF
#!/bin/bash
CUDA_VISIBLE_DEVICES=\${OMPI_COMM_WORLD_LOCAL_RANK} dask cuda worker \
    --device-memory-limit 40GB \
    --scheduler-file ~/scheduler.json \
    --protocol tcp \
    > /tmp/dask_worker_\${OMPI_COMM_WORLD_RANK}_\${OMPI_COMM_WORLD_LOCAL_RANK}_\${HOSTNAME}.log 2>&1
EOF
chmod 755 ${TMP_EXE}

# start the scheduler
rm -f ~/scheduler.json
echo "starting the scheduler"
nohup dask scheduler --scheduler-file ~/scheduler.json > /tmp/dask_scheduler.log 2>&1 &
sleep 10

# start the workers, one per GPU on each node
NUM_NODES=$(cat $PBS_NODEFILE | wc -l)
NRANKS_PER_NODE=${1:-8}
echo "starting" ${NRANKS_PER_NODE} "workers per node on" ${NUM_NODES} "nodes"
mpiexec -np $((NRANKS_PER_NODE * NUM_NODES)) ./${TMP_EXE}
rm ./${TMP_EXE}
```
Copy the script to Sophia and make it executable: `chmod a+x ./dask_start.sh`
On a compute node, load the required modules and activate the Dask conda environment created in Install RAPIDS and Dask above, then run the script to start the cluster.
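A sketch of these steps is below; a `conda` module is assumed to be available (as on other ALCF systems), and the exact module names on Sophia may differ:

```bash
# assumes a conda module is available; adjust to the modules your site provides
module load conda
conda activate /path/to/env/rapids-25.06_sophia

# start the scheduler and 8 workers per node (one per GPU on a Sophia DGX A100 node)
./dask_start.sh 8
```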
The following Python script, `dask_example.py`, shows how to connect to a running Dask cluster, print the GPU UUID of each worker, and shut down the cluster:
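A minimal sketch of such a script is given below; it assumes the `~/scheduler.json` file written by `dask_start.sh` and uses `pynvml` (shipped with the RAPIDS environment) to query the GPU assigned to each worker. The exact script in the original documentation may differ in detail.

```python
# dask_example.py -- minimal sketch, assuming ~/scheduler.json from dask_start.sh
import os

from dask.distributed import Client


def get_gpu_uuid():
    """Return the UUID of the GPU assigned to this worker."""
    import os
    import pynvml

    # each worker was started with CUDA_VISIBLE_DEVICES set to a single device,
    # so the first entry is the GPU this worker is pinned to (NVML enumerates
    # all GPUs on the node, hence the explicit index)
    dev = int(os.environ.get("CUDA_VISIBLE_DEVICES", "0").split(",")[0])
    pynvml.nvmlInit()
    uuid = pynvml.nvmlDeviceGetUUID(pynvml.nvmlDeviceGetHandleByIndex(dev))
    pynvml.nvmlShutdown()
    return uuid


if __name__ == "__main__":
    # connect to the running cluster via the scheduler file
    client = Client(scheduler_file=os.path.expanduser("~/scheduler.json"))
    print(client)

    # run get_gpu_uuid on every worker and print the result per worker address
    for worker, uuid in client.run(get_gpu_uuid).items():
        print(worker, uuid)

    # shut down the workers and the scheduler
    client.shutdown()
```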