SmartSim and SmartRedis
SmartSim is an open-source tool developed by Hewlett Packard Enterprise (HPE) designed to facilitate the integration of traditional HPC simulation applications with machine learning workflows. There are two core components to SmartSim:
- Infrastructure Library (IL)
  - Provides an API to start, stop, and monitor HPC applications from Python
  - Interfaces with the PBSpro scheduler to launch jobs
  - Deploys a distributed in-memory database called the Orchestrator
- SmartRedis Client Library
  - Provides clients that connect to the Orchestrator from Fortran, C, C++, and Python code
  - The client API library enables data transfer to/from the database and the ability to load and run JIT-traced Python and ML runtimes acting on stored data
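As a rough illustration of how the two components fit together, the sketch below starts an Orchestrator through the Infrastructure Library, sends one tensor through a SmartRedis client, and reads it back. The experiment name, tensor key, port 6780, the `lo` interface, and the `local` launcher default are assumptions chosen for illustration. `InMemoryClient` is a hypothetical stand-in, not part of SmartRedis, that allows a dry run of the data-transfer logic without a database.

```python
def round_trip(client, key, data):
    """Store data under key, then read it back through the same client."""
    client.put_tensor(key, data)
    return client.get_tensor(key)


class InMemoryClient:
    """Hypothetical stand-in for smartredis.Client, for dry runs only."""

    def __init__(self):
        self._store = {}

    def put_tensor(self, name, data):
        self._store[name] = data

    def get_tensor(self, name):
        return self._store[name]


def round_trip_through_orchestrator(launcher="local", port=6780, interface="lo"):
    """Start an Orchestrator, round-trip one tensor, then stop the database.

    Requires smartsim, smartredis, and numpy. The launcher, port, and
    interface values are assumptions; adjust them for the target system.
    """
    import numpy as np
    from smartredis import Client
    from smartsim import Experiment

    exp = Experiment("round_trip_demo", launcher=launcher)
    db = exp.create_database(port=port, interface=interface)
    exp.start(db)
    try:
        # get_address() returns a list of "host:port" strings
        client = Client(address=db.get_address()[0], cluster=False)
        return round_trip(client, "demo_tensor", np.arange(4.0))
    finally:
        # Always shut the database down, even if the transfer fails
        exp.stop(db)
```

Calling `round_trip(InMemoryClient(), "k", [1.0, 2.0])` exercises the transfer logic alone, while `round_trip_through_orchestrator()` needs a working SmartSim installation.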
For more resources on SmartSim, follow the links below:
Installation
Create a Python virtual environment based on the ML frameworks module:
module load frameworks
python -m venv --clear /path/to/_ssim_env --system-site-packages
source /path/to/_ssim_env/bin/activate
It is recommended that the venv is installed in a user's project space on the Flare parallel file system.
Install SmartRedis from source:
Install SmartSim and the CPU backend for RedisAI from source (Intel GPUs are not supported):
git clone https://github.com/CrayLabs/SmartSim.git
cd SmartSim
pip install -e .
# Can disregard package compatibility errors
export TORCH_CMAKE_PATH=$( python -c 'import torch;print(torch.utils.cmake_prefix_path)' )
export TORCH_PATH=$( python -c 'import torch; print(torch.__path__[0])' )
export LD_LIBRARY_PATH=$TORCH_PATH/lib:$LD_LIBRARY_PATH
curl -O https://gist.githubusercontent.com/rickybalin/fcf1d15a26dbbc120f42943041ada827/raw/e22485d53250b8a29ead537533bca7c8f229c362/aurora_config.patch
git apply aurora_config.patch
smart build -v --device cpu --skip-tensorflow --skip-onnx
smart validate
cd ..
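In addition to `smart validate`, a short Python check can confirm that the active interpreter belongs to the virtual environment and that the expected packages are importable. This is only a sketch: the helper names `in_virtualenv` and `missing_packages` are hypothetical, and the package list assumes the PyTorch-only installation described above.

```python
import importlib.util
import sys


def in_virtualenv():
    """Return True when the interpreter is running inside a venv."""
    return sys.prefix != sys.base_prefix


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("inside a venv:", in_virtualenv())
    # Assumed package list for the installation steps above
    missing = missing_packages(["torch", "smartsim", "smartredis"])
    print("missing packages:", missing or "none")
```

An empty list of missing packages only shows that the packages can be located; it does not test the RedisAI backend, which is what `smart validate` is for.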
Running with SmartSim
When running a workload with SmartSim, please include the following in your run or submit scripts:
Known Issues
- Pip installing SmartSim returns some warnings which can be safely ignored.
- The `smart build -v --device cpu` command builds the RedisAI backend for the CPU. This enables ML model inferencing on the CPU with SmartSim and SmartRedis. Due to a limitation with RedisAI, the backend cannot be built for the Intel Max 1550 GPU.
- The instructions focus on PyTorch workloads, thus `--skip-tensorflow --skip-onnx` are used. If you need the TensorFlow backend, please contact us at support@alcf.anl.gov.
- The patch is needed to make sure the RedisAI installation uses the PyTorch installation provided in the frameworks module instead of installing a new one.
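Because only the CPU backend is built, inference requests from SmartRedis would target the CPU device. The sketch below shows one way this might look for a TorchScript model. The keys `model`, `input`, and `output` and the file name are assumptions, and `DryRunClient` is a hypothetical stand-in that records calls so the flow can be followed without a running database.

```python
def infer_on_cpu(client, model_file, data, model_key="model"):
    """Upload a TorchScript model for the CPU, run it on data, return the result."""
    client.set_model_from_file(model_key, model_file, "TORCH", device="CPU")
    client.put_tensor("input", data)
    client.run_model(model_key, inputs=["input"], outputs=["output"])
    return client.get_tensor("output")


class DryRunClient:
    """Hypothetical stand-in for smartredis.Client that records calls."""

    def __init__(self):
        self.calls = []
        self._tensors = {}

    def set_model_from_file(self, name, model_file, backend, device="CPU"):
        self.calls.append(("set_model_from_file", name, model_file, backend, device))

    def put_tensor(self, name, data):
        self._tensors[name] = data

    def run_model(self, name, inputs=None, outputs=None):
        self.calls.append(("run_model", name, tuple(inputs), tuple(outputs)))
        # Echo the input as the output; a real backend would run the model
        self._tensors[outputs[0]] = self._tensors[inputs[0]]

    def get_tensor(self, name):
        return self._tensors[name]
```

With a real `smartredis.Client` connected to the Orchestrator, `data` would be a NumPy array and `model_file` a path to a saved TorchScript model.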