Argonne Leadership Computing Facility

Miscellaneous

Porting applications to the CS-2

Cerebras’s mature Python support is built around Cerebras Estimator, which inherits from TensorFlow Estimator.
A Keras model can be converted to a TF Estimator, and hence to a Cerebras Estimator. See https://www.tensorflow.org/tutorials/estimator/keras_model_to_estimator for more information on converting Keras models.

Cerebras has recently introduced PyTorch support, which is documented in the DEVELOP WITH PYTORCH section of the Cerebras Software Documentation.

Cerebras has guides for porting TensorFlow and PyTorch models:
Port TensorFlow to Cerebras
Porting PyTorch Model to CS
Cerebras's list of supported TensorFlow layers (for the current version): Supported TensorFlow Layers
Cerebras's list of supported PyTorch operations (for the current version): Supported PyTorch Ops

When porting, it is often helpful to study a related example in the Cerebras modelzoo.
A copy of the modelzoo for the installed release is at /software/cerebras/model_zoo/modelzoo/
Both the README.md files and source code in the modelzoo can be quite helpful.
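
As a starting point for browsing the modelzoo, the sketch below lists its README.md files and finds Python sources that mention a given term, such as a layer name from the model being ported. The function names and the MODELZOO variable are hypothetical and not part of any Cerebras tooling; only the default path comes from this page.

```shell
# Hypothetical helpers for exploring a modelzoo checkout; the names are
# illustrative and not part of any Cerebras tooling.
MODELZOO="${MODELZOO:-/software/cerebras/model_zoo/modelzoo/}"

# List every README.md under the modelzoo (or under a directory given as $1).
modelzoo_readmes() {
    find "${1:-$MODELZOO}" -type f -name README.md | sort
}

# List Python sources that mention a term, e.g. a layer name you need to port.
# Usage: modelzoo_grep TERM [DIR]
modelzoo_grep() {
    find "${2:-$MODELZOO}" -type f -name '*.py' -exec grep -l -- "$1" {} + | sort
}
```

For example, modelzoo_grep Conv2D would print the Python files under the default path that contain the string Conv2D.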

Determining the CS-2 version

These queries work only when run from cs2-01, due to networking constraints:

...$ # Query the firmware level for cs2-01
...$ curl -k -X GET 'https://192.168.120.30/redfish/v1/Managers/manager' --header 'Authorization: Basic YWRtaW46YWRtaW4=' 2> /dev/null  | python -m json.tool | grep FirmwareVersion
"FirmwareVersion": "1.1.1-202203171919-5-879ff4ef",
...$

...$ # Query the firmware level for cs2-02 (from cs2-01)
...$ curl -k -X GET 'https://192.168.120.50/redfish/v1/Managers/manager' --header 'Authorization: Basic YWRtaW46YWRtaW4=' 2> /dev/null  | python -m json.tool | grep FirmwareVersion
"FirmwareVersion": "1.1.1-202203171919-5-879ff4ef",
...$
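
The grep in the queries above depends on how json.tool happens to format its output. A minimal sketch of an alternative that reads the field directly from the JSON is shown here; the function name firmware_version is hypothetical, and the sketch assumes python3 is on the PATH (the commands above invoke it as python).

```shell
# Print the FirmwareVersion field of a Redfish manager document read on stdin.
# Hypothetical helper; assumes python3 is on PATH.
firmware_version() {
    python3 -c 'import json, sys; print(json.load(sys.stdin)["FirmwareVersion"])'
}

# Usage from cs2-01, with the same URLs and header as the queries above:
#   for ip in 192.168.120.30 192.168.120.50; do
#       curl -sk "https://$ip/redfish/v1/Managers/manager" \
#            --header 'Authorization: Basic YWRtaW46YWRtaW4=' | firmware_version
#   done
```

Unlike the grep version, this prints only the version string, without the field name or surrounding quotes.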

Copying files

To copy a file to your CS-2 home directory (the same on both CS-2 clusters), run the following, replacing both instances of ALCFUserID with your ALCF user id:

scp -o "ProxyJump ALCFUserID@cerebras.alcf.anl.gov" filename ALCFUserID@cs2-01-master:~/

To copy a file from your CS-2 home directory (the same on both CS-2 clusters) to the current local directory, run the following, replacing both instances of ALCFUserID with your ALCF user id:

scp -o "ProxyJump ALCFUserID@cerebras.alcf.anl.gov" ALCFUserID@cs2-01-master:~/filename .
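
If you copy files often, the two scp commands can be wrapped in small shell functions so the user id is entered only once. This is a sketch under stated assumptions: the function names cs2_put and cs2_get and the variables ALCF_USER and DRY_RUN are hypothetical, while the host names are the ones used in the commands above.

```shell
# Hypothetical wrappers around the two scp commands above.
# Set ALCF_USER to your ALCF user id before calling them.
# With DRY_RUN set to a non-empty value, the command is printed, not executed.
CS2_LOGIN="cerebras.alcf.anl.gov"
CS2_HOST="cs2-01-master"

# cs2_put LOCAL_FILE: copy a local file to your CS-2 home directory.
cs2_put() {
    user="${ALCF_USER:?set ALCF_USER to your ALCF user id}"
    ${DRY_RUN:+echo} scp -o "ProxyJump ${user}@${CS2_LOGIN}" "$1" "${user}@${CS2_HOST}:~/"
}

# cs2_get REMOTE_FILE: copy a file from your CS-2 home directory to the
# current local directory.
cs2_get() {
    user="${ALCF_USER:?set ALCF_USER to your ALCF user id}"
    ${DRY_RUN:+echo} scp -o "ProxyJump ${user}@${CS2_LOGIN}" "${user}@${CS2_HOST}:~/$1" .
}
```

Running a function with DRY_RUN=1 first shows the scp command that would be executed, which is a convenient check that the user id and file name are what you expect.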

Downloading a Kaggle competition dataset to a CS-2 node using the command line

These notes may be helpful for downloading some Kaggle datasets.

Start a singularity shell (e.g. singularity shell -B /opt:/opt /software/cerebras/cs2-02/container/cbcore_latest.sif), then create a virtual environment and install the kaggle package inside it:

virtualenv env
source env/bin/activate
pip3 install kaggle

Go to www.kaggle.com in a browser and log in (create an account if this is your first time). Click the user icon at the upper right and open the Account tab, then scroll down to the "Create New API Token" button and click it. This opens a download window for a one-line JSON file.

Put the JSON in ~/.kaggle/kaggle.json, e.g. by copying the downloaded file with scp, or by single-quoting the JSON text and echoing it as shown:

mkdir -p ~/.kaggle
echo '{"username":"REDACTED","key":"REDACTED"}' > ~/.kaggle/kaggle.json
chmod 600 ~/.kaggle/kaggle.json
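
Before attempting a download, it can help to confirm that the token file is in place and well formed. The sketch below is a hypothetical helper, not part of the kaggle package: it checks that the file exists, parses as JSON with username and key fields, and has mode 600. It assumes python3 is on the PATH.

```shell
# check_kaggle_token [FILE]
# Hypothetical helper (not part of the kaggle package): verifies that the token
# file exists, parses as JSON with "username" and "key" fields, and has mode 600.
check_kaggle_token() {
    token="${1:-$HOME/.kaggle/kaggle.json}"
    if [ ! -f "$token" ]; then
        echo "missing token file: $token" >&2
        return 1
    fi
    # Assumes python3 is on PATH; it is used only to parse the JSON.
    if ! python3 -c '
import json, sys
with open(sys.argv[1]) as f:
    data = json.load(f)
ok = isinstance(data, dict) and {"username", "key"} <= set(data)
sys.exit(0 if ok else 1)
' "$token" 2>/dev/null; then
        echo "not valid JSON with username and key fields: $token" >&2
        return 1
    fi
    # The first ten characters of "ls -l" output are the file type and mode bits.
    mode=$(ls -l "$token" | cut -c1-10)
    if [ "$mode" != "-rw-------" ]; then
        echo "expected mode 600 (-rw-------), found $mode: $token" >&2
        return 1
    fi
    echo "token file looks OK: $token"
}
```

Called with no argument, check_kaggle_token inspects ~/.kaggle/kaggle.json; a non-zero exit status means one of the three checks failed, and the message on stderr says which.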

On www.kaggle.com, the Kaggle API command for downloading a dataset is displayed in the Data tab. It can be selected and copied to the local clipboard, or copied with the "Copy API command to clipboard" icon.
Before attempting a download, check the Kaggle download page for a button asking you to agree to terms and conditions, e.g. the competition rules. If there is one, read the terms and click it; downloads with your access token will fail with a 403 error until you have agreed.

Paste the API command into the command line inside the singularity shell, with the venv activated. For example:

kaggle datasets download -d mhskjelvareid/dagm-2007-competition-dataset-optical-inspection

It will download as a zip file.

Exit the singularity container (with exit), then unzip the dataset zip file.
The unzip command is available on the CS-2 worker nodes.

Note: the kaggle download shown above included two identical copies of the dataset; one copy was in a subdirectory.
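
To confirm that a subdirectory really is a second copy before deleting it, the two trees can be compared with diff. The following is a minimal sketch; the function name duplicate_subdir is hypothetical, and the directory names in the usage comment are placeholders, since the actual names depend on the dataset.

```shell
# duplicate_subdir PARENT NAME
# Hypothetical helper: succeeds (exit status 0) when PARENT/NAME holds the same
# files, with the same contents, as the rest of PARENT.
# Note: -x excludes every entry named NAME at any depth, not only the top one.
duplicate_subdir() {
    diff -rq -x "$2" "$1" "$1/$2" > /dev/null
}

# Usage; "dataset.zip", "dataset" and "copy" are placeholders for the real names:
#   unzip -q dataset.zip -d dataset
#   duplicate_subdir dataset copy && echo "dataset/copy is a duplicate"
```

A non-zero exit status means the trees differ (or diff could not read them), in which case neither copy should be deleted without a closer look.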