
Getting Started with Cerebras CSL

Cerebras CSL (the Cerebras Software Language) is a low-level kernel programming language for Cerebras systems. It enables users to write code that runs on individual Processing Elements (PEs) and to define the placement of programs and the routing of data on the Wafer-Scale Engine (WSE).

To develop programs for the Cerebras system, users create two main components:

  1. Device Code: Written in CSL, this code executes directly on the Cerebras system. CSL includes libraries for a variety of commonly used primitive operations, such as broadcasting, gathering, and scattering data across rows or columns of PEs.
  2. Host Code: Written in Python, this code uses Cerebras APIs to move data to and from the device and to launch functions on it.
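As a conceptual illustration only (plain Python, not CSL), the row-wise collectives mentioned above can be modeled on a small grid of PE-local values. The grid shape and function names here are invented for the sketch:

```python
# Conceptual model of row-wise collectives on a 2x3 grid of PEs.
# This is NOT CSL; it only illustrates the data-movement patterns.

grid = [[1, 2, 3],
        [4, 5, 6]]  # one value held by each PE

def broadcast_row(grid, row, root_col):
    """Every PE in `row` ends up with the root PE's value."""
    value = grid[row][root_col]
    return [value] * len(grid[row])

def gather_row(grid, row):
    """The root PE collects one value from each PE in `row`."""
    return list(grid[row])

def scatter_row(grid, row, chunks):
    """The root PE distributes one chunk to each PE in `row`."""
    assert len(chunks) == len(grid[row])
    return list(chunks)

print(broadcast_row(grid, 0, root_col=2))  # [3, 3, 3]
print(gather_row(grid, 1))                 # [4, 5, 6]
```

In real CSL programs these patterns are provided by the collectives libraries and move data over the fabric rather than through Python lists.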

The Cerebras SDK can be used in two primary modes:

  1. Simulator Mode: For testing and debugging programs without access to physical hardware.
  2. Appliance Mode: For executing programs on the actual Cerebras hardware.

For a comprehensive overview of the Cerebras SDK, refer to the Cerebras SDK Documentation.

SDK with Simulator

The Cerebras SDK relies on a Singularity container and associated scripts to execute CSL code on a simulator.

On a user node, the Cerebras SDK is available at /software/cerebras/cs_sdk for your convenience. You can copy it to your $HOME directory, add it to your $PATH, and you’re ready to get started.

cp -r /software/cerebras/cs_sdk-1.4.0 ~
export PATH=~/cs_sdk-1.4.0:$PATH

To verify that the SDK is installed correctly, run:

cslc --help

Examples

We will use examples from the csl-examples repository provided by Cerebras. To get these examples, clone the repository into your desired directory:

export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
git clone https://github.com/Cerebras/csl-examples.git
cd csl-examples
git checkout rel-sdk-1.4.0
cd ~/csl-examples/benchmarks/gemm-collectives_2d
bash commands_wse3.sh

Note: to access external web resources from a Cerebras user node, you must have a proxy environment variable set (or equivalent). wget requires the lower-case variant (https_proxy).

export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
export https_proxy=http://proxy.alcf.anl.gov:3128

Sample output:
$ bash commands_wse3.sh
[INFO] === Beginning compilation ===
[INFO] Using SIF: ~/cs_sdk-1.4.0/sdk-cbcore-202505010205-2-ef181f81.sif
[INFO] CSL_IMPORT_PATH is not set
[INFO] CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
[INFO] Compilation successful
[INFO] === Calling container-hosted python ===
[INFO] Using SIF: ~/cs_sdk-1.4.0/sdk-cbcore-202505010205-2-ef181f81.sif
SUCCESS

SDK GUI

You can use the SDK Debug GUI to analyze and gain insights into your code execution. For detailed instructions, refer to the SDK GUI documentation.

To launch the SDK Debug GUI, run the following commands:

cd ~/csl-examples/benchmarks/gemm-collectives_2d
sdk_debug_shell visualize

Sample output:
[INFO] Using SIF: /home/vsastry/CS3_system/cs_sdk-1.4.0/sdk-cbcore-202505010205-2-ef181f81.sif
[INFO] CSL_IMPORT_PATH is not set
[INFO] CSL_IMPORT_PATH accepts colon separated list of paths generated by 'realpath <path>'
Click this link to open URL:  http://cer-anl-net001-us-sr01:8000/sdk-gui
Click this link to open URL:  http://10.125.11.2:8000/sdk-gui
Press Ctrl-C to exit

To access the GUI from your local computer, forward port 8000 from the user node through a login node to a local machine port 8008.

Adjust one or both port numbers if they are already in use.

Example script to forward port 8000 to localhost 8008:

export SDK_PORT=8000
export LOCAL_PORT=8008
export ALCFUserID=<your alcf username>
ssh -L $LOCAL_PORT:localhost:$LOCAL_PORT $ALCFUserID@cer-login-04.ai.alcf.anl.gov -t ssh -L $LOCAL_PORT:localhost:$SDK_PORT -N cer-anl-net001-us-sr01

Then open the following URL in your web browser: http://localhost:8008/sdk-gui/
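The same two-hop tunnel can also be expressed as an SSH config entry using ProxyJump (available in OpenSSH 7.3+). This is a sketch: the host alias cerebras-gui is invented here, and the port numbers match the example above.

```
# ~/.ssh/config sketch (the alias is illustrative)
Host cerebras-gui
    HostName cer-anl-net001-us-sr01
    User <your alcf username>
    ProxyJump <your alcf username>@cer-login-04.ai.alcf.anl.gov
    LocalForward 8008 localhost:8000
```

With this entry in place, `ssh -N cerebras-gui` opens the tunnel in one command.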

CS-3 connection diagram

SDK with Appliance Mode

Appliance Mode enables running code directly on the Cerebras Wafer-Scale Cluster. In addition to the containerized Singularity build of the SDK, the SDK also supports operation on Wafer-Scale Clusters running in appliance mode. Note that compilation is performed on the worker/management node, so resources may be limited when compiling a model while other jobs are running. For more information on the differences between simulator code and the changes needed to run in appliance mode, refer to Appliance Mode.

Setup

Create Virtual Environment: Follow these steps to set up the virtual environment for the Cerebras SDK:

rm -r cs_appliance_sdk   # remove any previous environment (the error is harmless if none exists)
deactivate               # exit any active virtual environment (harmless if none is active)
/usr/bin/python3.11 -m venv cs_appliance_sdk
source cs_appliance_sdk/bin/activate
pip install --upgrade pip

Install SDK Packages: Install the cerebras_appliance and cerebras_sdk Python packages in the virtual environment, specifying the appropriate Cerebras Software release:

pip install cerebras_appliance==2.6.0
pip install cerebras_sdk==2.6.0

Examples

We will use examples from the csl-examples repository provided by Cerebras. To access these examples, clone the repository into your desired directory:

export HTTPS_PROXY=http://proxy.alcf.anl.gov:3128
git clone https://github.com/Cerebras/csl-examples.git
cd csl-examples
git checkout rel-sdk-1.4.0
cd ~/csl-examples/tutorials/gemv-01-complete-program/

Compile Code

Use the following appliance_compile.py script to compile the code in the respective example directory:

appliance_compile.py
import json
from cerebras.sdk.client import SdkCompiler
import logging
from cerebras.appliance import logger
logging.basicConfig(level=logging.INFO)

# Instantiate the compiler using a context manager
# Disable version check to ignore appliance client and server version differences.
with SdkCompiler(disable_version_check=True) as compiler:

    # Launch compile job
    artifact_path = compiler.compile(
        ".",
        "layout.csl",
        "--fabric-dims=8,3 --fabric-offsets=4,1 --memcpy --channels=1 -o out",
        "."
    )

# Write the artifact_path to a JSON file
with open("artifact_path.json", "w", encoding="utf8") as f:
    json.dump({"artifact_path": artifact_path,}, f)
Sample output:
$ python appliance_compile.py
INFO:cerebras.cluster.client:Appliance client semantic version: 1.1.0, cluster server semantic version: 1.1.2, job operator semantic version: 1.1.2
INFO:cerebras.cluster.client:Initiating a new SDK compile job against the cluster server
INFO:cerebras.cluster.client:Job id: wsjob-4mhdefdzswskmakfcp9hfe, workflow id: wflow-2ullyamqmrdi7nxodyl6f77xbi, namespace: job-operator, remote log path:
/n1/wsjob/workdir/job-operator/wsjob-4mhdefdzswskmakfcp9hfe
INFO:cerebras.cluster.client:Poll ingress status: Waiting for all Coordinator pods to be running, current running: 0/1.
INFO:cerebras.cluster.client:Recording the timestamp when jobs is scheduled.
WARNING:cerebras.cluster.client:Event 2025-10-23 19:49:54 +0000 UTC reason=InconsistentVersion wsjob=wsjob-4mhdefdzswskmakfcp9hfe message='Warning: client semantic         version 1.1.0 is inconsistent with cluster server semantic version 1.1.2, there's a risk job could fail due to inconsistent setup.'
INFO:cerebras.cluster.client:Poll ingress status: Waiting for job ingress readiness.
INFO:cerebras.cluster.client:Poll ingress status: Job ingress ready, dashboard: https://grafana.anl0.cerebras.internal/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-
4mhdefdzswskmakfcp9hfe&from=1761248404000&to=now
INFO:cerebras.cluster.client:Poll ingress success: Job ingress ready, dashboard: https://grafana.anl0.cerebras.internal/d/WebHNShVz/wsjob-dashboard?orgId=1&var-
wsjob=wsjob-4mhdefdzswskmakfcp9hfe&from=1761248404000&to=now
2025-10-23 19:50:44,583 INFO     CSL compiler output:
CSL compiler produced no messages. Compilation successful.
INFO:cerebras.sdk.client.sdk_appliance_client:CSL compiler output:
CSL compiler produced no messages. Compilation successful. 

The only difference between a CS-3 run and a simulator run is the fabric_dims argument: for simulated runs, it should be set to the minimum size required. The script above generates artifact_path.json, which is used by the appliance_run.py script.
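As a rule of thumb from the SDK tutorials, with the usual --fabric-offsets=4,1 the memcpy infrastructure occupies a fixed halo around the kernel, so the minimum simulator fabric is the kernel size plus that halo. The helper below is only a sketch of the arithmetic:

```python
# Minimum fabric dims for a simulated run with memcpy enabled.
# Assumes --fabric-offsets=4,1: the memcpy infrastructure needs 4 columns
# to the left and 3 to the right of the kernel, plus 1 row above and 1 below.
def min_fabric_dims(kernel_width, kernel_height):
    return kernel_width + 7, kernel_height + 2

# The single-PE gemv example compiles with --fabric-dims=8,3:
print(min_fabric_dims(1, 1))  # (8, 3)
```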

Run Code

Use the following appliance_run.py script to run the code in the respective example directory:

appliance_run.py
#!/usr/bin/env cs_python

import argparse
import numpy as np
import json
#from cerebras.sdk.runtime.sdkruntimepybind import SdkRuntime, MemcpyDataType, MemcpyOrder # pylint: disable=no-name-in-module
import logging
from cerebras.appliance import logger
logging.basicConfig(level=logging.INFO)

from cerebras.appliance.pb.sdk.sdk_common_pb2 import MemcpyDataType, MemcpyOrder
from cerebras.sdk.client import SdkRuntime

# Matrix dimensions
M = 4
N = 6

# Construct A, x, b
A = np.arange(M*N, dtype=np.float32).reshape(M, N)
x = np.full(shape=N, fill_value=1.0, dtype=np.float32)
b = np.full(shape=M, fill_value=2.0, dtype=np.float32)

# Calculate expected y
y_expected = A@x + b

# Read the artifact_path from the JSON file
with open("artifact_path.json", "r", encoding="utf8") as f:
    data = json.load(f)
    artifact_path = data["artifact_path"]

with SdkRuntime(artifact_path, simulator=True, disable_version_check=True) as runner:
    # Launch the init_and_compute function on device
    runner.launch('init_and_compute', nonblock=False)

    # Copy y back from device
    y_symbol = runner.get_id('y')
    y_result = np.zeros([1*1*M], dtype=np.float32)
    runner.memcpy_d2h(y_result, y_symbol, 0, 0, 1, 1, M, streaming=False,
                      order=MemcpyOrder.ROW_MAJOR,
                      data_type=MemcpyDataType.MEMCPY_32BIT, nonblock=False)
# Ensure that the result matches our expectation
np.testing.assert_allclose(y_result, y_expected, atol=0.01, rtol=0)
print("SUCCESS!")
Sample output:
$ python appliance_run.py
INFO:cerebras.cluster.client:Appliance client semantic version: 1.1.0, cluster server semantic version: 1.1.2, job operator semantic version: 1.1.2
INFO:cerebras.cluster.client:Initiating a new SDK compile job against the cluster server
INFO:cerebras.cluster.client:Job id: wsjob-qqcsg8g5z66nknmk4d48gp, workflow id: wflow-kcrf2anv7zconl7xq2idceztdq, namespace: job-operator, remote log path:
/n1/wsjob/workdir/job-operator/wsjob-qqcsg8g5z66nknmk4d48gp
INFO:cerebras.cluster.client:Poll ingress status: Waiting for all Coordinator pods to be running, current running: 0/1.
INFO:cerebras.cluster.client:Recording the timestamp when jobs is scheduled.
WARNING:cerebras.cluster.client:Event 2025-10-23 19:54:55 +0000 UTC reason=InconsistentVersion wsjob=wsjob-qqcsg8g5z66nknmk4d48gp message='Warning: client semantic version    1.1.0 is inconsistent with cluster server semantic version 1.1.2, there's a risk job could fail due to inconsistent setup.'
INFO:cerebras.cluster.client:Poll ingress status: Waiting for job ingress readiness.
INFO:cerebras.cluster.client:Poll ingress status: Job ingress ready, dashboard: https://grafana.anl0.cerebras.internal/d/WebHNShVz/wsjob-dashboard?orgId=1&var-wsjob=wsjob-    qqcsg8g5z66nknmk4d48gp&from=1761248705000&to=now
INFO:cerebras.cluster.client:Poll ingress success: Job ingress ready, dashboard: https://grafana.anl0.cerebras.internal/d/WebHNShVz/wsjob-dashboard?orgId=1&var-
wsjob=wsjob-   qqcsg8g5z66nknmk4d48gp&from=1761248705000&to=now
SUCCESS!
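For reference, the expected result that the run script checks (y = A·x + b, with A the 4×6 matrix of consecutive values, x all ones, and b all twos) can be recomputed by hand in plain Python:

```python
# Recompute y_expected from the run script's inputs without NumPy.
M, N = 4, 6
A = [[float(i * N + j) for j in range(N)] for i in range(M)]  # arange(M*N).reshape(M, N)
x = [1.0] * N
b = [2.0] * M

# y[i] = sum_j A[i][j] * x[j] + b[i]
y_expected = [sum(A[i][j] * x[j] for j in range(N)) + b[i] for i in range(M)]
print(y_expected)  # [17.0, 53.0, 89.0, 125.0]
```

These are the values the device result y_result must match for assert_allclose to pass.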