Using an inference endpoint

Accessing your endpoint API keys

The endpoints running on the SambaStack SN40L cluster can be accessed from any internet-connected machine.

The API endpoint information, including your user-unique API keys, is stored in the home directory of your ALCF AI Testbed account. These files can be accessed from the login nodes of ALCF's SambaNova SN30 cluster; access to the SN30 cluster and to the inference endpoints running on the SN40L SambaRacks is linked. See ALCF Get Started to request an account and for additional information.

Log in to the SambaNova login node to access endpoint API keys

Log in to the SambaNova login node from your local machine using the command below. Authentication uses a MobilePASS+ passcode, generated each time you log in; this is the same passcode used to authenticate to other ALCF systems, such as Polaris.

In the examples below, replace ALCFUserID with your personal ALCF user ID.

ssh ALCFUserID@sambanova.alcf.anl.gov
Password: < MobilePASS+ code >

Getting API information files from your home directory

Currently, two endpoints are provisioned for users on the Metis SambaNova SN40L cluster. The endpoint description files are placed in your AI Testbed home directory and are of the form metis_endpoint_N.txt. They include the following lines:

  • #DO NOT EDIT THIS FILE BY HAND - THIS FILE IS AUTOGENERATED - YOUR CHANGES WILL BE OVERWRITTEN
  • BASE_URL=https://metis.alcf.anl.gov/...
  • SAMBANOVA_API_KEY=...
  • MODELS=ModelName1,ModelName2,...

These endpoint description files may be copied to your working machine of choice, e.g. with scp or by copying and pasting the text.
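
For example, to copy the first endpoint file from your home directory on the login node to the current directory on your local machine (again replacing ALCFUserID with your ALCF user ID):

scp ALCFUserID@sambanova.alcf.anl.gov:~/metis_endpoint_1.txt .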

The files can be sourced to set environment variables. For example, if the endpoint file were named "metis_endpoint_1.txt":

source ~/metis_endpoint_1.txt

Endpoint "metis_endpoint_1" serves the "DeepSeek-R1" model. Endpoint "metis_endpoint_2" serves the "Llama-4-Maverick-17B-128E-Instruct" model. If you need any other models to be provisioned via these endpoints, please reach out to support[at]alcf.anl.gov.

See SambaNova's documentation for additional information to supplement the instructions below: OpenAI compatible API.

Generic code examples

Using environment variables for endpoint url, api key, model name

The information files may be sourced to put the necessary values into environment variables. For example:

source [endpoint description file]
# Copy the sourced variables to the names the openai package expects
export OPENAI_BASE_URL=$BASE_URL
export OPENAI_API_KEY=$SAMBANOVA_API_KEY
# Choose one of the model names listed in $MODELS
export MODEL_NAME=<The name of the model you want to use>
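
An OpenAI-compatible API typically also exposes a models listing route; assuming this endpoint implements it (not confirmed in this documentation), you can check that your URL and key work with a quick request:

# List the models served by the endpoint (assumes the standard /models route)
curl -H "Authorization: Bearer ${OPENAI_API_KEY}" ${OPENAI_BASE_URL}/models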

Python example

Make a virtual environment and activate it, or use an existing virtualenv or conda environment. You will need Python 3.8 or newer.

virtualenv openai_venv
# Or specify the python version, e.g.
# virtualenv --system-site-packages -p python3.8 venv_p3.8
source openai_venv/bin/activate

Then install the openai package needed for chat completions:

pip install openai

Write a Python script that:

  • imports the openai package
  • makes an openai client
  • calls the chat.completions.create method
  • extracts what is wanted from the response

Source one of the SN40L endpoint information files to set some environment variables, then copy them to the environment variables expected by the openai Python package and the sample scripts below:

# source ~/metis_endpoint_<endpoint number>.txt
# e.g. 
source ~/metis_endpoint_1.txt
export OPENAI_BASE_URL=$BASE_URL
export OPENAI_API_KEY=$SAMBANOVA_API_KEY
echo $MODELS
export MODEL_NAME=<a name from above> # e.g. Llama-4-Maverick-17B-128E-Instruct
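
If you simply want the first model listed in $MODELS, a standard shell parameter expansion can set MODEL_NAME for you; this strips everything from the first comma onward:

# Take the first entry of the comma-separated MODELS list
export MODEL_NAME=${MODELS%%,*}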

Here is a simple sample Python script that uses the environment variables OPENAI_API_KEY, OPENAI_BASE_URL, and MODEL_NAME, and accepts a (quoted) prompt as a command-line parameter:

import os
import sys

import openai

# Build a client pointed at the SN40L endpoint using the sourced variables.
client = openai.OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
)

modelname = os.environ.get("MODEL_NAME")
query = sys.argv[1]  # the (quoted) prompt from the command line

response = client.chat.completions.create(
    model=modelname,
    messages=[{"role": "user", "content": query}],
    temperature=0.1,
    top_p=0.1,
)

print(response.choices[0].message.content)

Pass the script a query as a command-line parameter, e.g.

python chat_completion.py "What is an \"extinction vortex\"?"
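
The curl examples below set "stream": false explicitly, which suggests the endpoint recognizes the standard streaming option. Here is a minimal streaming variant of the script above, assuming the endpoint honors the OpenAI stream=True flag; tokens are printed as they arrive:

import os
import sys

import openai

client = openai.OpenAI(
    api_key=os.environ.get("OPENAI_API_KEY"),
    base_url=os.environ.get("OPENAI_BASE_URL"),
)

# Request a streaming completion; the response arrives as a series of chunks.
stream = client.chat.completions.create(
    model=os.environ.get("MODEL_NAME"),
    messages=[{"role": "user", "content": sys.argv[1]}],
    stream=True,
)

for chunk in stream:
    # Each chunk carries a delta; content may be empty for role/finish chunks.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
print()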

curl example

Sample curl command. Change the PROMPT environment variable as desired.

# Sample prompt that shows quoting of quote marks
export PROMPT="What are \\\"telescoping generations\\\" in biology?"
export D='{
    "stream": false,
    "model": "'${MODEL_NAME}'",
    "messages": [
        {
            "role": "user",
            "content": '\"${PROMPT}\"'
        }
    ]
    }'
curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
     -H "Content-Type: application/json" \
     -d "${D}" \
     -X POST ${OPENAI_BASE_URL}/chat/completions

If jq is installed, it can be used to parse the JSON output; e.g. add -s | jq '{response: .choices[0].message.content}' to the command line:

curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -X POST ${OPENAI_BASE_URL}/chat/completions \
  -d "${D}" \
  -s | jq '{response: .choices[0].message.content}'
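
jq can pull out other fields in the same way. For example, assuming the response follows the standard OpenAI shape with a usage block (an assumption; inspect the raw output to confirm), this reports the answer together with the token count:

curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
  -H "Content-Type: application/json" \
  -X POST ${OPENAI_BASE_URL}/chat/completions \
  -d "${D}" \
  -s | jq '{response: .choices[0].message.content, tokens: .usage.total_tokens}'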

Multiple completions can be requested in a single call by passing an array of requests, e.g.

export PROMPT1="Why is red red?"
export PROMPT2="Why is green green?"
export D='[
{
    "stream": false,
    "model": "'${MODEL_NAME}'",
    "messages": [
        {
            "role": "user",
            "content": '\"${PROMPT1}\"'
        }
    ]
},
{
    "stream": false,
    "model": "'${MODEL_NAME}'",
    "messages": [
        {
            "role": "user",
            "content": '\"${PROMPT2}\"'           
        }
    ]
}
]'
curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
     -H "Content-Type: application/json" \
     -d "${D}" \
     -X POST ${OPENAI_BASE_URL}/chat/completions
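
If the batched call returns a JSON array with one completion object per request (an assumption; check the raw output first), jq can extract each answer:

curl -H "Authorization: Bearer ${OPENAI_API_KEY}" \
     -H "Content-Type: application/json" \
     -d "${D}" \
     -X POST ${OPENAI_BASE_URL}/chat/completions \
     -s | jq '[.[] | .choices[0].message.content]'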