Accessing the Metis inference endpoint¶
The SambaNova SN40L cluster (Metis) is integrated into the ALCF Inference Service, which provides API access to the models running on the cluster. These models can be accessed in two ways:
1. Web UI
2. API Access
Accessing the endpoints using the Web UI.¶
The easiest way to get started is through the web interface, accessible at https://inference.alcf.anl.gov/
The UI is based on the popular Open WebUI platform. After logging in with your ANL or ALCF credentials, you can:
- Select a model from the dropdown menu at the top of the screen.
- Start a conversation directly in the chat interface.
In the model selection dropdown, you can see the status of each model:

- Live: These models are "hot" and ready for immediate use.
- Starting: A node has been acquired and the model is being loaded into memory.
- Queued: The model is in a queue waiting for resources to become available.
- Offline: The model is available but not currently loaded. It will be queued for loading when a user sends a request.
- All: Lists all available models regardless of their status.
Accessing the endpoints using the API.¶
For programmatic access, you can use the API endpoints directly.
1. Setup Your Environment¶
You can run the following setup from any internet-connected machine (your local machine or an ALCF machine).
# Create a new Conda environment
conda create -n globus_env python=3.11.9 -y
conda activate globus_env
# Install necessary packages
pip install openai globus_sdk
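To confirm the environment is ready, you can check that both packages are importable. A minimal sketch (the package names match the `pip install` line above):

```python
import importlib.util

def installed(pkg: str) -> bool:
    """Return True if the package can be found by the import system."""
    return importlib.util.find_spec(pkg) is not None

# The two packages installed above
for pkg in ("openai", "globus_sdk"):
    print(f"{pkg}: {'OK' if installed(pkg) else 'missing -- re-run pip install'}")
```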
2. Authenticate¶
To access the endpoints, you need an authentication token.
# Download the authentication helper script
wget https://raw.githubusercontent.com/argonne-lcf/inference-endpoints/refs/heads/main/inference_auth_token.py
# Authenticate with your Globus account
python inference_auth_token.py authenticate
This will generate and store access and refresh tokens in your home directory. The script can also report how much time remains before your access token expires (in seconds, minutes, or hours).

Token Validity

- Access tokens are valid for 48 hours. The `get_access_token` command will automatically refresh your token if it has expired.
- An internal policy requires re-authentication every 7 days. If you encounter permission errors, log out of Globus at app.globus.org/logout and re-run `python inference_auth_token.py authenticate --force`.
3. Make a Test Call¶
Once authenticated, you can make a test call using cURL or Python.
#!/bin/bash
# Get your access token
access_token=$(python inference_auth_token.py get_access_token)
curl -X POST "https://inference-api.alcf.anl.gov/resource_server/metis/api/v1/chat/completions" \
-H "Authorization: Bearer ${access_token}" \
-H "Content-Type: application/json" \
-d '{
"model": "Meta-Llama-3.1-8B-Instruct",
"messages":[{"role": "user", "content": "Explain quantum computing in simple terms."}]
}'
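The service speaks the OpenAI chat-completions schema, so the JSON returned by the cURL call above can be parsed in the usual way. A sketch using an illustrative response (the content shown is made up, not real model output):

```python
import json

# Illustrative response in the OpenAI chat-completions schema;
# a real call returns the model's actual text here.
raw = '''
{
  "model": "Meta-Llama-3.1-8B-Instruct",
  "choices": [
    {"index": 0,
     "message": {"role": "assistant",
                 "content": "Quantum computing uses qubits instead of bits."},
     "finish_reason": "stop"}
  ]
}
'''
response = json.loads(raw)

# The assistant's reply lives at choices[0].message.content
reply = response["choices"][0]["message"]["content"]
print(reply)
```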
Or, using the OpenAI Python client:
from openai import OpenAI
from inference_auth_token import get_access_token
# Get your access token
access_token = get_access_token()
client = OpenAI(
api_key=access_token,
base_url="https://inference-api.alcf.anl.gov/resource_server/metis/api/v1"
)
response = client.chat.completions.create(
model="Meta-Llama-3.1-8B-Instruct",
messages=[{"role": "user", "content": "Explain quantum computing in simple terms."}]
)
print(response.choices[0].message.content)
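In the OpenAI chat API, the `messages` list carries the whole conversation state on the client side. To continue a chat, append the assistant's reply and your next user turn, then send the full list in the next `chat.completions.create` call. A minimal sketch (the assistant text is illustrative):

```python
# Start the conversation with the same first turn as the example above.
messages = [{"role": "user", "content": "Explain quantum computing in simple terms."}]

# After each call, append the assistant's reply
# (response.choices[0].message.content) plus the next user turn.
assistant_reply = "Qubits can represent 0 and 1 at the same time."  # illustrative
messages.append({"role": "assistant", "content": assistant_reply})
messages.append({"role": "user", "content": "How does that differ from a classical bit?"})

# The full list goes back in the next create() call.
print([m["role"] for m in messages])
```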
Discovering Available Models¶
Endpoint information is available on the Metis status page, which shows the status of each endpoint, the available models, and their associated configurations.
The chat-completion models currently supported on Metis are:
- Meta-Llama-3.3-70B-Instruct
- Meta-Llama-3.1-8B-Instruct
- Qwen2.5-Coder-0.5B-Instruct
- DeepSeek-R1
You can programmatically query all available models and endpoints:
access_token=$(python inference_auth_token.py get_access_token)
curl -X GET "https://inference-api.alcf.anl.gov/resource_server/list-endpoints" \
-H "Authorization: Bearer ${access_token}" | jq -C '.clusters.metis'
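The `jq` filter above selects the `metis` entry from the `clusters` object in the response. The same selection can be done in Python; the payload below is an illustrative stand-in, with only the `clusters` → `metis` nesting taken from the `jq` path (the inner fields are hypothetical):

```python
import json

# Stand-in payload: only the clusters -> metis nesting is taken from the
# jq filter above; the inner fields are hypothetical.
raw = '{"clusters": {"metis": {"status": "live", "models": ["Meta-Llama-3.1-8B-Instruct"]}}}'
endpoints = json.loads(raw)

# Equivalent of: jq '.clusters.metis'
metis = endpoints["clusters"]["metis"]
print(json.dumps(metis, indent=2))
```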
See SambaNova's documentation on its OpenAI-compatible API for additional information to supplement the instructions above.