Skip to content

The Python API

The documentation alludes to the Balsam Python API in several places. For instance, the Managing Jobs section gives real-world examples of API usage in creating, querying, and updating Jobs. In this section, we take a step back and look more generally at the structure and semantics of Balsam's Python API. This is useful because all the Python API resources (Job, BatchJob, EventLog) share the same methods and data structures for creating, querying, updating, and deleting resources.

The first thing to understand is that the Python API is merely a convenience layer built on top of a standard HTTP requests client. One could bypass balsam.api altogether and interact with the Balsam REST API using another programming language or a command-line tool like curl.

Resources are imported at the top-level from balsam.api:

from balsam.api import (
    Site,
    ApplicationDefinition,
    Job,
    BatchJob,
    TransferItem,
    EventLog,
)

The following sections use Job as an example but easily generalize to any of the resources imported above. For example, querying your Sites looks just like querying your Jobs:

from datetime import datetime, timedelta

hour_ago = datetime.utcnow() - timedelta(hours=1)

recently_used_sites = Site.objects.filter(last_refresh_after=hour_ago)
recently_updated_jobs = Job.objects.filter(last_update_after=hour_ago)

Once again, the docstrings and type annotations visible in a Python IDE are hugely helpful in discovering the allowed parameters for various resources. Best of all, because the Python API is dynamically generated from the REST API schema, the Python docstrings and type hints stay up to date, even if this user guide lags behind!

Model Fields

Each of the API resources has a model class (like Job) which defines a set of Fields that can be exchanged with the REST API. The model fields are type-annotated and can be explored from your IDE or class docstrings. For example, we can start searching for "param" and find that job.parameters should be a dictionary of strings:

job-ide

Relationship to Django ORM

Previous versions of Balsam used the real Django ORM to communicate with a private user database. This proved to be an effective programming model, and so the new Balsam Python API was written to preserve a subset of the structure and syntax of the former API.

If you peek below the surface, the current Balsam Python API is completely different from an ORM. Whereas ORMs lazily execute SQL queries over a database connection, the Balsam API executes HTTPS requests over an Internet connection. It wraps one specific REST API schema and is therefore vastly narrower in scope and capabilities.

Creating

Each resource has three methods of construction. First, you can create several in-memory instances, and later persist them to the backend:

# Create in-memory with the required kwargs...
j = Job(...)

# New in-memory resources have no ID:
assert j.id is None

# ...then persist:
j.save()
assert j.id is not None

Second, you can create and persist in a single step:

j = Job.objects.create(...)
assert j.id is not None

Third, you can bulk-create from a collection of in-memory resources. This is far more efficient for creating large numbers of jobs with fewer API round trips:

jobs = [Job(**kwargs) for kwargs in job_specs]

# Capture the return value to get created IDs!
jobs = Job.objects.bulk_create(jobs) 
for job in jobs:
    assert job.id is not None

bulk_create does not modify objects in place!

When passing a list of items into bulk_create(), you must use the returned value to overwrite the input list with the newly-created items. This is necessary to set the ID on each item as generated by the server. Otherwise, the created items will have id == None and generally behave like objects that have never been saved to the API.

Updating

If we change some fields on an existing instance, we can update it by calling save(). The Python API is aware that if the resource id is set, you mean to update an existing resource rather than create a new one.

job.state = "RESTART_READY"
job.save()

If want to load recent changes to an in-memory resource:

# Re-fetch the Job & update fields:
job.refresh_from_db()

We can apply the same change to every item matching a particular Query:

# Select all FAILED jobs, change num_nodes=1, and mark for retry
Job.objects.filter(state="FAILED").update(num_nodes=1, state="RESTART_READY")

If we need to perform a large list of different updates, we can pass a list of mutated instances:

# Mutated jobs to be updated, each in their own way:
Job.objects.bulk_update(modified_jobs)

Deleting

We can delete individual resources:

job.delete()
assert job.id is None

Or we can bulk-delete querysets:

Job.objects.filter(state="FAILED").delete()

Querying

Each class has a manager (for instance Job.objects is a JobManager) which generates Query objects.

All

We can start with a query returning all items with all()

for job in Job.objects.all():
    print(job)

Filter

We can chain queries one-after-another by providing additional sets of filter parameters:

# These 3 chained queries...
all_jobs = Job.objects.all()
foo_jobs = all_jobs.filter(tags={"experiment": "foo"})
failed_foo_jobs = foo_jobs.filter(state="FAILED")

# ...are equivalent to this 1 query:
failed_foo_jobs = Job.objects.filter(
    tags={"experiment": "foo"},
    state="FAILED"
)

Order By and Slicing

When there are hundreds of thousands of Jobs matching your query, it makes sense to order on some criterion and take chunks with the slicing operator:

# 1000 most recent failures
Job.objects.filter(state="FAILED").order_by("-last_update")[0:1000]

Under the hood, the [0:1000] slice operation adds limit and offset to the HTTP query parameters, generating an efficient request that does not fetch more data than you asked for!

Get

If our query should return exactly one object, we can use get() instead of filter() to return the object directly. This method raises a model-scoped DoesNotExist error if the query returned 0 items, or MultipleObjectsReturned if more than 1 item was found. We can catch these errors that arise when when exactly one unique object is expected:

try:
    h2o_job = Job.objects.get(tags={"system": "H2O"})
except Job.DoesNotExist:
    print("There is no finished H2O job!")
except Job.MultipleObjectsReturned:
    print("There is more than one finished H2O job!")

Count

We can use count() to fetch the number of items matching a query, without actually fetching the list of items. This can be useful to quickly tally up large numbers of Jobs:

for state in ["RUNNING", "JOB_FINISHED"]:
    count = Job.objects.filter(state=state).count()
    print(state, count)

First

To grab the first item from a query we can use either syntax:

one = Job.objects.filter(state="RUNNING")[0]

same_thing = Job.objects.filter(state="RUNNING").first()

Lazy Query Evaluation

We can build queries with filter, chain them together, add order_by clauses, apply [start:end] slices, and store these queries in Python variables. None of these actions fetches data because queries are lazily evaluated.

# This doesn't fetch any Jobs yet:
failed_jobs = Job.objects.filter(state="FAILED")

When a query is evaluated by iteration or some other method, its result is cached, so that repeated iterations over the same query variable do not trigger redundant requests.

# This triggers the HTTP request:
for job in failed_jobs:
    print(job)

# This re-uses the cached result:
for job in failed_jobs:
    print("Again!", job)

# We can explicitly force iteration and store the result:
running_jobs = list(Job.objects.filter(state="RUNNING"))

Length

Evaluating the len(query) forces fetching the result set and returns its length. If you only want the count without fetching the items, its more efficient to use the count() method mentioned above.

Boolean Value

Evaluating the query as a Boolean expression (e.g. in an if statement like if query:) also triggers evaluation, and the query evaluates to True if there is at least one object in the result set; it's False otherwise.