Support for the Julia programming language on Polaris is currently experimental. This guide provides a set of best practices, but you may encounter unexpected issues.
Julia is a high-level, high-performance programming language designed for technical and scientific computing. It combines the ease of use of dynamic languages with the performance of compiled languages, making it well-suited for large-scale simulations and data analysis.
This guide details how to install, configure, and run Julia on the Polaris supercomputer, focusing on leveraging the system's key architectural features for large-scale parallel and GPU-accelerated computing.
Contributing
This guide is a first draft of the Julia documentation for Polaris. If you have suggestions or find errors, please open a pull request or contact us by opening a ticket at the ALCF Helpdesk.
To take full advantage of Polaris's architecture, you must configure MPI.jl, CUDA.jl, and HDF5.jl to use the system's optimized MPI, CUDA, and HDF5 libraries. For a modern, interactive development experience, we recommend Visual Studio Code with the official Julia and Remote - SSH extensions.
All required packages can be installed from the Julia REPL with the following commands:
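The exact commands depend on the modules loaded and the library versions available on Polaris; the sketch below is a minimal example, assuming a system-provided MPI (Cray MPICH) and a locally installed CUDA 12.2 runtime, with illustrative version numbers and placeholder paths:

```julia
using Pkg
Pkg.add(["MPIPreferences", "MPI", "CUDA", "HDF5"])

# Point MPI.jl at the system MPI library (Cray MPICH on Polaris).
using MPIPreferences
MPIPreferences.use_system_binary()

# Use the locally installed CUDA runtime instead of the artifact shipped with CUDA.jl.
# The version shown is illustrative; match it to the loaded cudatoolkit module.
using CUDA
CUDA.set_runtime_version!(v"12.2"; local_toolkit=true)

# Point HDF5.jl at the system HDF5 libraries; the paths below are placeholders.
using HDF5
HDF5.API.set_libraries!("/path/to/libhdf5.so", "/path/to/libhdf5_hl.so")
```

Because package preferences are applied at precompilation time, restart Julia after running these commands so that the new settings take effect.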
The packages will then load with the options specified in the LocalPreferences.toml file that this setup process creates in $JULIA_DEPOT_PATH/environments/v1.12.
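As a quick sanity check (assuming the preferences were set as sketched above), you can confirm from the same project environment that MPI.jl has picked up the system binary:

```julia
using MPI, MPIPreferences

# Should print "system" once use_system_binary() has been run and Julia restarted.
@show MPIPreferences.binary

# Prints the MPI implementation, ABI, and library paths that MPI.jl will use.
MPI.versioninfo()
```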
```bash
# Request an interactive node
qsub -I -l select=1,walltime=1:00:00,filesystems=home:eagle -A [PROJECT] -q debug

# Once on the node, run the verification
julia --project -e "using CUDA; CUDA.versioninfo()"

# Expected Output Snippet
# CUDA runtime 12.2, local installation
# ...
# Preferences:
# - CUDA_Runtime_jll.local: true
# ...
# 4 devices:
#   0: NVIDIA A100-SXM4-40GB ...
```
```julia
using CUDA
using HDF5
using MPI
using Printf
using Random

# GPU kernel to check if points fall within a circle
function pi_kernel(x, y, d, n)
    idx = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if idx <= n
        d[idx] = (x[idx] - 0.5)^2 + (y[idx] - 0.5)^2 <= 0.25 ? 1 : 0
    end
    return nothing
end

# Function to run the computation on a single GPU
function approximate_pi_gpu(n::Integer)
    x = CUDA.rand(Float64, n)
    y = CUDA.rand(Float64, n)
    d = CUDA.zeros(Float64, n)

    nblocks = ceil(Int64, n / 32)
    @cuda threads=32 blocks=nblocks pi_kernel(x, y, d, n)

    return sum(d)
end

function main()
    n = 100_000  # Number of points per MPI rank

    # Use a fixed random seed for reproducibility
    Random.seed!(1234 + MPI.Comm_rank(MPI.COMM_WORLD))

    # Each rank computes its sum on the GPU, then we reduce across all ranks
    local_sum = approximate_pi_gpu(n)
    total_sum = MPI.Allreduce(local_sum, MPI.SUM, MPI.COMM_WORLD)

    # Calculate final approximation
    comm_size = MPI.Comm_size(MPI.COMM_WORLD)
    pi_approx = (4 * total_sum) / (n * comm_size)

    if MPI.Comm_rank(MPI.COMM_WORLD) == 0
        @printf "Approximation of π: %.10f\n" pi_approx
        @printf "Error: %.10f\n" abs(pi_approx - π)
    end

    return pi_approx
end

# --- Main Execution ---
MPI.Init()

# Ensure the script doesn't run in an interactive Julia session without MPI
if !isinteractive()
    pi_approx = main()

    # Rank 0 writes the result to an HDF5 file
    if MPI.Comm_rank(MPI.COMM_WORLD) == 0
        h5open("pi_approximation.h5", "w") do file
            write(file, "pi", pi_approx)
        end
    end

    MPI.Finalize()
end
```
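To launch this example, save it to a file (the name `pi_mpi_gpu.jl` below is a placeholder) and start it with `mpiexec` from inside an interactive or batch job; the rank count and `--ppn` value shown are assumptions for a single node and may need adjusting for your job:

```bash
# Hypothetical launch on one Polaris node: 4 MPI ranks, one per A100 GPU.
mpiexec -n 4 --ppn 4 julia --project pi_mpi_gpu.jl
```

Note that, as written, every rank uses the default GPU; pinning each rank to a distinct GPU (for example with `CUDA.device!`) is a common refinement left out here for brevity.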