Basic example of how to use hopla on a SLURM cluster

Basic example

When you’re running hundreds or thousands of jobs, automation is a necessity. This is where hopla can help you.

This is a simple example of how to use hopla on a SLURM cluster. Please check the user guide for a more in-depth presentation of all functionalities.

Imports

import hopla
from pprint import pprint

Executor Context

executor = hopla.Executor(
    cluster="slurm",
    folder="/tmp/hopla",
    queue="Nspin_short",
    image="/tmp/hopla/my-apptainer-img.simg",
    walltime=1,
)

Submit Jobs

jobs = [
    executor.submit("sleep", k) for k in range(1, 11)
]
pprint(jobs)
print(jobs[0].delayed_submission)
[DelayedSlurmJob(
  job_id=1,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=2,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=3,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=4,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=5,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=6,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=7,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=8,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=9,
  submission_id=None,
),
 DelayedSlurmJob(
  job_id=10,
  submission_id=None,
)]
DelayedSubmission(
  command=sleep 1,
  execution_parameters=,
)
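
The printed command above, "sleep 1", suggests that the arguments passed to submit are combined into a single shell command line. The following is a minimal sketch of that idea using only the standard library. The helper name build_command is hypothetical and is not part of hopla, and the quoting behaviour shown is an assumption rather than a description of hopla's internals.

```python
import shlex


def build_command(*args):
    """Join arguments into one shell-safe command line.

    Hypothetical helper for illustration only; not part of the hopla API.
    """
    return shlex.join(str(arg) for arg in args)


# Same argument pattern as executor.submit("sleep", k) in the example.
commands = [build_command("sleep", k) for k in range(1, 4)]
print(commands)
```

Quoting each argument with shlex keeps arguments that contain spaces intact when the line is later pasted into a batch script.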

Generate a batch

jobs[0].generate_batch()
print(jobs[0].paths)
batch = jobs[0].paths.submission_file
with open(batch) as of:
    print(of.read())
JobPaths(
  flux_dir=/tmp/hopla/logs/1_flux,
  job_id=1,
  joblib_file=/tmp/hopla/submissions/1_joblib_script.py,
  log_folder=/tmp/hopla/logs,
  oneshot_dir=/tmp/hopla/logs/1_oneshot,
  oneshot_file=/tmp/hopla/submissions/1_oneshot_script.sh,
  stderr=/tmp/hopla/logs/1_log.err,
  stdout=/tmp/hopla/logs/1_log.out,
  submission_file=/tmp/hopla/submissions/1_submission.sh,
  submission_folder=/tmp/hopla/submissions,
  task_file=/tmp/hopla/submissions/1_tasks.txt,
  worker_file=/tmp/hopla/submissions/worker.sh,
)
#!/bin/bash

# Parameters
#SBATCH -p Nspin_short
#SBATCH --mem=2g
#SBATCH -c 1
#SBATCH --gres=gpu:0
#SBATCH --time=1:00:00
#SBATCH -J hopla
#SBATCH -e /tmp/hopla/logs/1_log.err
#SBATCH -o /tmp/hopla/logs/1_log.out

# Environment
echo $SLURM_JOB_ID
echo $HOSTNAME
unset LD_PRELOAD

# Command
apptainer run  /tmp/hopla/my-apptainer-img.simg sleep 1
exitcode=$?
echo "Exit code was: $exitcode"

# Exit
if [ "$exitcode" -ne 1 ]; then
    echo "HOPLASAY-DONE"
fi
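
The generated script echoes the marker HOPLASAY-DONE after the command, depending on its exit code, and SLURM redirects that output to the stdout log listed in the job paths. As a minimal sketch, a log file could be checked for this marker as follows. The helper job_succeeded is hypothetical and is not part of hopla; whether hopla itself decides success this way is an assumption.

```python
import tempfile
from pathlib import Path

DONE_MARKER = "HOPLASAY-DONE"  # marker echoed by the generated batch script


def job_succeeded(stdout_path):
    """Return True if the log file exists and contains the completion marker.

    Hypothetical helper for illustration only; not part of the hopla API.
    """
    path = Path(stdout_path)
    if not path.is_file():
        # No log yet: the job never ran or has not written any output.
        return False
    return DONE_MARKER in path.read_text()


with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "1_log.out"
    print(job_succeeded(log))  # the log file does not exist yet
    log.write_text("Exit code was: 0\nHOPLASAY-DONE\n")
    print(job_succeeded(log))  # the marker is now present
```

A missing log is treated as a failure here, which is one possible convention for jobs that never started.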

Start Jobs

We can’t execute the code on the CI since the SLURM infrastructure is not available, so the jobs are started in dry-run mode.

from hopla.config import Config

with Config(dryrun=True, delay_s=3):
    executor(max_jobs=2)
    print(executor.report)
SBATCH:   0%|          | 0/10 [00:00<?, ?it/s][command] sbatch /tmp/hopla/submissions/1_submission.sh

SBATCH:  10%|█         | 1/10 [00:00<00:00, 1561.54it/s][command] sbatch /tmp/hopla/submissions/2_submission.sh

SBATCH:  20%|██        | 2/10 [00:00<00:00, 1651.30it/s][command] sbatch /tmp/hopla/submissions/3_submission.sh

SBATCH:  30%|███       | 3/10 [00:03<00:07,  1.00s/it]
SBATCH:  30%|███       | 3/10 [00:03<00:07,  1.00s/it][command] sbatch /tmp/hopla/submissions/4_submission.sh

SBATCH:  40%|████      | 4/10 [00:03<00:06,  1.00s/it][command] sbatch /tmp/hopla/submissions/5_submission.sh

SBATCH:  50%|█████     | 5/10 [00:06<00:06,  1.24s/it]
SBATCH:  50%|█████     | 5/10 [00:06<00:06,  1.24s/it][command] sbatch /tmp/hopla/submissions/6_submission.sh

SBATCH:  60%|██████    | 6/10 [00:06<00:04,  1.24s/it][command] sbatch /tmp/hopla/submissions/7_submission.sh

SBATCH:  70%|███████   | 7/10 [00:09<00:04,  1.35s/it]
SBATCH:  70%|███████   | 7/10 [00:09<00:04,  1.35s/it][command] sbatch /tmp/hopla/submissions/8_submission.sh

SBATCH:  80%|████████  | 8/10 [00:09<00:02,  1.35s/it][command] sbatch /tmp/hopla/submissions/9_submission.sh

SBATCH:  90%|█████████ | 9/10 [00:12<00:01,  1.41s/it]
SBATCH:  90%|█████████ | 9/10 [00:12<00:01,  1.41s/it][command] sbatch /tmp/hopla/submissions/10_submission.sh

SBATCH: 100%|██████████| 10/10 [00:12<00:00,  1.41s/it]
SBATCH: 100%|██████████| 10/10 [00:15<00:00,  1.50s/it]
----------------------------------------
DelayedSlurmJob<job_id=1>exitcode: failure
DelayedSlurmJob<job_id=1>submission: /tmp/hopla/submissions/1_submission.sh
DelayedSlurmJob<job_id=1>stdout: none
DelayedSlurmJob<job_id=1>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=2>exitcode: failure
DelayedSlurmJob<job_id=2>submission: /tmp/hopla/submissions/2_submission.sh
DelayedSlurmJob<job_id=2>stdout: none
DelayedSlurmJob<job_id=2>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=3>exitcode: failure
DelayedSlurmJob<job_id=3>submission: /tmp/hopla/submissions/3_submission.sh
DelayedSlurmJob<job_id=3>stdout: none
DelayedSlurmJob<job_id=3>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=4>exitcode: failure
DelayedSlurmJob<job_id=4>submission: /tmp/hopla/submissions/4_submission.sh
DelayedSlurmJob<job_id=4>stdout: none
DelayedSlurmJob<job_id=4>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=5>exitcode: failure
DelayedSlurmJob<job_id=5>submission: /tmp/hopla/submissions/5_submission.sh
DelayedSlurmJob<job_id=5>stdout: none
DelayedSlurmJob<job_id=5>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=6>exitcode: failure
DelayedSlurmJob<job_id=6>submission: /tmp/hopla/submissions/6_submission.sh
DelayedSlurmJob<job_id=6>stdout: none
DelayedSlurmJob<job_id=6>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=7>exitcode: failure
DelayedSlurmJob<job_id=7>submission: /tmp/hopla/submissions/7_submission.sh
DelayedSlurmJob<job_id=7>stdout: none
DelayedSlurmJob<job_id=7>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=8>exitcode: failure
DelayedSlurmJob<job_id=8>submission: /tmp/hopla/submissions/8_submission.sh
DelayedSlurmJob<job_id=8>stdout: none
DelayedSlurmJob<job_id=8>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=9>exitcode: failure
DelayedSlurmJob<job_id=9>submission: /tmp/hopla/submissions/9_submission.sh
DelayedSlurmJob<job_id=9>stdout: none
DelayedSlurmJob<job_id=9>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=10>exitcode: failure
DelayedSlurmJob<job_id=10>submission: /tmp/hopla/submissions/10_submission.sh
DelayedSlurmJob<job_id=10>stdout: none
DelayedSlurmJob<job_id=10>stderr: none
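
The log above shows submissions arriving in small groups separated by the configured delay, which is what one would expect from a throttled submission loop. The following is a minimal, self-contained sketch of such a loop, not hopla's actual implementation. The names run_throttled, is_done and submit are hypothetical, and the delay is set to zero in the usage example so that it runs instantly.

```python
import time
from collections import deque


def run_throttled(job_ids, max_jobs, delay_s, is_done, submit):
    """Submit jobs so that at most `max_jobs` are in flight at any time.

    Hypothetical sketch for illustration only; not part of the hopla API.
    Returns the submission order and the peak number of jobs in flight.
    """
    pending = deque(job_ids)
    running = []
    order = []
    peak = 0
    while pending or running:
        # Drop the jobs that have finished since the last poll.
        running = [job for job in running if not is_done(job)]
        # Fill the free slots with pending jobs.
        while pending and len(running) < max_jobs:
            job = pending.popleft()
            submit(job)
            running.append(job)
            order.append(job)
        peak = max(peak, len(running))
        if running:
            # Wait before polling the job states again.
            time.sleep(delay_s)
    return order, peak


# Dry-run style usage: nothing is sent to a scheduler, and every job is
# reported as finished at the next poll.
order, peak = run_throttled(
    job_ids=range(1, 11),
    max_jobs=2,
    delay_s=0,
    is_done=lambda job: True,
    submit=lambda job: print(f"[dry run] submit job {job}"),
)
print(order, peak)
```

Passing is_done and submit as callables keeps the loop independent of any scheduler, so the same logic can be exercised without a cluster.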
