Basic example on how to use the SLURM cluster
When you’re running hundreds or thousands of jobs, automation is a necessity.
This is where hopla can help you.
This simple example shows how to use hopla on a SLURM cluster. Please check
the user guide for a more in-depth presentation of all
functionalities.
Imports
import hopla
from pprint import pprint
Executor Context
executor = hopla.Executor(
    cluster="slurm",
    folder="/tmp/hopla",
    queue="Nspin_short",
    image="/tmp/hopla/my-apptainer-img.simg",
    walltime=1,
)
Submit Jobs
jobs = [
    executor.submit("sleep", k) for k in range(1, 11)
]
pprint(jobs)
print(jobs[0].delayed_submission)
[DelayedSlurmJob(
job_id=1,
submission_id=None,
),
DelayedSlurmJob(
job_id=2,
submission_id=None,
),
DelayedSlurmJob(
job_id=3,
submission_id=None,
),
DelayedSlurmJob(
job_id=4,
submission_id=None,
),
DelayedSlurmJob(
job_id=5,
submission_id=None,
),
DelayedSlurmJob(
job_id=6,
submission_id=None,
),
DelayedSlurmJob(
job_id=7,
submission_id=None,
),
DelayedSlurmJob(
job_id=8,
submission_id=None,
),
DelayedSlurmJob(
job_id=9,
submission_id=None,
),
DelayedSlurmJob(
job_id=10,
submission_id=None,
)]
DelayedSubmission(
command=sleep 1,
execution_parameters=,
)
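The output above shows that `executor.submit` only records each command: every job still has `submission_id=None` at this point. The following minimal sketch illustrates that deferred-submission pattern in plain Python. The names `ToyDelayedJob` and `ToyExecutor` are hypothetical and are not part of hopla, whose actual classes may be organized differently.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyDelayedJob:
    """Hypothetical stand-in for a job that is recorded but not yet submitted."""
    job_id: int
    command: str
    submission_id: Optional[str] = None  # stays None until the job is started


@dataclass
class ToyExecutor:
    """Hypothetical executor that only queues commands when submit() is called."""
    jobs: List[ToyDelayedJob] = field(default_factory=list)

    def submit(self, script: str, *args) -> ToyDelayedJob:
        # Build the command line, store it, and return a handle immediately.
        command = " ".join([script, *map(str, args)])
        job = ToyDelayedJob(job_id=len(self.jobs) + 1, command=command)
        self.jobs.append(job)
        return job


toy_executor = ToyExecutor()
toy_jobs = [toy_executor.submit("sleep", k) for k in range(1, 11)]
print(toy_jobs[0])
```

Nothing is sent to the scheduler in this sketch; the handles only carry what would be needed to submit the jobs later.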
Generate a batch
JobPaths(
flux_dir=/tmp/hopla/logs/1_flux,
job_id=1,
joblib_file=/tmp/hopla/submissions/1_joblib_script.py,
log_folder=/tmp/hopla/logs,
oneshot_dir=/tmp/hopla/logs/1_oneshot,
oneshot_file=/tmp/hopla/submissions/1_oneshot_script.sh,
stderr=/tmp/hopla/logs/1_log.err,
stdout=/tmp/hopla/logs/1_log.out,
submission_file=/tmp/hopla/submissions/1_submission.sh,
submission_folder=/tmp/hopla/submissions,
task_file=/tmp/hopla/submissions/1_tasks.txt,
worker_file=/tmp/hopla/submissions/worker.sh,
)
#!/bin/bash
# Parameters
#SBATCH -p Nspin_short
#SBATCH --mem=2g
#SBATCH -c 1
#SBATCH --gres=gpu:0
#SBATCH --time=1:00:00
#SBATCH -J hopla
#SBATCH -e /tmp/hopla/logs/1_log.err
#SBATCH -o /tmp/hopla/logs/1_log.out
# Environment
echo $SLURM_JOB_ID
echo $HOSTNAME
unset LD_PRELOAD
# Command
apptainer run /tmp/hopla/my-apptainer-img.simg sleep 1
exitcode=$?
echo "Exit code was: $exitcode"
# Exit
if [ "$exitcode" -ne 1 ]; then
echo "HOPLASAY-DONE"
fi
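The generated script prints the marker `HOPLASAY-DONE` on standard output, which suggests that a job's status can be read back from its log file. Below is a minimal sketch of such a check, assuming the marker is searched for in the stdout log. The helper `job_succeeded` is hypothetical, and hopla's own status logic may differ.

```python
import tempfile
from pathlib import Path

DONE_MARKER = "HOPLASAY-DONE"  # marker echoed by the generated batch script


def job_succeeded(stdout_file: Path) -> bool:
    """Return True if the log exists and contains the completion marker."""
    if not stdout_file.is_file():
        # No log yet: the job has not run (for example during a dry run).
        return False
    return DONE_MARKER in stdout_file.read_text()


# Usage with a temporary directory standing in for /tmp/hopla/logs.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "1_log.out"
    print(job_succeeded(log))  # the log file does not exist yet
    log.write_text("12345\nnode01\nExit code was: 0\nHOPLASAY-DONE\n")
    print(job_succeeded(log))  # the marker is now present
```

Under this assumption, a job whose log file is missing is simply treated as not done.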
Start Jobs
We can’t actually execute the jobs on the CI since the SLURM infrastructure is not available there, so they are started in dry-run mode.
from hopla.config import Config
with Config(dryrun=True, delay_s=3):
    executor(max_jobs=2)
print(executor.report)
SBATCH: 0%| | 0/10 [00:00<?, ?it/s][command] sbatch /tmp/hopla/submissions/1_submission.sh
SBATCH: 10%|█ | 1/10 [00:00<00:00, 1561.54it/s][command] sbatch /tmp/hopla/submissions/2_submission.sh
SBATCH: 20%|██ | 2/10 [00:00<00:00, 1651.30it/s][command] sbatch /tmp/hopla/submissions/3_submission.sh
SBATCH: 30%|███ | 3/10 [00:03<00:07, 1.00s/it]
SBATCH: 30%|███ | 3/10 [00:03<00:07, 1.00s/it][command] sbatch /tmp/hopla/submissions/4_submission.sh
SBATCH: 40%|████ | 4/10 [00:03<00:06, 1.00s/it][command] sbatch /tmp/hopla/submissions/5_submission.sh
SBATCH: 50%|█████ | 5/10 [00:06<00:06, 1.24s/it]
SBATCH: 50%|█████ | 5/10 [00:06<00:06, 1.24s/it][command] sbatch /tmp/hopla/submissions/6_submission.sh
SBATCH: 60%|██████ | 6/10 [00:06<00:04, 1.24s/it][command] sbatch /tmp/hopla/submissions/7_submission.sh
SBATCH: 70%|███████ | 7/10 [00:09<00:04, 1.35s/it]
SBATCH: 70%|███████ | 7/10 [00:09<00:04, 1.35s/it][command] sbatch /tmp/hopla/submissions/8_submission.sh
SBATCH: 80%|████████ | 8/10 [00:09<00:02, 1.35s/it][command] sbatch /tmp/hopla/submissions/9_submission.sh
SBATCH: 90%|█████████ | 9/10 [00:12<00:01, 1.41s/it]
SBATCH: 90%|█████████ | 9/10 [00:12<00:01, 1.41s/it][command] sbatch /tmp/hopla/submissions/10_submission.sh
SBATCH: 100%|██████████| 10/10 [00:12<00:00, 1.41s/it]
SBATCH: 100%|██████████| 10/10 [00:15<00:00, 1.50s/it]
----------------------------------------
DelayedSlurmJob<job_id=1>exitcode: failure
DelayedSlurmJob<job_id=1>submission: /tmp/hopla/submissions/1_submission.sh
DelayedSlurmJob<job_id=1>stdout: none
DelayedSlurmJob<job_id=1>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=2>exitcode: failure
DelayedSlurmJob<job_id=2>submission: /tmp/hopla/submissions/2_submission.sh
DelayedSlurmJob<job_id=2>stdout: none
DelayedSlurmJob<job_id=2>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=3>exitcode: failure
DelayedSlurmJob<job_id=3>submission: /tmp/hopla/submissions/3_submission.sh
DelayedSlurmJob<job_id=3>stdout: none
DelayedSlurmJob<job_id=3>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=4>exitcode: failure
DelayedSlurmJob<job_id=4>submission: /tmp/hopla/submissions/4_submission.sh
DelayedSlurmJob<job_id=4>stdout: none
DelayedSlurmJob<job_id=4>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=5>exitcode: failure
DelayedSlurmJob<job_id=5>submission: /tmp/hopla/submissions/5_submission.sh
DelayedSlurmJob<job_id=5>stdout: none
DelayedSlurmJob<job_id=5>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=6>exitcode: failure
DelayedSlurmJob<job_id=6>submission: /tmp/hopla/submissions/6_submission.sh
DelayedSlurmJob<job_id=6>stdout: none
DelayedSlurmJob<job_id=6>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=7>exitcode: failure
DelayedSlurmJob<job_id=7>submission: /tmp/hopla/submissions/7_submission.sh
DelayedSlurmJob<job_id=7>stdout: none
DelayedSlurmJob<job_id=7>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=8>exitcode: failure
DelayedSlurmJob<job_id=8>submission: /tmp/hopla/submissions/8_submission.sh
DelayedSlurmJob<job_id=8>stdout: none
DelayedSlurmJob<job_id=8>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=9>exitcode: failure
DelayedSlurmJob<job_id=9>submission: /tmp/hopla/submissions/9_submission.sh
DelayedSlurmJob<job_id=9>stdout: none
DelayedSlurmJob<job_id=9>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=10>exitcode: failure
DelayedSlurmJob<job_id=10>submission: /tmp/hopla/submissions/10_submission.sh
DelayedSlurmJob<job_id=10>stdout: none
DelayedSlurmJob<job_id=10>stderr: none
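The progress log above shows the submission scripts being handed to `sbatch` in small groups separated by pauses. The following minimal sketch shows one way to write such a throttled submission loop, assuming at most `max_jobs` commands are issued per round and the loop sleeps `delay_s` seconds between rounds. The function `submit_throttled` is hypothetical and does not reproduce hopla's actual scheduling.

```python
import subprocess
import time
from typing import List


def submit_throttled(scripts: List[str], max_jobs: int, delay_s: float,
                     dryrun: bool = True) -> List[str]:
    """Issue sbatch commands in rounds of at most max_jobs, pausing between rounds."""
    if max_jobs < 1:
        raise ValueError("max_jobs must be at least 1")
    issued = []
    for start in range(0, len(scripts), max_jobs):
        for script in scripts[start:start + max_jobs]:
            command = f"sbatch {script}"
            if dryrun:
                # Dry run: only show the command that would be executed.
                print(f"[command] {command}")
            else:
                # Requires a working SLURM installation.
                subprocess.run(["sbatch", script], check=True)
            issued.append(command)
        if start + max_jobs < len(scripts):
            time.sleep(delay_s)  # wait before starting the next round
    return issued


scripts = [f"/tmp/hopla/submissions/{k}_submission.sh" for k in range(1, 11)]
commands = submit_throttled(scripts, max_jobs=2, delay_s=0.01)
```

Keeping the dry-run branch inside the loop lets the same code path be exercised on machines without a scheduler, which is the situation described for the CI.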