Basic example on how to use the SLURM cluster
CCC-based cluster - SLURM
When you’re running hundreds or thousands of jobs, automation is a necessity.
This is where hopla can help you.
This simple example shows how to use hopla on a SLURM cluster. Please check
the user guide for a more in-depth presentation of all
functionalities.
Imports
import hopla
from pprint import pprint
Executor Context
executor = hopla.Executor(
    cluster="slurm",
    folder="/tmp/hopla",
    queue="Nspin_short",
    image="/tmp/hopla/my-apptainer-img.simg",
    walltime=1,
)
Submit Jobs
jobs = [
    executor.submit("sleep", k) for k in range(1, 11)
]
pprint(jobs)
print(jobs[0].delayed_submission)
[DelayedSlurmJob(
job_id=1,
submission_id=None,
),
DelayedSlurmJob(
job_id=2,
submission_id=None,
),
DelayedSlurmJob(
job_id=3,
submission_id=None,
),
DelayedSlurmJob(
job_id=4,
submission_id=None,
),
DelayedSlurmJob(
job_id=5,
submission_id=None,
),
DelayedSlurmJob(
job_id=6,
submission_id=None,
),
DelayedSlurmJob(
job_id=7,
submission_id=None,
),
DelayedSlurmJob(
job_id=8,
submission_id=None,
),
DelayedSlurmJob(
job_id=9,
submission_id=None,
),
DelayedSlurmJob(
job_id=10,
submission_id=None,
)]
DelayedSubmission(
command=sleep 1,
execution_parameters=,
)
Generate a batch
JobPaths(
flux_dir=/tmp/hopla/logs/1_flux,
job_id=1,
joblib_file=/tmp/hopla/submissions/1_joblib_script.py,
log_folder=/tmp/hopla/logs,
oneshot_dir=/tmp/hopla/logs/1_oneshot,
oneshot_file=/tmp/hopla/submissions/1_oneshot_script.sh,
stderr=/tmp/hopla/logs/1_log.err,
stdout=/tmp/hopla/logs/1_log.out,
submission_file=/tmp/hopla/submissions/1_submission.sh,
submission_folder=/tmp/hopla/submissions,
task_file=/tmp/hopla/submissions/1_tasks.txt,
worker_file=/tmp/hopla/submissions/worker.sh,
)
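The listing above suggests a simple naming convention: every per-job file is prefixed with the job id and stored under either `logs` or `submissions`. The sketch below rebuilds part of that layout with `pathlib`. The helper `job_paths` is hypothetical and is not part of hopla; it only mirrors the paths printed above.

```python
from pathlib import PurePosixPath


def job_paths(folder, job_id):
    """Hypothetical helper: rebuild part of the per-job layout shown above."""
    root = PurePosixPath(folder)
    logs = root / "logs"
    submissions = root / "submissions"
    return {
        "log_folder": logs,
        "submission_folder": submissions,
        "stdout": logs / f"{job_id}_log.out",
        "stderr": logs / f"{job_id}_log.err",
        "submission_file": submissions / f"{job_id}_submission.sh",
        "task_file": submissions / f"{job_id}_tasks.txt",
        # worker.sh carries no job id in the listing, so it looks shared.
        "worker_file": submissions / "worker.sh",
    }


paths = job_paths("/tmp/hopla", 1)
for name, path in sorted(paths.items()):
    print(f"{name}={path}")
```

Keeping the job id in every file name makes it easy to locate the logs that belong to a given submission script.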
#!/bin/bash
# Parameters
#SBATCH -p Nspin_short
#SBATCH --mem=2g
#SBATCH -c 1
#SBATCH --gres=gpu:0
#SBATCH --time=1:00:00
#SBATCH -J hopla
#SBATCH -e /tmp/hopla/logs/1_log.err
#SBATCH -o /tmp/hopla/logs/1_log.out
# Environment
echo $SLURM_JOB_ID
echo $HOSTNAME
unset LD_PRELOAD
# Command
apptainer run /tmp/hopla/my-apptainer-img.simg sleep 1
exitcode=$?
echo "Exit code was: $exitcode"
# Exit
if [ "$exitcode" -ne 1 ]; then
echo "HOPLASAY-DONE"
fi
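To make the link between the executor arguments and the generated script explicit, here is a minimal sketch that renders the `#SBATCH` header and the command line from plain Python values. The function `render_submission` and its default values are assumptions made for illustration, not hopla's implementation; the defaults simply copy the values visible in the script above, where `walltime=1` appears to be rendered as one hour.

```python
def render_submission(queue, walltime_h, stderr, stdout, image, command,
                      memory_gb=2, cpus=1, gpus=0, name="hopla"):
    """Hypothetical renderer for a script similar to the one shown above."""
    lines = [
        "#!/bin/bash",
        "# Parameters",
        f"#SBATCH -p {queue}",
        f"#SBATCH --mem={memory_gb}g",
        f"#SBATCH -c {cpus}",
        f"#SBATCH --gres=gpu:{gpus}",
        f"#SBATCH --time={walltime_h}:00:00",
        f"#SBATCH -J {name}",
        f"#SBATCH -e {stderr}",
        f"#SBATCH -o {stdout}",
        "# Command",
        f"apptainer run {image} {command}",
    ]
    return "\n".join(lines) + "\n"


script = render_submission(
    queue="Nspin_short",
    walltime_h=1,
    stderr="/tmp/hopla/logs/1_log.err",
    stdout="/tmp/hopla/logs/1_log.out",
    image="/tmp/hopla/my-apptainer-img.simg",
    command="sleep 1",
)
print(script)
```

The sketch leaves out the environment and exit-code sections of the real script; it is only meant to show where each parameter ends up.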
Start Jobs
We can’t execute the code on the CI since the SLURM infrastructure is not available, so the jobs are started in dry-run mode.
from hopla.config import Config
with Config(dryrun=True, delay_s=3):
    executor(max_jobs=2)
print(executor.report)
SBATCH: 0%| | 0/10 [00:00<?, ?it/s][command] sbatch /tmp/hopla/submissions/1_submission.sh
SBATCH: 10%|█ | 1/10 [00:00<00:00, 1328.99it/s][command] sbatch /tmp/hopla/submissions/2_submission.sh
SBATCH: 20%|██ | 2/10 [00:00<00:00, 1451.07it/s][command] sbatch /tmp/hopla/submissions/3_submission.sh
SBATCH: 30%|███ | 3/10 [00:03<00:07, 1.00s/it]
SBATCH: 30%|███ | 3/10 [00:03<00:07, 1.00s/it][command] sbatch /tmp/hopla/submissions/4_submission.sh
SBATCH: 40%|████ | 4/10 [00:03<00:06, 1.00s/it][command] sbatch /tmp/hopla/submissions/5_submission.sh
SBATCH: 50%|█████ | 5/10 [00:06<00:06, 1.24s/it]
SBATCH: 50%|█████ | 5/10 [00:06<00:06, 1.24s/it][command] sbatch /tmp/hopla/submissions/6_submission.sh
SBATCH: 60%|██████ | 6/10 [00:06<00:04, 1.24s/it][command] sbatch /tmp/hopla/submissions/7_submission.sh
SBATCH: 70%|███████ | 7/10 [00:09<00:04, 1.35s/it]
SBATCH: 70%|███████ | 7/10 [00:09<00:04, 1.35s/it][command] sbatch /tmp/hopla/submissions/8_submission.sh
SBATCH: 80%|████████ | 8/10 [00:09<00:02, 1.35s/it][command] sbatch /tmp/hopla/submissions/9_submission.sh
SBATCH: 90%|█████████ | 9/10 [00:12<00:01, 1.41s/it]
SBATCH: 90%|█████████ | 9/10 [00:12<00:01, 1.41s/it][command] sbatch /tmp/hopla/submissions/10_submission.sh
SBATCH: 100%|██████████| 10/10 [00:12<00:00, 1.41s/it]
SBATCH: 100%|██████████| 10/10 [00:15<00:00, 1.50s/it]
----------------------------------------
DelayedSlurmJob<job_id=1>exitcode: failure
DelayedSlurmJob<job_id=1>submission: /tmp/hopla/submissions/1_submission.sh
DelayedSlurmJob<job_id=1>stdout: none
DelayedSlurmJob<job_id=1>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=2>exitcode: failure
DelayedSlurmJob<job_id=2>submission: /tmp/hopla/submissions/2_submission.sh
DelayedSlurmJob<job_id=2>stdout: none
DelayedSlurmJob<job_id=2>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=3>exitcode: failure
DelayedSlurmJob<job_id=3>submission: /tmp/hopla/submissions/3_submission.sh
DelayedSlurmJob<job_id=3>stdout: none
DelayedSlurmJob<job_id=3>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=4>exitcode: failure
DelayedSlurmJob<job_id=4>submission: /tmp/hopla/submissions/4_submission.sh
DelayedSlurmJob<job_id=4>stdout: none
DelayedSlurmJob<job_id=4>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=5>exitcode: failure
DelayedSlurmJob<job_id=5>submission: /tmp/hopla/submissions/5_submission.sh
DelayedSlurmJob<job_id=5>stdout: none
DelayedSlurmJob<job_id=5>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=6>exitcode: failure
DelayedSlurmJob<job_id=6>submission: /tmp/hopla/submissions/6_submission.sh
DelayedSlurmJob<job_id=6>stdout: none
DelayedSlurmJob<job_id=6>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=7>exitcode: failure
DelayedSlurmJob<job_id=7>submission: /tmp/hopla/submissions/7_submission.sh
DelayedSlurmJob<job_id=7>stdout: none
DelayedSlurmJob<job_id=7>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=8>exitcode: failure
DelayedSlurmJob<job_id=8>submission: /tmp/hopla/submissions/8_submission.sh
DelayedSlurmJob<job_id=8>stdout: none
DelayedSlurmJob<job_id=8>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=9>exitcode: failure
DelayedSlurmJob<job_id=9>submission: /tmp/hopla/submissions/9_submission.sh
DelayedSlurmJob<job_id=9>stdout: none
DelayedSlurmJob<job_id=9>stderr: none
----------------------------------------
DelayedSlurmJob<job_id=10>exitcode: failure
DelayedSlurmJob<job_id=10>submission: /tmp/hopla/submissions/10_submission.sh
DelayedSlurmJob<job_id=10>stdout: none
DelayedSlurmJob<job_id=10>stderr: none
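With hundreds of jobs, a report in this format is easier to handle once it is parsed. The sketch below assumes the report is available as plain text in exactly the layout printed above (an assumption about the type of `executor.report`) and groups the fields by job id. The names `parse_report` and `LINE` are hypothetical helpers, not part of hopla.

```python
import re
from collections import Counter

# One report line looks like: DelayedSlurmJob<job_id=1>exitcode: failure
LINE = re.compile(r"^DelayedSlurmJob<job_id=(\d+)>(\w+): (.*)$")


def parse_report(text):
    """Hypothetical parser: map each job id to its reported fields."""
    jobs = {}
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            job_id, key, value = match.groups()
            jobs.setdefault(int(job_id), {})[key] = value
    return jobs


sample = """\
----------------------------------------
DelayedSlurmJob<job_id=1>exitcode: failure
DelayedSlurmJob<job_id=1>submission: /tmp/hopla/submissions/1_submission.sh
DelayedSlurmJob<job_id=1>stdout: none
DelayedSlurmJob<job_id=1>stderr: none
"""
jobs = parse_report(sample)
print(Counter(job["exitcode"] for job in jobs.values()))
```

In the dry run above every job is reported as a failure with no stdout or stderr, presumably because no job was actually executed.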
Total running time of the script: (0 minutes 15.262 seconds)
Estimated memory usage: 109 MB