The cluster provides different partitions (“queues”) for running jobs. We have a ‘common’ partition that anyone is free to use, as well as lab-owned “condo” partitions that are restricted to a particular lab’s use. Any lab is free to purchase compute hardware, and we will be glad to create a “condo” partition for it.
All partitions are configured identically with the exception of Maximum CPUs per user:
Comment: Here “runtime” means “walltime”, i.e. the runtime of a job is how long it runs according to the clock on the wall, not the amount of CPU time.
There is no need to specify the partition when submitting a job. The scheduler is configured to prioritize any lab-specific partitions you have access to. If you do not have access to a lab-specific partition, or if those partitions are already full, then the ‘common’ partition is considered. You can see which partitions you have access to by looking at the environment variable SBATCH_PARTITION.
For example, user alice sees:
[alice@c4-log1 ~]$ echo "$SBATCH_PARTITION"
boblab,common
which means their next job will be sent to the ‘boblab’ partition and, if that is full, to the ‘common’ partition. If that is also full, the job will remain pending until either ‘boblab’ or ‘common’ becomes available.
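To inspect this list in a script, the comma-separated value can be split into one partition per line. The following is a minimal sketch: the fallback value boblab,common is taken from the example above and is only used when SBATCH_PARTITION is unset, and the variable names partition_list and first_choice are illustrative, not part of Slurm.

```shell
# Use the example value from the text when the variable is unset,
# e.g. when trying this outside the cluster (assumption for illustration).
partitions="${SBATCH_PARTITION:-boblab,common}"

# Turn the comma-separated list into one partition name per line.
partition_list=$(printf '%s\n' "$partitions" | tr ',' '\n')
printf '%s\n' "$partition_list"

# The first entry in the list.
first_choice=$(printf '%s\n' "$partition_list" | head -n 1)
echo "First partition listed: ${first_choice}"
```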
If you would like to send a job to a specific partition, the Slurm option --partition can be used, e.g. sbatch --partition=boblab script.sh or sbatch --partition=boblab,common script.sh.
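To confirm which partition a job actually ran in, the job script itself can report it. The sketch below is a hypothetical minimal script.sh; it relies on SLURM_JOB_PARTITION, which Slurm sets inside a running job, and falls back to a placeholder text when run outside Slurm.

```shell
#!/bin/sh
# script.sh: hypothetical minimal job script (the name matches the examples above).
# SLURM_JOB_PARTITION is set by Slurm in the job's environment; the fallback
# text is only used when the script is run outside of a Slurm job.
partition="${SLURM_JOB_PARTITION:-unknown (not running under Slurm)}"
echo "Job ran in partition: ${partition}"
```

Submitted with sbatch --partition=boblab script.sh, the message ends up in the job's output file, which by default is slurm-<jobid>.out in the submission directory.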
⚠️ Although rarely needed, if you need to submit your jobs to a specific compute node, which you do via the Slurm option --nodelist, then you also need to specify --partition for that specific node; otherwise sbatch fails with the error ‘Batch job submission failed: Unspecified error’.
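As a sketch of the combination described in the warning, the snippet below builds an sbatch command that names both the node and its partition. The node c4-n1 and its partition ‘common’ are taken from the sinfo listing in this document; script.sh is a placeholder, and the variable names are illustrative. When sinfo is available, the partition is looked up rather than hard-coded; if a node belongs to several partitions, this sketch simply takes the first one reported.

```shell
node="c4-n1"

# Look up the node's partition when sinfo is available (i.e. on the cluster);
# %R prints the partition name without the default-partition asterisk.
if command -v sinfo >/dev/null 2>&1; then
  partition=$(sinfo --noheader --nodes="$node" --format="%R" | head -n 1)
else
  # Fallback for illustration, based on the sinfo listing in this document.
  partition="common"
fi

# Build the command; it is only printed here, not executed.
submit_cmd="sbatch --partition=${partition} --nodelist=${node} script.sh"
echo "$submit_cmd"
```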
In order to see all available partitions on the cluster, use the sinfo command:
PARTITION     AVAIL TIMELIMIT  NODES STATE NODELIST
0cdf          up    14-00:00:0 3     idle  c4-n[32-34]
blellochlab   up    14-00:00:0 1     idle  c4-n16
bastianlab    up    14-00:00:0 1     idle  c4-n25
cbc           up    14-00:00:0 2     idle  c4-n[12-13]
common*       up    14-00:00:0 6     idle  c4-n[1-5,10-11]
diazlab       up    14-00:00:0 1     idle  c4-n31
francislab    up    14-00:00:0 1     mix   c4-n17
kimlab        up    14-00:00:0 1     idle  c4-n22
koberlab      up    14-00:00:0 1     idle  c4-n18
kriegsteinlab up    14-00:00:0 1     idle  c4-n27
krummellab    up    14-00:00:0 1     idle  c4-n20
molinarolab   up    14-00:00:0 2     idle  c4-n[28-29]
shannonlab    up    14-00:00:0 2     idle  c4-n[23-24]
sblab         up    14-00:00:0 1     idle  c4-n26
wittelab      up    14-00:00:0 2     idle  c4-n[6-9,14-15]
zivlab        up    14-00:00:0 1     idle  c4-n19
In the above example, the asterisk indicates that ‘common’ is the default partition. The ‘mix’ state means that the nodes are running jobs but still have some CPUs free, whereas ‘idle’ means the nodes are not running any jobs. The ‘drain’ and ‘drng’ states indicate that the node has been taken offline by the sysadmin. Draining (‘drng’) means the node is still running jobs but won’t accept new work.
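The STATE column can also be filtered in a script. The following is a minimal sketch that lists the partitions whose nodes are reported as ‘idle’; to stay self-contained it works on a few lines copied from the sinfo output above, and the variable names sample_output and idle_partitions are illustrative.

```shell
# A few lines copied from the sinfo output above, used as sample input.
sample_output='PARTITION AVAIL TIMELIMIT NODES STATE NODELIST
common* up 14-00:00:0 6 idle c4-n[1-5,10-11]
francislab up 14-00:00:0 1 mix c4-n17
kimlab up 14-00:00:0 1 idle c4-n22'

# Skip the header line, keep rows whose STATE column (field 5) is "idle",
# strip the trailing "*" that marks the default partition, and print the
# partition name followed by its node list.
idle_partitions=$(printf '%s\n' "$sample_output" |
  awk 'NR > 1 && $5 == "idle" { sub(/\*$/, "", $1); print $1, $6 }')

printf '%s\n' "$idle_partitions"
```

On the cluster, the same awk filter can be applied to live data by piping the output of sinfo into it instead of the sample text.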