‘Hello World’ Job #

The C4 cluster farm consists of a number of compute nodes that are ready to serve users’ compute tasks (aka jobs). Since all compute nodes are configured the same way (for instance, they have the exact same set of software installed), it does not matter on which compute node your analysis runs.

At any time, there will be many users using the cluster: some users run a single analysis, whereas others run many multi-day jobs in parallel. In order for users not to step on each other’s toes and for users to get a fair share of the compute resources, the cluster uses a so-called job scheduler to orchestrate the compute requests. This works by users submitting their compute jobs to the scheduler. The scheduler then locates one or more compute nodes with enough free resources to process the submitted job and launches the job on those compute nodes.

Instructions #

The most common way of running compute tasks on the C4 cluster consists of:

creating a script,
submitting the script to the scheduler,
waiting for the script to start and finish, and
looking at the results, e.g. output data files and text logs.

The C4 cluster uses Slurm as its scheduler. Slurm provides the command sbatch for submitting your job scripts and the command squeue for checking the status of your jobs. Slurm also provides the command srun for running a job interactively.

Further information with detailed examples on job submissions can be found on separate pages under the ‘Scheduler’ menu.

Example #

In this example, we will run a compute job that outputs the name of the compute node that runs the job, waits ten seconds to emulate some processing, and then outputs the current time. The name of the machine where the script runs is available in the environment variable HOSTNAME (standard in Unix), and the current time can be queried by calling the command date. Here is a shell script ~/tests/hello_world that writes a message, waits for ten seconds, and displays the date:

#! /bin/env bash

echo "Hello world, I am running on node $HOSTNAME"
sleep 10
date

Hint: To create this file, make sure that the folder exists first. If it doesn’t, call mkdir ~/tests.
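
The hint above can also be scripted. The following is a minimal sketch that creates the folder and writes the script in one go; the function name create_hello_world is made up for this illustration and is not something the cluster provides. Note that it overwrites an existing hello_world file in the target folder.

```shell
# create_hello_world DIR
# Writes the example script to DIR/hello_world, creating DIR if needed.
# (create_hello_world is a hypothetical helper name used for this sketch.)
create_hello_world() {
    # -p makes mkdir succeed whether or not the folder already exists.
    mkdir -p "$1" || return 1

    # Quoting 'EOF' stops the shell from expanding $HOSTNAME now, so the
    # variable is written literally and resolved when the script runs.
    cat > "$1/hello_world" <<'EOF'
#! /bin/env bash

echo "Hello world, I am running on node $HOSTNAME"
sleep 10
date
EOF
}

# Usage: create_hello_world ~/tests
```

Writing the file with a text editor works just as well; the function is only convenient when the setup needs to be repeated.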

Although not critical for the job scheduler, it is always convenient to set the file permission on this script file to be executable, e.g.

[alice@c4-dev2 ~]$ cd tests/
[alice@c4-dev2 tests]$ chmod ugo+x hello_world

This, in combination with the so-called “shebang” (#! ...) on the first line, allows you to call the script just like any other software, e.g.

[alice@c4-dev2 tests]$ ./hello_world
Hello world, I am running on node c4-dev2
Thu Dec 31 10:24:41 PST 2020

Note how it takes ten seconds between the Hello world message and the time stamp. We have now confirmed that the shell script does what we expect it to do, and we are ready to submit it to the job queue of the scheduler. To do this, run:

[alice@c4-dev2 tests]$ sbatch hello_world
Submitted batch job 3084

When a job is submitted, the scheduler assigns it a unique identifier (“job id”). In the above example, the job id is ‘3084’. We can see this and our other jobs on the job queue by using squeue:

[alice@c4-dev2 tests]$ squeue --long -u $USER
Thu Dec 31 10:34:04 2020
             JOBID PARTITION     NAME     USER    STATE       TIME TIME_LIMI  NODES NODELIST(REASON) 
              3084    common hello_wo    alice  PENDING       0:00 14-00:00:00    1 (Priority)

We can see that the job is “pending”, which means that the scheduler is still looking for a compute node where this job can be sent. When the job launches, the status will be reported as “running”. When the job finishes, squeue will no longer list it.
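
Because a finished job disappears from the queue, a script can wait for a job by polling squeue until nothing is listed for that job id. The following is a minimal sketch under that assumption, not a site-recommended tool: the function name wait_for_job and the default polling interval are made up for illustration, while --noheader and --jobs are regular squeue options.

```shell
# wait_for_job JOB_ID [INTERVAL]
# Polls squeue until the job is no longer listed, then returns.
# wait_for_job is a hypothetical helper; it is not part of Slurm.
wait_for_job() {
    local job_id="$1"
    local interval="${2:-10}"   # seconds between polls (arbitrary default)

    # squeue prints one line per listed job; --noheader drops the column
    # titles, so empty output means the job has left the queue. Errors for
    # job ids the scheduler no longer knows about are discarded.
    while [ -n "$(squeue --noheader --jobs="$job_id" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}

# Usage: wait_for_job 3084
```

Depending on how the scheduler is configured, a finished job may remain visible for a short while, so the function can return somewhat later than the job actually ends. Keep the interval at several seconds or more to avoid loading the scheduler with requests.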

So where is the output of the job? By default, all output is redirected to a file in the current working directory with a name reflecting the job id:

[alice@c4-dev2 tests]$ cat slurm-3084.out
Hello world, I am running on node c4-n10
Thu Dec 31 10:34:00 PST 2020
[alice@c4-dev2 tests]$ 
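
Since the log file name follows directly from the job id, it can be derived in a script from the message that sbatch prints. The two helpers below are a sketch for illustration only; the names submitted_job_id and job_log_file are hypothetical, and the sketch assumes the default output location described above (no custom output file requested).

```shell
# submitted_job_id MESSAGE
# Extracts the numeric job id from a "Submitted batch job <id>" line.
# Prints nothing if the message does not have that form.
submitted_job_id() {
    printf '%s\n' "$1" | sed -n 's/^Submitted batch job \([0-9][0-9]*\).*/\1/p'
}

# job_log_file JOB_ID
# Prints the default log file name for a job, relative to the
# directory the job was submitted from.
job_log_file() {
    printf 'slurm-%s.out\n' "$1"
}

# Usage (requires Slurm):
#   job_id=$(submitted_job_id "$(sbatch hello_world)")
#   cat "$(job_log_file "$job_id")"   # once the job has produced output
```

The log file only appears after the job has started running, so calling cat right after submission may fail with a "No such file" error.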

There is of course nothing preventing us from submitting the same script multiple times. If we do, each submission will result in the script being launched on a compute node, and a unique log file slurm-<job_id>.out will be written for each job. Please try that and see what squeue outputs.
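
Repeated submission can be sketched as a small loop. The helper name submit_many is hypothetical and only wraps the sbatch call shown earlier; it stops at the first submission that fails.

```shell
# submit_many SCRIPT COUNT
# Submits SCRIPT to the scheduler COUNT times. Each submission gets its
# own job id and therefore its own slurm-<job_id>.out log file.
# submit_many is a hypothetical helper; it is not part of Slurm.
submit_many() {
    local script="$1"
    local count="$2"
    local i=1

    while [ "$i" -le "$count" ]; do
        # sbatch prints "Submitted batch job <id>" for each submission.
        sbatch "$script" || return 1
        i=$((i + 1))
    done
}

# Usage (requires Slurm):
#   submit_many hello_world 3
#   squeue --long -u "$USER"
```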

Now, you may want to pass different arguments to your script each time, e.g. each job should process a different input data file. For information on how to do this, and other things, see the Submit Jobs page.