Allegro is a hybrid x86 and GPU cluster consisting of ~1000 cores and roughly 3.5 TB of RAM, spread across different node types.
OS, with Saltstack
The nodes are interconnected via InfiniBand for parallel computation jobs (MPI) and have access to about 250 TB of cluster-wide hard disk storage.
On both clusters, calculations are scheduled and automatically distributed via the Slurm Workload Manager.
For the impatient, there is a quick run-through here.
Getting an account
You need an account at Freie Universität Berlin that has been enabled at the Department of Mathematics and Computer Science.
The cluster is only reachable from within the Department of Mathematics and Computer Science.
If you want to access the cluster from outside the department,
please log in to one of our SSH remote login nodes and then jump to allegro, or use an SSH tunnel.
To log in to a cluster, you need an SSH client of some sort. If you are using a Linux or Unix based system, one is most likely already available in your shell, and you can get to your account very quickly. For Microsoft Windows, we recommend
$ ssh <username>@allegro.imp.fu-berlin.de
Login via SSH-Tunnel
$ ssh -f -L 9999:allegro.imp.fu-berlin.de:22 -N <username>@andorra.imp.fu-berlin.de
$ ssh <username>@localhost -p 9999
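As an alternative to the tunnel, OpenSSH 7.3 and later can hop through a login node directly with its -J (ProxyJump) option. The following is a minimal sketch of a shell function for your ~/.bashrc, assuming andorra.imp.fu-berlin.de is reachable from your machine and that you use the same username on both hosts; the function name allegro_login is hypothetical.

```shell
# Hypothetical helper: log in to allegro by jumping through the
# andorra login node (OpenSSH ProxyJump, option -J, OpenSSH >= 7.3).
# Assumes the same <username> is valid on both hosts.
# Any extra arguments are passed to ssh before the destination.
allegro_login() {
    local user="${1:?usage: allegro_login <username> [ssh options...]}"
    shift
    ssh -J "${user}@andorra.imp.fu-berlin.de" "$@" "${user}@allegro.imp.fu-berlin.de"
}
```

Usage: $ allegro_login <username> -X
Because extra arguments go before the destination, options such as -X for X11 forwarding keep working.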
The cluster is equipped with a small /home partition for scripts and profile data and about 250 TB of cluster-wide hard disk storage for computational data.
Cluster nodes cannot access the NFS file systems mounted on typical workstations (e.g. /home, /storage, /group) in the Department of Mathematics and Computer Science.
Please find more details concerning storage here
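Because the nodes cannot see the workstation NFS mounts, input data has to be copied onto the cluster first. Below is a minimal sketch using rsync over SSH, run on a department workstation; the function name copy_to_allegro and the chosen rsync options are illustrative assumptions, not a site-prescribed procedure.

```shell
# Hypothetical helper, run on a department workstation: copy a local
# directory to allegro with rsync over SSH.
# -a preserves permissions and timestamps, -v lists transferred files,
# --partial keeps partially transferred files so that re-running the
# same command after an interruption does not start from scratch.
copy_to_allegro() {
    local user="${1:?usage: copy_to_allegro <username> <local_dir> <remote_dir>}"
    local src="${2:?missing local_dir}"
    local dest="${3:?missing remote_dir}"
    # Trailing slashes: copy the contents of src into dest.
    rsync -av --partial "${src%/}/" "${user}@allegro.imp.fu-berlin.de:${dest%/}/"
}
```

Usage: $ copy_to_allegro <username> ~/experiment /data/scratch/<username>/experiment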
Save the following as job_script.sh and replace the placeholders in angle brackets.
There is no routing queue, so you have to specify which partition you want to use.
#!/bin/bash
#SBATCH -J testjob
#SBATCH -D /data/scratch/<USER>
#SBATCH -o testjob.%j.out
#SBATCH -p <PARTITION>
#SBATCH --mail-type=END
#SBATCH --mail-user=<EMAIL ADDRESS>
hostname
A list of usable partitions and their restrictions is available via
$ scontrol show partitions
For a node-oriented overview of all nodes, their partitions and their states, use
$ sinfo -Nel
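sinfo can also print a partition-oriented summary through its -o format option, which is handy when choosing a partition. A small sketch; the function name partition_summary is hypothetical and the selected fields are just one reasonable choice.

```shell
# Hypothetical helper: one line per partition (and node configuration).
# sinfo format fields used here:
#   %P partition name (default partition marked with *)
#   %a availability   %l maximum time limit   %D number of nodes
#   %c CPUs per node  %m memory per node in MB
partition_summary() {
    sinfo -o "%P %a %l %D %c %m"
}
```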
Then, it's time to start the job via
$ sbatch job_script.sh
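Submitting and waiting for a job can also be scripted. The sketch below uses hypothetical function names and relies on two standard Slurm behaviours: sbatch --parsable prints only the job ID (optionally followed by ";cluster"), and squeue -h -j prints nothing once the job has left the queue.

```shell
# Hypothetical helpers around sbatch/squeue; names are illustrative.

# Submit a job script and print only its numeric job ID.
submit_job() {
    local script="${1:?usage: submit_job <job_script.sh>}"
    local out
    out=$(sbatch --parsable "$script") || return 1
    # Strip an optional ";cluster" suffix.
    printf '%s\n' "${out%%;*}"
}

# Poll squeue until the job no longer appears in the queue.
# A coarse interval keeps the load on the Slurm controller low.
wait_for_job() {
    local jobid="${1:?usage: wait_for_job <jobid> [interval_seconds]}"
    local interval="${2:-60}"
    while [[ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]]; do
        sleep "$interval"
    done
}
```

Usage:
$ jobid=$(submit_job job_script.sh)
$ wait_for_job "$jobid" && cat /data/scratch/$USER/testjob."$jobid".out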
You can see your currently running jobs with
$ squeue -u <username>
- Selecting Node Classes: selecting node classes, required for consistent running-time results.
- Selecting Queues: selecting queues for short- and long-running jobs.
- Job arrays: allow you to submit a sequence of similar job scripts that differ only by one environment variable.
- The output of squeue -u is not in real time.
- Use scontrol show job <jobid> to get detailed information about a job.
- Use the --array option of sbatch for array jobs; under Slurm, -t sets the time limit instead. Use
SLURM_ARRAY_TASK_ID=1 bash job_script.sh to simulate one array task locally.
- If you specify -l nodes=1, you will NOT get the node exclusively. Use
- If your job script calls another script, that script is not copied at submission time; it is read only when the job actually runs. Do not modify called scripts while jobs are pending or running, or be aware of the side effects!
- Since September 12, you can run X11 programs (ssh -X allegro...) and see the window on your workstation.
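Under Slurm, an array job is submitted with sbatch --array (for example sbatch --array=1-10 job_script.sh), and each task sees its own index in the SLURM_ARRAY_TASK_ID environment variable. A common pattern is to let the index pick one line of an input list. A minimal sketch; select_input and the list file inputs.txt are hypothetical names.

```shell
# Hypothetical helper for array jobs: print the line of a list file
# that corresponds to this array task (1-based, matching --array=1-N).
# Defaults to task 1 when run outside Slurm, e.g. for local testing.
select_input() {
    local list_file="${1:?usage: select_input <list_file>}"
    local task="${SLURM_ARRAY_TASK_ID:-1}"
    sed -n "${task}p" "$list_file"
}
```

Inside job_script.sh you would then use input=$(select_input inputs.txt), and a single task can be tried locally with
$ SLURM_ARRAY_TASK_ID=3 bash job_script.sh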
File System Paths
The following file system locations are interesting:
- Extra user home on allegro, on fast InfiniBand-connected hard drives.
- Things that are normally available through the network. For example:
- Scratch directory for temporary data. It is a good idea to set your TMPDIR environment variable to /data/scratch/$USER/ after creating this directory.
- Local scratch directory for temporary data in case you need the speed of a local disk -- beware the limited space.
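Creating the scratch directory and pointing TMPDIR at it can be done in one step, for example at the top of a job script or in your shell profile on allegro. A minimal sketch; the function name setup_scratch_tmpdir and the SCRATCH_BASE override are hypothetical, and the default path follows the /data/scratch/$USER/ convention described above.

```shell
# Hypothetical helper: create /data/scratch/$USER (if missing) and
# export TMPDIR so that programs put their temporary files there.
# SCRATCH_BASE is an illustrative override; it defaults to /data/scratch.
setup_scratch_tmpdir() {
    local base="${SCRATCH_BASE:-/data/scratch}"
    local dir="${base}/${USER:-$(id -un)}"
    mkdir -p "$dir" || return 1
    export TMPDIR="$dir"
}
```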
Data can be copied from the
paths to the home directory on allegro.
Cluster Queue Commands
Cluster Queue Management
The cluster queue is managed by the Slurm Workload Manager.
The system knows the following
Quick Start
Cluster Resource Policy
- The time limit that you give to your jobs is 'hard'. Jobs that are still running when their time limit is reached will be killed.
- The memory limit that you give to your jobs is 'hard', too. If you allow a job 1024 MB but it uses 1034 MB, it will be killed immediately. The message you receive for such an event reads: 'job violates resource utilization policies'.
- The core limit you give to your job is 'hard', too. If you request 2 cores but start 4 threads, the 4 threads share the 2 cores, so each thread gets about 50% of a core's CPU time.
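Since all three limits are hard, it helps to request them explicitly and to start exactly as many threads as cores were granted. The job script header below is a sketch: the values are illustrative only, set_thread_count is a hypothetical helper, OMP_NUM_THREADS only affects OpenMP programs, and Slurm sets SLURM_CPUS_PER_TASK only when --cpus-per-task is requested.

```shell
#!/bin/bash
# Illustrative resource requests; the values are examples, not site defaults.
# Wall-clock limit (hh:mm:ss):
#SBATCH --time=01:00:00
# Memory per node:
#SBATCH --mem=1024M
# Cores for this task:
#SBATCH --cpus-per-task=2

# Start exactly as many OpenMP threads as cores were requested, so that
# threads are not time-sliced onto fewer cores. Falls back to 1 outside Slurm.
set_thread_count() {
    export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
}

set_thread_count
# ./my_program  (hypothetical program name)
```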
Transition from Torque
Before May 2015, allegro was running another scheduler (Torque