HPC Cluster Draco

High-performance computing for research and teaching at Thuringian universities.

HPC-Cluster Draco (rack row 1)

Image: Andre Sternbeck (Universität Jena)

Draco, named after one of the seven wonders of Jena, is a high-performance computing (HPC) cluster that is available to all researchers at Friedrich Schiller University Jena and other Thuringian universities. It can also be used for courses.

The HPC cluster can be used for computationally intensive simulations in scientific research and for massively parallel processing and analysis of large data sets. Parts of the system can also be used interactively. With its distributed computing capacity, Draco lets multiple users simultaneously run complex calculations and high-throughput tasks such as data mining or data analysis. In addition, Draco's GPU partition supports workloads such as machine learning and deep learning.

Profile

Draco was put into operation in June 2021 and has been expanded every year since then. The system currently comprises the following components:

Compute nodes

Draco has a total of 131 compute nodes, including:

108 standard compute nodes with mostly 48 cores and 256 GB of RAM per node
5 HighMem compute nodes with 2.3 to 4 TB of RAM per node
17 GPU compute nodes with additional GPUs (computing accelerators) of the types NVIDIA V100, A100 and H100

Storage systems

Draco provides various storage systems for different application profiles:

High-performance parallel file system with a capacity of 520 TB.
High-performance all-flash VASTDATA file system with a capacity of 500 TB.
Five NFS storage systems for the long-term storage of large amounts of data by research groups.

Network

All compute nodes and storage systems are connected via a low-latency InfiniBand network with a bandwidth of 100 or 200 Gbps (HDR100 or HDR, respectively). The network has a spine-leaf topology with a blocking factor of 2:1.
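
To illustrate what a 2:1 blocking factor means in practice, the following Python sketch estimates the worst-case bandwidth per node when all nodes on a leaf switch send traffic through the spine at the same time. The function name and the simple division model are assumptions made for illustration; actual throughput depends on routing, traffic patterns, and protocol overhead.

```python
def worst_case_bandwidth_gbps(link_gbps: float, blocking_factor: float) -> float:
    """Estimate per-node bandwidth across the spine under full load.

    A blocking factor of N:1 means a leaf switch has N times more
    downlink capacity (to nodes) than uplink capacity (to spines), so
    in the worst case each node gets 1/N of its link speed for traffic
    that leaves its leaf switch.
    """
    if blocking_factor < 1:
        raise ValueError("blocking factor must be at least 1")
    return link_gbps / blocking_factor


# Link speeds from the text: HDR100 (100 Gbps) and HDR (200 Gbps).
for name, speed in [("HDR100", 100.0), ("HDR", 200.0)]:
    estimate = worst_case_bandwidth_gbps(speed, blocking_factor=2.0)
    print(f"{name}: {estimate:.0f} Gbps worst case across leaf switches")
```

Traffic between nodes attached to the same leaf switch does not cross the spine, so this reduction does not apply to it.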

Miscellaneous

The AlmaLinux 8 operating system is installed on the cluster.
Slurm is used as the workload manager for distributing the computing tasks across the cluster.
Additional remote workstations are available for interactive applications with a graphical user interface (MATLAB, Mathematica, Lumerical, ...).
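
As a rough illustration of how work is handed to Slurm, the Python sketch below assembles a batch script from standard `#SBATCH` directives. The helper `make_batch_script`, the job name, and the command `./my_simulation` are hypothetical; Draco's actual partition names and site policies are not given here, so the partition is left unset. The value of 48 tasks per node simply mirrors the core count of most standard compute nodes.

```python
from typing import Optional


def make_batch_script(job_name: str, nodes: int, tasks_per_node: int,
                      time_limit: str, command: str,
                      partition: Optional[str] = None) -> str:
    """Return the text of a Slurm batch script (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
    ]
    if partition is not None:
        # Partition names are site-specific; none is assumed here.
        lines.append(f"#SBATCH --partition={partition}")
    # srun launches the command once per allocated task.
    lines += ["", f"srun {command}"]
    return "\n".join(lines) + "\n"


# Example: two nodes with 48 tasks each and a one-hour time limit.
script = make_batch_script("demo", 2, 48, "01:00:00", "./my_simulation")
print(script)
```

The resulting text could be saved to a file and submitted with `sbatch`; the cluster's own documentation remains the authority on the required options.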

Note: Some of the compute nodes and storage systems are available only to individual user groups. Other research groups can add comparable extensions, and resources can be reserved temporarily.