ASC Cluster Design
Cluster diagram
The architecture of the entire cluster is depicted in Figure 1.
Hardware Resources
| Item | Name | Configuration | Qty |
|---|---|---|---|
| Login Node | Inspur NF5688M5 | CPU: Intel Xeon Gold 5318Y, 24 cores, 2.1 GHz × 2; Memory: 64 GB × 16, DDR4, 3200 MHz; Disk: 1.92 TB SATA SSD × 2; Power estimate: 5318Y TDP 165 W, memory 70 W, disk 50 W | 1 |
| CPU Compute Node | Inspur NF5688M6 | CPU: Intel Xeon Gold 6348, 28 cores, 2.6 GHz × 2; Memory: 64 GB × 16, DDR4, 3200 MHz; Disk: 1.92 TB SATA SSD × 2; Power estimate: 6348 TDP 235 W, memory 42 W, disk 300 W | 2 |
| GPU Compute Node | Inspur NF5688M6 | CPU: Intel Xeon Gold 6348, 28 cores, 2.6 GHz × 2; Memory: 64 GB × 16, DDR4, 3200 MHz; Disk: 1.92 TB SATA SSD × 2; GPU: NVIDIA A100 SXM4, 80 GB HBM2, memory bandwidth 1,555 GB/s, NVLink 600 GB/s, max TDP 400 W; Power estimate: 6348 TDP 235 W, memory 42 W, disk 300 W | 1 |
| HCA Card | InfiniBand/VPI card | ConnectX-5 VPI adapter card, FDR/EDR IB (100 Gb/s) and 40/50/100 GbE, dual-port QSFP28, PCIe 4.0 x16, tall bracket; Power estimate: 18 W | 3 |
| Switch | GbE switch | 10/100/1000 Mb/s, 24-port Ethernet switch; Power estimate: 100 W | 1 |
| Switch | EDR InfiniBand switch | SB7800 InfiniBand EDR 100 Gb/s switch system, 36 QSFP28 non-blocking ports, 300 W typical power consumption | 1 |
| Cable | Gigabit CAT6 cable | CAT6 copper cable, blue, 3 m | 3 |
| Cable | InfiniBand cable | Mellanox MCP1600-E0xxEyy direct attach copper (DAC) cable, 100 Gb/s QSFP28 port, IB EDR, 3 m, black, 26 AWG | 3 |
Software Resources
| Item | Name | Version |
|---|---|---|
| Operating system | CentOS | 7.9 |
| Compiler | icx | 2024.0.2 |
| Compiler (MPI wrappers) | mpiicx, mpiicpx | 2024.0.2 |
| Math library | Intel MKL | 2024.0.2 |
| MPI | OpenMPI | 4.0.5 |
| MPI | Intel MPI | 2024.0.2 |
| GPU toolkit | CUDA Toolkit | 12.4 |
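After deployment, the installed toolchain can be compared against this table with a short script. This is a minimal sketch, assuming icx, mpirun, and nvcc are already on PATH (e.g. after sourcing the oneAPI and CUDA environment scripts); the exact environment setup is not specified in this design and would need to be adapted.

```python
# Sketch: report the versions of the key tools listed in the software table.
# Assumes icx, mpirun, and nvcc are on PATH; adjust to the cluster's module setup.
import shutil
import subprocess

CHECKS = {
    "icx": ["icx", "--version"],        # Intel oneAPI C/C++ compiler (2024.0.2 expected)
    "mpirun": ["mpirun", "--version"],  # OpenMPI 4.0.5 or Intel MPI 2024.0.2
    "nvcc": ["nvcc", "--version"],      # CUDA Toolkit 12.4
}

for name, cmd in CHECKS.items():
    if shutil.which(cmd[0]) is None:
        print(f"{name}: not found on PATH")
        continue
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = (result.stdout or result.stderr).strip().splitlines()
    print(f"{name}: {lines[0] if lines else 'no output'}")
```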
System analysis
The estimated peak FP32 computing performance of the cluster is as follows:
- CPU: 2.6 GHz (Xeon Gold 6348 base frequency) × 28 cores × 2 CPUs per node × 3 compute nodes (2 CPU nodes + 1 GPU node) × 2 FLOPs per FMA × 16 FP32 lanes per AVX-512 vector (512 bit / 32 bit), counting one AVX-512 FMA unit per core = 13,977.6 GFLOPS = 13.9776 TFLOPS
- GPU: 19.5 TFLOPS (A100 peak FP32) × 2 GPUs = 39 TFLOPS
- Total: 52.9776 TFLOPS
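The estimate above can be reproduced with a short script. This is a minimal sketch that simply multiplies the factors listed in the bullets; the per-GPU figure and the GPU count are taken from the estimate, not measured, and the function names are illustrative.

```python
# Theoretical peak FP32 throughput of the cluster, following the factor
# breakdown above (one AVX-512 FMA unit per core counted, 2 FLOPs per FMA,
# 16 FP32 lanes per 512-bit vector).

def cpu_peak_fp32_gflops(freq_ghz, cores, cpus_per_node, nodes,
                         fma_units=1, flops_per_fma=2, lanes=512 // 32):
    """Peak FP32 GFLOPS contributed by the Xeon Gold 6348 CPUs."""
    return freq_ghz * cores * cpus_per_node * nodes * fma_units * flops_per_fma * lanes

def gpu_peak_fp32_tflops(per_gpu_tflops, num_gpus):
    """Peak FP32 TFLOPS contributed by the A100 GPUs."""
    return per_gpu_tflops * num_gpus

cpu = cpu_peak_fp32_gflops(2.6, 28, 2, 3) / 1000   # -> 13.9776 TFLOPS
gpu = gpu_peak_fp32_tflops(19.5, 2)                # -> 39.0 TFLOPS
print(f"CPU: {cpu:.4f} TFLOPS, GPU: {gpu:.1f} TFLOPS, total: {cpu + gpu:.4f} TFLOPS")
```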
The power consumption estimation for resource utilization is presented in Table 3.
| Name | Power consumption |
|---|---|
| Login Node | 450 W × 1 |
| CPU Compute Node | 750 W × 2 |
| GPU Compute Node | 1600 W × 1 |
| GbE Switch | 100 W |
| InfiniBand Switch | 250 W |
| InfiniBand/VPI Card | 18 W × 3 |
| Total | 3954 W |
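The total in Table 3 is a simple sum of estimated component draws. The sketch below, using the table's estimated (not measured) values, reproduces it and makes the budget easy to adjust if node counts change.

```python
# Power budget from Table 3: (estimated watts per unit, quantity).
components = {
    "Login Node":          (450, 1),
    "CPU Compute Node":    (750, 2),
    "GPU Compute Node":    (1600, 1),
    "GbE Switch":          (100, 1),
    "InfiniBand Switch":   (250, 1),
    "InfiniBand/VPI Card": (18, 3),
}

total_w = sum(watts * qty for watts, qty in components.values())
print(f"Estimated total power draw: {total_w} W")  # -> 3954 W
```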
The HPC cluster design offers substantial computational power (52.9776 TFLOPS peak FP32) and optimized communication through InfiniBand and Ethernet, making it well suited to AI and research workloads. However, its high power consumption and limited storage, combined with a single login node and only one GPU node, could hinder scalability and reliability.