The following text can be inserted into any grant or other documentation that requires a description of the PMACS HPC services. It also summarizes the hardware and environment of the system. Please note: unless specifically requested by the user, data on the HPC disk systems is not backed up.
The PMACS/DART HPC facility opened in April 2013 to meet the growing processing and storage needs of our epidemiology, biostatistics, genomics, and bioinformatics groups. The cluster is managed by full-time system administrators and is located at TierPoint Philadelphia, a Tier 3, SSAE 16/SAS 70 Type II audit-compliant co-location data center. It is authorized to process and store data in compliance with HIPAA regulations.
The cluster has 76 general-purpose compute nodes, one large-memory node, and two GPU nodes. The 76 compute nodes are Dell C6420 units, each with two 20-core Intel Xeon Gold 6148 2.40 GHz CPUs, between 256 and 512 GB of RAM, a single 56 Gb/s InfiniBand connection to the GPFS file system, and 1.6 TB of dedicated scratch space on a local NVMe drive. The large-memory Dell R940 node is configured with four 12-core Intel Xeon Gold 6126 2.60 GHz CPUs, 1.5 TB of RAM, a single 100 Gb/s InfiniBand connection to the GPFS file system, and 1.6 TB of dedicated NVMe scratch space. The two GPU nodes are each configured with two 22-core Intel Xeon E5-2699 v4 2.20 GHz CPUs, 512 GB of RAM, a single NVIDIA Tesla P100 GPU card (3,584 CUDA cores and 16 GB of GPU memory), a single 56 Gb/s InfiniBand connection to the GPFS file system, and a 10 Gb/s link to the rest of the cluster.
All nodes in the compute cluster are subdivided into virtual processing cores, allowing up to 6,352 virtual cores to be provisioned at 3-6 GB of RAM per virtual core. Cluster and storage interconnection is provided by a 40 Gb/s core Ethernet fabric with at least 10 Gb/s of connectivity to each compute node, in addition to a single 56 Gb/s InfiniBand link on every compute node.
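As a consistency check on the 6,352 figure, the per-node core counts listed above can be totaled. This worked calculation assumes that each physical core is exposed as two virtual cores (two hardware threads per core, as with Intel Hyper-Threading); the text does not state the subdivision ratio, so that factor of 2 is an assumption.

```latex
\begin{aligned}
\text{physical cores} &= \underbrace{76 \times 2 \times 20}_{\text{C6420}}
  + \underbrace{1 \times 4 \times 12}_{\text{R940}}
  + \underbrace{2 \times 2 \times 22}_{\text{GPU nodes}}
  = 3040 + 48 + 88 = 3176 \\
\text{virtual cores} &= 2 \times 3176 = 6352 \\
\text{RAM per virtual core (C6420)} &\approx \frac{256\ \text{GB}}{80} = 3.2\ \text{GB}
  \quad\text{to}\quad \frac{512\ \text{GB}}{80} = 6.4\ \text{GB}
\end{aligned}
```

Under that assumption the total matches the 6,352 virtual cores quoted above, and the C6420 memory range spread over 80 virtual cores per node gives roughly the stated 3-6 GB per virtual core.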
Cluster nodes are attached to 4.2 petabytes of shared, tightly coupled, high-performance parallel file system disk storage provided by IBM Spectrum Scale (a.k.a. GPFS) technology (not backed up). The disk subsystem is presented to the compute nodes via an eight-node IBM GPFS manager cluster. Computational job scheduling/queuing and cluster management are orchestrated by the IBM Platform Computing (LSF) suite of products. Long-term active archiving of data is available via a Spectra Logic T950 tape library, with a current total capacity of 1.2 petabytes and room for growth. Each archive tape is mirrored, providing redundancy in the event of a tape failure. Tape library and archive management are provided by a Quantum product that presents the tape archive as a simple file share while providing robust data protection and verification processes in the background.
The PennHPC baseline cost structure is fee-for-service, based on the service-center model. Costs are allocated with the goal of cost recapture and budget neutrality: all operating costs are covered by usage fees, and no monies are retained year over year. Costs as of 4/30/2014 are:
- $0.035 per computational vCore slot-hour
- $0.055 per GB per month of disk storage
- $0.015 per GB per month of active-archive mirrored tape storage
- $95 per hour for consulting services (excludes account setup)
- No charge to maintain an account; charges are billed on an as-consumed basis only.
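For budgeting, the rates above combine linearly, so a monthly estimate is just each usage quantity times its rate. The Python sketch below totals one month of usage; the function and parameter names (`estimate_monthly_cost`, `vcore_hours`, `disk_gb`, and so on) are hypothetical, it uses only the rates listed as of 4/30/2014, and it relies on `Decimal` so cent amounts are not distorted by binary floating point. Current rates should be confirmed before the numbers are used in a budget.

```python
from decimal import Decimal

# Rates as listed above (as of 4/30/2014); confirm current rates before budgeting.
RATE_VCORE_HOUR = Decimal("0.035")     # $ per vCore slot-hour
RATE_DISK_GB_MONTH = Decimal("0.055")  # $ per GB per month of GPFS disk
RATE_TAPE_GB_MONTH = Decimal("0.015")  # $ per GB per month of mirrored tape archive
RATE_CONSULTING_HOUR = Decimal("95")   # $ per hour of consulting


def estimate_monthly_cost(vcore_hours, disk_gb, tape_gb, consulting_hours=0):
    """Hypothetical helper: itemized dollar estimate for one month of usage.

    There is no account maintenance fee, so zero usage costs $0.
    Inputs are converted via str() so floats like 1.5 stay exact.
    """
    items = {
        "compute": RATE_VCORE_HOUR * Decimal(str(vcore_hours)),
        "disk": RATE_DISK_GB_MONTH * Decimal(str(disk_gb)),
        "tape": RATE_TAPE_GB_MONTH * Decimal(str(tape_gb)),
        "consulting": RATE_CONSULTING_HOUR * Decimal(str(consulting_hours)),
    }
    # The sum is computed before the "total" key is added, so it covers only line items.
    items["total"] = sum(items.values())
    return items


if __name__ == "__main__":
    est = estimate_monthly_cost(vcore_hours=10_000, disk_gb=5_000,
                                tape_gb=10_000, consulting_hours=2)
    for item, dollars in est.items():
        print(f"{item:>10}: ${dollars:,.2f}")
```

In this example, 10,000 vCore-hours ($350), 5,000 GB of disk ($275), 10,000 GB of tape ($150), and 2 consulting hours ($190) come to a total of $965.00 for the month.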
The system is managed by Penn Medicine Academic Computing Services, the central IS/IT department for the Perelman School of Medicine.
Current staff members are:
- Kash Patel, Associate VP Chief Digital Technology Officer, Corporate Information Services, Penn Medicine
- Rikki Godshall, Manager, HPC and Cloud Services, ERA, PMACS
- Anand Srinivasan, Sr. Project Leader, HPC and Cloud Services, ERA, PMACS