Caviness Community Cluster

The Caviness cluster, UD’s third Community Cluster, was deployed in July 2018. It is a distributed-memory Linux cluster based on a rolling-upgradeable model for expansion and replacement of hardware over time. The first generation consists of 126 compute nodes (4536 cores, 24.6 TB of memory). Each node pairs two 18-core Intel “Broadwell” processors in a dual-socket configuration, for 36 cores per node. An Intel OmniPath network fabric supports high-speed communication and the Lustre filesystem (approximately 200 TiB of usable space). Gigabit and 10-Gigabit Ethernet networks provide access to additional filesystems and the campus network. The first-generation hardware was purchased with a proposed five-year life, putting its refresh in the April to June 2023 time frame.
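As a quick check on those aggregate figures, the short Python sketch below recomputes the node, core, and memory totals from the per-node configurations listed later on this page. The split of 72, 48, and 6 nodes across the 128 GB, 256 GB, and 512 GB memory tiers is an assumption chosen to match the published totals, not a documented breakdown.

  # Recompute first-generation aggregate figures from the node mix.
  # The 72/48/6 split across memory tiers is assumed, not published.
  node_tiers = {       # RAM per node (GB) -> node count
      128: 72,
      256: 48,
      512: 6,
  }
  cores_per_node = 36  # 2 x 18-core Intel E5-2695 v4 "Broadwell"

  total_nodes = sum(node_tiers.values())
  total_cores = total_nodes * cores_per_node
  total_memory_gb = sum(ram * count for ram, count in node_tiers.items())

  print(total_nodes)                       # 126 nodes
  print(total_cores)                       # 4536 cores
  print(round(total_memory_gb / 1000, 1))  # 24.6 TB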

This cluster was designed to pack more computing power into less physical space, use power more efficiently, and leverage reusable infrastructure for a longer overall lifespan. It uses Penguin Computing’s Tundra Extreme Scale (ES) design, which follows the specifications of the Open Compute Project.

Infrastructure Provided by IT:

All of these come with your purchase of nodes.

Basic Needs
  • Installation in a secure data center
  • Racks, floor space, cooling and power
  • Five-year warranty on nodes
Cluster Networks
  • 2 x 10 Gbps uplink to campus network
  • 12 x 100 Gbps Intel OmniPath uplinks between racks
Storage

Aggregate storage across the cluster:

  • 320 TB (raw) Lustre high-speed scratch
  • 160 TB (raw) NFS for home and workgroup directories

Workgroups have unlimited access to Lustre, plus:

  • Unlimited UD and guest user accounts, each with a 20 GB home directory
  • Workgroup directory quotas start at 1 TB and scale in proportion to investment
Login nodes

Users connect to the cluster through two login nodes:

  • 2 x 18C Intel E5-2695 v4 (36 cores)
  • 128 GB DDR4 memory (8 x 16 GB)
  • 1 x 10 Gbps uplink to campus network, Internet
  • 1 x 100 Gbps Intel OmniPath cluster network

Purchased by Stakeholders:

First generation:

  Description                                      Qty   Cost
  Standard architecture                             60   $5,000
    • 2 x 18C Intel E5-2695 v4 (36 cores)
    • 128 GB DDR4 memory (8 x 16 GB)
    • 960 GB SSD (local scratch, swap)
    • 1 x 100 Gbps Intel OmniPath cluster network
  Memory upgrade
    • 256 GB DDR4 memory (8 x 32 GB)                48   +$1,000
    • 512 GB DDR4 memory (16 x 32 GB)                6   +$3,500
  Coprocessors
    • 2 x nVidia P100 “Pascal” GPU                  10   +$7,000
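To make the pricing concrete, the following Python sketch totals a hypothetical first-generation purchase using the figures in the table above. It assumes each upgrade price is a per-node add-on to the $5,000 standard configuration, which is how the options are presented here.

  # Per-node prices from the first-generation table above.
  # Assumption: each upgrade is an add-on to the standard node price.
  STANDARD_NODE = 5000   # 36-core node, 128 GB RAM, 960 GB SSD
  UPGRADE_256GB = 1000   # memory upgrade to 256 GB
  UPGRADE_512GB = 3500   # memory upgrade to 512 GB
  UPGRADE_GPUS  = 7000   # 2 x nVidia P100 "Pascal" GPUs

  def node_cost(memory_gb=128, gpus=False):
      """Cost of one node in a given (hypothetical) configuration."""
      cost = STANDARD_NODE
      if memory_gb == 256:
          cost += UPGRADE_256GB
      elif memory_gb == 512:
          cost += UPGRADE_512GB
      if gpus:
          cost += UPGRADE_GPUS
      return cost

  # Example: two standard nodes plus one 256 GB node with GPUs.
  total = 2 * node_cost() + node_cost(memory_gb=256, gpus=True)
  print(f"${total:,}")   # $23,000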
Other options

Subject to discussion with IT:

  • Additional workgroup storage quota
  • Accelerated local scratch storage (per-node NVMe)

Naming Caviness

The Caviness cluster is named in honor of Jane Caviness, former director of Academic Computing Services at the University of Delaware. In the 1980s, Caviness led a ground-breaking expansion of UD’s computing resources and network infrastructure that laid the foundation for UD’s current research computing capabilities. After leaving UD, Caviness went to the National Science Foundation (NSF) as program director for NSFNET in the newly formed Division of Networking and Communications Research and Infrastructure, later serving as deputy division director. She oversaw the implementation of the NSFNET’s initial backbone and the expansion of network connectivity between colleges, universities, NSF supercomputer centers, and other research centers. Caviness’ activity in the Association for Computing Machinery (ACM) and EDUCOM, including a term as vice-president for Networking at EDUCOM, highlights how strong an advocate she has been for cooperation and collaboration in the research computing community.

The naming of UD’s third HPC community cluster continues the practice of naming our HPC clusters in honor of UD faculty and staff who have played a key role in the development of the internet: David Mills, David Farber, and now Jane Caviness.