YCRC HPC Policies

Research Access and Accounts

The YCRC operates high performance computing (HPC) resources supporting research in a wide range of disciplines across the university. Upon request, the YCRC will provide a principal investigator (PI) group account on the YCRC’s HPC resources to any member of the ladder faculty appointed at the level of Assistant Professor or above, or any member of the research faculty appointed at the level of Research Scientist/Scholar or above who is responsible for their own independent research program. Other university faculty or staff may apply for such an account, subject to additional administrative approval. Once a group account has been created, and with the approval of the PI, members of a PI’s research group may request individual user accounts under the group account. The PI is ultimately responsible for all use of YCRC resources by members of the PI’s group, including any costs for data storage and computing that may be associated with such use.

Account Deactivation and Removal

The YCRC audits cluster accounts annually on November 1, and all accounts not associated with a valid NetID will expire at that time. In addition, YCRC needs to be able to contact all account holders for security and communication purposes. Therefore, logins will be disabled throughout the year for accounts that are found not to be associated with a valid email address. 

PIs are responsible for notifying the YCRC when a user has left the PI’s research group or the University, at which time the user’s account will expire unless other arrangements are made with the YCRC. When accounts expire, PIs are responsible for ensuring all files are properly managed or removed. However, the YCRC reserves the right to delete expired accounts and all files associated with them, if necessary. Users should arrange to transfer file ownership when possible before leaving the group(s) with which they have been working.

Undergraduate Student Access

The YCRC focuses primarily on computational research conducted by faculty, research staff, postdocs, and graduate students. However, the YCRC may provide limited access to undergraduate students, most often when they are part of a PI’s research group or enrolled in a course that uses YCRC resources. In appropriate circumstances, undergraduate students may request use of YCRC resources for independent research under the oversight of a faculty sponsor/advisor. Such requests will be considered on a case-by-case basis, and approval will be at the discretion of the YCRC.

Academic Courses on YCRC Clusters

Instructors of Yale academic courses may request the use of YCRC facilities for their students. Instructors planning the use of YCRC facilities in their courses are expected to consult with the YCRC and obtain prior approval at least 60 days before the start of the semester. Instructors and their teaching staff will be expected to provide primary support for their students using YCRC facilities, with secondary assistance from YCRC staff as appropriate. The YCRC will do its best to ensure access to its facilities throughout the semester, but scheduled hardware maintenance periods will take place according to the YCRC’s published schedule as listed on the System Status Page on the YCRC website, and instructors should plan accordingly. Instructors should also understand that, in rare instances, clusters may be unavailable due to unforeseen circumstances. All accounts and data created for course use will be disabled and removed 30 days after the end of the semester.

Policies regarding account creation and access to HPC resources are subject to change.

The YCRC’s engineering group administers all HPC systems, including machines (HPC clusters), storage facilities, and related data center networking. Since these systems are intended to support research applications and environments, they are designed and operated to achieve maximum performance levels and job throughput. The availability of the resources, while important, is secondary to their level of performance. The engineering group’s primary responsibility is therefore to maintain high performance while keeping the systems as stable and available as possible. To that end, all systems are subject to regular maintenance periods as listed on the System Status Page on the YCRC website. During these maintenance periods and, rarely, at other times, some or all of the YCRC HPC resources may not be available for use.

The YCRC will provide each PI group access to compute and storage resources on one YCRC cluster, subject to reasonable limits and YCRC discretion. In some circumstances, the YCRC may provide a PI group or some of its members with access to additional resources as appropriate for their computational work. (Such additional access may be temporary.) Users are expected to do their best to use the clusters efficiently and release idle resources. 

All users will have access to limited amounts of storage in home, project, and short-term scratch directories, free of charge. Quotas limit the amount of storage and the number of files per user and/or PI group. Users are expected to do their best to delete files they no longer need. Files in short-term scratch directories are purged automatically after 60 days. Using scratch for long-term file storage (through artificial extension of expiration or other means) is forbidden without explicit approval from the YCRC. PIs are responsible for all storage usage by members of their groups. Additional storage may be provided upon request, at the YCRC’s discretion, and may or may not incur a cost.
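
As a minimal illustration of routine scratch cleanup, the sketch below uses standard shell tools to find older files so they can be reviewed and then deleted (or copied elsewhere) before the automatic purge. The path is a placeholder, and the exact scratch location and purge criteria are cluster-specific.

    # Placeholder path; the actual scratch location varies by cluster.
    SCRATCH=/path/to/scratch/$USER

    # List files not modified in the last 50 days so they can be reviewed.
    find "$SCRATCH" -type f -mtime +50 -print

    # Uncomment to delete files confirmed to be no longer needed.
    # find "$SCRATCH" -type f -mtime +50 -delete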

At the end of an additional storage allocation’s term, the storage will be reclaimed as described below:

  • 30 and 7 days before the expiration date: The PI will receive email reminders of storage expiration.
  • On expiration date: The PI will receive another email reminder. In addition, the storage quota will be set to 0 TB or to the default storage quota (if the additional allocation was an extension of pre-existing project space). Users will still be able to read and delete data in the allocated storage location.
  • 30 and 50 days past expiration date: The PI will receive an email reminder that the allocation has expired, that the allocated storage location will be deleted, and that any remaining data will be purged as of 60 days past the expiration date.
  • 60 days past expiration date: The allocated storage location will be deleted, and any remaining data will be purged. The PI will receive email notification of these actions.

Computation on the “standard” tier of partitions (e.g., day, week, mpi, gpu), as well as on private nodes and scavenge partitions, does not incur any charges. Researchers can escalate specific important computations to the Priority Tier and run less urgent workloads in the Standard Tier.

The Service Unit rate structure for the Priority Tier is derived to closely match the prorated cost of a similar dedicated node over its expected 5-year lifetime. As such, the Priority Tier is an alternative to purchasing dedicated nodes: for the same cost, researchers can realize 100% of the value of their funds (compared to the inevitable idle time on dedicated nodes) and always compute on the YCRC’s latest resources, whereas dedicated nodes become more outdated every year and are decommissioned after 5 years.

Detailed information on the Priority Tier, including rates and how to request access, is available on the YCRC website.

Documentation on how to use the Priority Tier partitions is also available there.

Data Use Agreements

Regardless of the sensitivity/risk classification of the data, users are prohibited from using or storing, on any YCRC facility, data covered by a data use agreement (DUA) unless the DUA has been approved by the Office of Sponsored Projects and reviewed by the YCRC. A copy of the DUA must be provided to the YCRC and, as early as possible in the process and before storing any covered data on any YCRC facility, the YCRC must be informed of and agree to meet all applicable computing-related requirements of the DUA, including, but not limited to, requirements for data encryption, access control, auditing, and special actions to be taken upon removal of the data. Users who expect to start a project that involves the use of YCRC facilities for sensitive data or data subject to a DUA, or who are applying for external funding for such a project, should consult with the YCRC early in the planning process. PIs of such projects are responsible for informing the YCRC which users are authorized to access which covered data and for notifying the YCRC when there are changes to the list of authorized users.

YCRC Security Procedures

Security of YCRC facilities and all data stored on them is critical, and the YCRC takes several steps to help provide a secure computing environment, including: 

  • Operating the clusters in a secure data center, with restricted and logged access;
  • Using firewalls to allow access only from the Yale campus network or the Yale VPN, and restricting that access to only login and data transfer servers;
  • Requiring ssh key pairs (not passwords) for ssh authentication;
  • Requiring Multi-factor Authentication for both VPN and HPC authentication;
  • Keeping operating systems up to date and regularly applying security patches.

In addition, users share responsibility for the security of the YCRC clusters and their data. Accordingly, users are expected to follow standard security practices to ensure the safety and security of their accounts and data. These practices include the following (a brief example follows the list):

  • Using strong passphrases on ssh keys;
  • Setting permissions appropriately on data files and directories;
  • Never sharing private keys, passphrases, or other login information;
  • Following the terms of any regulations or Data Use Agreements that cover their data.
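
As a brief illustration of these practices, the sketch below generates an ssh key pair protected by a passphrase and tightens permissions on the ssh directory and on a data directory. The ed25519 key type and the directory names are examples only.

    # Generate an ssh key pair; choose a strong passphrase when prompted.
    ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519

    # Restrict access to the ssh directory and the private key.
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/id_ed25519

    # Example only: remove group and other access from a sensitive data directory.
    chmod -R go-rwx ~/project_data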

Elevated privileges, such as sudo or root access, on all systems operated by the YCRC are strictly limited to YCRC staff to ensure the security and stability of the systems.

Further information regarding Yale cybersecurity and data classifications can be found on the Yale Cybersecurity website.

Except as described on the YCRC website for specific clusters, only files in users’ home directories are backed up, and then only for a short time (approximately 30 days on most clusters). No other user files are backed up at all. Backups are stored locally, so significant events affecting the HPC data center could destroy both the primary and backup copies of user files. Users should maintain their own copies of critical files at other locations. YCRC cannot guarantee the safety of files stored on HPC resources. 
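
One simple way to maintain an independent copy of critical files is to mirror them to another machine with rsync over ssh, as in the sketch below; the host name and paths are hypothetical placeholders.

    # Run from the destination machine; the host and paths are placeholders.
    rsync -avz --partial netid@transfer.example.edu:/path/to/project/critical_results/ \
        /local/backup/critical_results/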

The YCRC’s HPC resources are shared by many users. The YCRC uses a workload management system (Slurm) to implement and enforce policies that aim to provide each PI group with fair, but limited, access to the HPC clusters. Users may not run computationally intensive workloads or compilations on the login or transfer nodes. Instead, users must submit such workloads as jobs to Slurm, specifying the amount of resources to be allocated for the jobs. Jobs running for longer than one week are discouraged. Slurm will terminate jobs exceeding their requested resource amounts with little or no warning. To avoid data loss if jobs terminate unexpectedly, users are strongly encouraged to checkpoint running jobs at regular intervals.
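
A minimal Slurm batch script along these lines might look like the sketch below; the partition name is taken from the standard tier mentioned above, while the resource amounts, file names, and application are illustrative only. The script would be submitted with sbatch, and checkpointing is left to the application.

    #!/bin/bash
    #SBATCH --job-name=example_job     # illustrative name
    #SBATCH --partition=day            # example standard-tier partition
    #SBATCH --time=12:00:00            # walltime request; keep well under one week
    #SBATCH --cpus-per-task=4          # CPU cores requested
    #SBATCH --mem=16G                  # memory request; exceeding it may end the job

    # Replace with the real application; checkpoint at regular intervals if supported.
    ./my_simulation --checkpoint-interval 30m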

Users are expected to abide by the stated purposes and limits of the cluster partitions and to submit jobs in alignment with YCRC best practices, such as not running large numbers of very short jobs or workflows that create an excessive number of small files. Jobs found making inappropriate use of a cluster may be canceled without prior notice, and repeated offenses after a warning from YCRC staff can result in account suspension. In extreme cases, where a particular workflow threatens the system’s stability, the YCRC may temporarily lock an account without prior notice; the account will be restored only after the user consults with YCRC staff to address the workflow.

The YCRC includes several research support staff members who can help users with a variety of tasks, including education and training, software installation, and workflow assistance. The YCRC offers a number of classes and workshops on a variety of topics relevant to research computing. New users are encouraged to attend one of the introductory training workshops (or watch a recorded training) to learn about the HPC clusters and become familiar with the YCRC’s standard operating procedures. 

The YCRC procures, installs, and maintains many standard software tools and applications intended for YCRC facilities, including its HPC clusters. Among these are compilers and languages (e.g., Python, C, C++, Fortran), parallel computing tools (e.g., MPI), application systems (e.g., R, Matlab, Mathematica), and libraries (e.g., Intel Math Kernel Library, NAG, GNU Scientific Library, FFTW). Users requiring additional software on YCRC facilities are encouraged to install their own copies, though the YCRC’s research support staff can assist as needed. For customizable systems such as Python and R, the YCRC has set up procedures to enable users to easily install their own modules, libraries, or packages.
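
For example, a user can create a personal Python environment in their home directory and install packages into it without administrative privileges; the environment name and package are placeholders, and the YCRC’s own procedures for Python and R (mentioned above) may differ.

    # Create and activate a personal virtual environment (names are placeholders).
    python3 -m venv ~/envs/myproject
    source ~/envs/myproject/bin/activate

    # Install packages into the personal environment only.
    pip install numpy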

The YCRC has set up its own “ticketing system” to help manage and address inquiries, requests, and troubleshooting issues related to the HPC clusters. Users may contact YCRC staff by emailing research.computing@yale.edu. While the time required to resolve particular issues may vary widely, users may expect an initial communication from a YCRC staff member within a reasonable time (often within one business day). Users may also obtain assistance by arranging appointments to meet with the research support staff or attending the YCRC Office Hours.

At least once per year, the YCRC will offer PIs an opportunity to purchase dedicated HPC compute and storage resources for their groups. Often, such opportunities may be coordinated with the YCRC’s regular refresh and upgrade cycles for its compute and storage hardware infrastructure, but, subject to YCRC approval, PIs may request hardware purchases at other times, as needed. At its discretion, the YCRC may restrict the types and quantities of dedicated HPC resources purchased for PI groups and require that purchased resources be compatible with the YCRC’s data center and network infrastructure and with applicable policies.

Procurement Process & Approved Vendors

All HPC resources will be procured through the YCRC from vendors of the YCRC’s choosing and purchased with warranties acceptable to the YCRC (currently 5 years). HPC resources will have lifetimes consistent with their warranties, commencing upon delivery, after which the resources may be decommissioned or their lifetimes extended at the YCRC’s sole discretion.

Hardware Decommission and Expiration

All hardware purchased for the YCRC clusters includes a multi-year warranty and is subject to decommissioning at any time after the warranty expires. For storage, owners who wish to retain their data must purchase new storage in advance of the warranty expiration to allow time to migrate the retained data. Data hosted on storage that is not renewed will be purged and unrecoverable after the expiration date. Compute nodes are supported by the YCRC, in conjunction with the vendor, during the warranty period. They may be run beyond their warranty at the YCRC’s discretion, but they may experience unrecoverable failure and should not be expected to remain in service for any amount of time after the warranty expires. The YCRC will decommission all nodes, whether or not they have failed, within two years of the expiration of their warranty to ensure capacity for new nodes and to maintain a modern HPC environment.

Private Partition and Resource Sharing

Each PI group may access its dedicated compute resources using a private Slurm partition restricted to use by group members. The YCRC reserves the right to allow other users to use idle dedicated compute resources by submitting jobs to a Slurm scavenge partition. Any scavenge job is subject to termination after a minimum run time (defined per cluster) should a job submitted to the group’s private partition require the node occupied by the scavenge job.
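
A scavenge submission might look like the sketch below. The partition name comes from the scavenge partition described above; the resource amounts and application are illustrative, and --requeue asks Slurm to resubmit the job automatically if it is preempted, so the application should be restartable from a checkpoint.

    #!/bin/bash
    #SBATCH --partition=scavenge      # run on idle dedicated resources
    #SBATCH --time=6:00:00            # illustrative walltime request
    #SBATCH --requeue                 # resubmit automatically if preempted

    # The application should be restartable, since scavenge jobs can be preempted.
    ./my_analysis --resume-from-checkpoint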

The McCleary high-performance computing system has specific resources that are dedicated to YCGA users, including a Slurm partition (‘ycga’) and a large parallel storage system (/gpfs/ycga). The following policy guidelines govern the use of these resources on McCleary for data storage and analysis.

Yale University Faculty User

  1. All Yale PIs using YCGA for library preparation and/or sequencing will have a 5 TB storage area called ‘work’ for data storage, in addition to the 5 TB storage area called ‘project’ that all McCleary groups receive.
  2. Currently, neither work nor project storage is backed up. Users are responsible for protecting their own data.
  3. All Fastq files are available on the /gpfs/ycga storage system for one year. After that, the files are available in an archive that allows self-service retrieval, as described below. Issues or questions about archived data can be addressed to ycga@yale.edu.
  4. Users processing sequence data on McCleary should be careful to submit their jobs to the ‘ycga’ partition (see the example after this list). Jobs submitted to other partitions may incur additional charges.
  5. Members of Yale PI labs using YCGA for library preparation and/or sequencing may apply for accounts on McCleary with the PI’s approval.
  6. Each Yale PI lab will have a dedicated ‘work’ directory to store its data, and access for lab members will be granted with the authorization of the respective PI. Such access will be terminated upon request from the PI or upon termination of the member’s Yale NetID.
  7. Lab members moving to a new university will retain access to HPC resources for an additional six months only with the permission of the Yale PI. If their Yale NetID is no longer active, former Yale members who were YCGA users should request a Sponsored Identity NetID from their business office. Sponsored Identity NetIDs are valid for six months. Such users will also need to request VPN access.
  8. A PI moving to a new university to establish their lab will have access to their data for one year from the termination of their Yale position. During this time, the PI or one lab member from the new lab will be provided access to the HPC system. Requests for a Guest NetID should be made to their business office; Guest NetIDs are valid for one year.
  9. Any new Yale faculty member will be given access to McCleary once they start using YCGA services.
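
As a brief illustration of item 4, a job targeting the YCGA partition could be submitted as follows; the script name is a placeholder.

    # Submit a batch script to the ycga partition (script name is hypothetical).
    sbatch --partition=ycga process_fastq.sh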

External Collaborators

  1. Access to McCleary can be granted to collaborating labs with the authorization of the respective Yale PI; this requires obtaining a Sponsored NetID. A maximum of one account per collaborating lab will be granted, and such access will be terminated upon request from the PI. The expectation is that the collaborator, with the PI’s consent, will download data from the McCleary HPC system to their own internal system for data analysis.

Non-Yale Users

Users not affiliated with Yale University will not be provided access to the McCleary high-performance computing system.

YCGA Data Retention Policy

YCGA-produced sequence data is initially written to YCGA’s main storage system, which is located in the main HPC datacenter at Yale’s West Campus. Data stored there is protected against loss by software RAID. Raw basecall data (e.g. bcl files) is immediately transformed into DNA sequences (fastq files).

  • ~45 days after sequencing, the raw files are deleted.
  • ~60 days after sequencing, the fastq files are written to an archive. This archive exists in two geographically distinct copies for safety.
  • ~365 days after sequencing, all data is deleted from main storage. Users continue to have access to the data via the archive. Data is retained on the archive indefinitely. See below for instructions for retrieving archived data.

All compression of sequence data is lossless. Gzip is used for data stored on the main storage, and quip is used for data stored on the archive. Disaster recovery is provided by the archive copy.

More information on accessing the sequence data is available in the YCRC documentation.