Open
Milestone
started on May 21, 2024
GPFS5 migration
https://uab-rc.slack.com/archives/C9YB78YSX/p1715788192490339
High level summary by William:
Regarding our A/B user migration, my understanding is there are two primary desired invariants:
- Storage invariant
  - Group A has no data in GPFS4 while account in normal state
  - Group B has no data in GPFS5 while account in normal state
- Compute invariant
  - Group A has no jobs on original partitions while account in normal state
  - Group B has no jobs on "gpfs5_" partitions while account in normal state

To satisfy the storage invariant, all GPFS4 access points must be blocked for Group A. Same for GPFS5 for Group B.
- Access to GPFS4
  - ssh, sftp to login004 (GPFS4 mounts) after ssh to cheaha.rc.uab.edu
    - block these in a fashion similar to how we block login005?
  - OOD file transfer interface
  - Globus file transfer
  - ssh, sftp to compute node (GPFS4 mounts)
    - Handled by the compute invariant. Can only ssh to nodes with running jobs. If we use Slurm to block submission to nodes with GPFS4 mounts (via partitions), we're set.

To satisfy the compute invariant, job submission with the Slurm scheduler must be considered. There are three commands to submit jobs in our version of Slurm. Note that a fourth, scrun, was introduced in 2022 for OCI-compatible containers. We don't need to consider it.
- sbatch: the current proposal is to change scripts automatically based on A/B grouping, so that group A has partition names "name" transformed into "gpfs5_name"
- srun: not clear this can be automated, but partition access needs to be blocked. Maybe we can use Linux groups like we do with sciencedmz.
- salloc: same as srun
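
A minimal sketch of the sbatch proposal above, assuming the rewrite happens on the script text before submission. The function names, the "A"/"B" group labels as strings, and the example partition names are hypothetical and only for illustration; this is not the actual implementation. It handles only "#SBATCH --partition" and "#SBATCH -p" directives inside the script, so scripts that rely on the default partition or pass the partition on the command line would need separate handling.

```python
import re

# Prefix taken from the "gpfs5_" partition naming in the summary above.
GPFS5_PREFIX = "gpfs5_"

# Matches "#SBATCH --partition=a,b", "#SBATCH --partition a,b" and "#SBATCH -p a,b".
# Group 1 is everything up to the partition list, group 2 is the list itself.
_PARTITION_DIRECTIVE = re.compile(
    r"^([ \t]*#SBATCH[ \t]+(?:--partition(?:=|[ \t]+)|-p[ \t]+))(\S+)",
    re.MULTILINE,
)


def to_gpfs5_partitions(names: str) -> str:
    """Prefix each comma-separated partition name, skipping names already prefixed."""
    return ",".join(
        name if name.startswith(GPFS5_PREFIX) else GPFS5_PREFIX + name
        for name in names.split(",")
    )


def rewrite_script(script: str, group: str) -> str:
    """Return the script with partition directives rewritten for Group A only."""
    if group != "A":
        # Group B stays on the original partitions, so the script is unchanged.
        return script
    return _PARTITION_DIRECTIVE.sub(
        lambda m: m.group(1) + to_gpfs5_partitions(m.group(2)),
        script,
    )


if __name__ == "__main__":
    # Hypothetical job script; partition names are examples only.
    example = (
        "#!/bin/bash\n"
        "#SBATCH --partition=express,short\n"
        "#SBATCH -p gpfs5_long\n"
        "srun hostname\n"
    )
    print(rewrite_script(example, "A"))
```

The rewrite skips names that already carry the prefix, so running it twice leaves a script unchanged. That matters if scripts may pass through the transformation more than once.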