Open
Milestone started on May 21, 2024

GPFS5 migration

https://uab-rc.slack.com/archives/C9YB78YSX/p1715788192490339

High-level summary by William:

Regarding our A/B user migration, my understanding is there are two primary desired invariants:

  • Storage invariant
    • Group A has no data in GPFS4 while account in normal state
    • Group B has no data in GPFS5 while account in normal state
  • Compute invariant
    • Group A has no jobs on original partitions while account in normal state
    • Group B has no jobs on "gpfs5_" partitions while account in normal state

To satisfy the storage invariant, all GPFS4 access points must be blocked for Group A. Same for GPFS5 for Group B.

  • Access to GPFS4
    • ssh, sftp to login004 (GPFS4 mounts) after ssh to cheaha.rc.uab.edu
      • block these in a fashion similar to how we block login005?
    • OOD file transfer interface
    • Globus file transfer
    • ssh, sftp to compute node (GPFS4 mounts)
      • Handled by the compute invariant. Can only ssh to nodes with running jobs. If we use Slurm to block submission to nodes with GPFS4 mounts (via partitions), we're set.

To satisfy the compute invariant, job submission with the Slurm scheduler must be considered. There are three commands to submit jobs in our version of Slurm. Note that a fourth, scrun, was introduced in 2022 for OCI-compatible containers. We don't need to consider it.

  • sbatch: the current proposal is to change scripts automatically based on A/B grouping, so that for Group A each partition name "name" is transformed into "gpfs5_name"
  • srun: not clear this can be automated, but partition access needs to be blocked. Maybe we can use Linux groups like we do with sciencedmz.
  • salloc: same as srun
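
The sbatch proposal above could be sketched roughly as follows. This is only an illustration, not the actual migration tooling: the function names (to_gpfs5, rewrite_script) and the example partition names are hypothetical, and it assumes partitions are set inside the script with "#SBATCH --partition=..." or "#SBATCH -p ..." lines. Scripts that rely on the default partition, or users who pass -p on the sbatch command line, would need separate handling.

```python
import re

PREFIX = "gpfs5_"  # prefix quoted in the summary above

# Matches "#SBATCH --partition=a,b", "#SBATCH --partition a,b" and "#SBATCH -p a,b".
_DIRECTIVE = re.compile(r"^(\s*#SBATCH\s+(?:--partition(?:=|\s+)|-p\s*))(\S+)(.*)$")


def to_gpfs5(names: str) -> str:
    """Prefix each partition in a comma-separated list, skipping ones already prefixed."""
    return ",".join(
        n if n.startswith(PREFIX) else PREFIX + n for n in names.split(",")
    )


def rewrite_script(text: str) -> str:
    """Return the batch script text with partition directives rewritten for Group A."""
    out = []
    for line in text.splitlines():
        m = _DIRECTIVE.match(line)
        if m:
            # Keep the directive prefix and any trailing text, rewrite only the names.
            line = m.group(1) + to_gpfs5(m.group(2)) + m.group(3)
        out.append(line)
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical script; the partition names are placeholders.
    script = "#!/bin/bash\n#SBATCH --partition=express\n#SBATCH -p short,medium\nsrun hostname"
    print(rewrite_script(script))
```

Skipping names that already carry the prefix makes the rewrite safe to run more than once on the same script.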

79% complete
Start date: May 21, 2024
Due date: No due date (768 days elapsed)
Work items: 8 (Open: 1, Closed: 7)
Merge requests: 1 (Open: 0, Closed: 0, Merged: 1)
Participants: 3
Labels: 5
Releases: None
Reference: rc%"GPFS5 migration"