# UAB Compute Cluster a.k.a. Cheaha

[TOC]

## Overview

Cheaha is a large, multi-unit computational system for running massively
parallel compute tasks. It is managed by the [UAB Research Computing Group](https://www.uab.edu/it/home/research-computing).

Cheaha is currently the fastest supercomputer in the state of Alabama, with a theoretical throughput of
approximately 450 TFLOP/s (HUGE COMPUTE!). It consists of over 3000 CPU cores and 72 NVIDIA P100 GPUs. Cheaha is
supported by a high-speed parallel filesystem (GPFS) that can store 6 PB non-redundantly and 4 PB redundantly (with
more to come!), interconnected by a high-speed InfiniBand network. UAB researchers use Cheaha for a wide variety of
research, such as genomics, neuroimaging, machine learning, statistical genetics, and cancer detection.

Use of this resource is governed by the
[UAB Acceptable Use Policy for Computer and Network Resources](https://www.uab.edu/policies/content/Pages/UAB-IT-POL-0000004.aspx).

For more information on Cheaha and the tools available to support research please review the documentation:
<http://docs.uabgrid.uab.edu/wiki/Cheaha>

## Access

To get set up with cluster access, you'll need your BlazerID. Send an email requesting access to the cluster
support group (`support@listserv.uab.edu`).

You can use this template email, filled in with your information, to make this request.

```text
Hello!

My name is __YOUR_NAME__ and I’m a __TITLE__ in Dr. Liz Worthey’s lab.
I’d like to request access to the cluster for our Genomics, Genetics and Data
Science research. In particular I will be doing data analysis, pipeline
development, and genomics research using the compute resources of the cluster.

Sincerely,

YOUR NAME
TITLE
Dr. Liz Worthey’s Lab
Center For Computational Genomics and Data Science
```

## Storage spaces

* Scratch Space
  * 1 TB of fast storage (i.e. close to the compute for super fast input/output)
* Home Space
  * 50 GB of fast-ish storage for small data, scripts, small analyses, etc.
* User Data Directory
  * 20 TB of fast-ish storage for larger data needs
* Lab/Project Space
  * 50 - 100 TB per lab of fast-ish storage for project level data and analysis
* Commodity Storage (coming soon!)
  * ??? TB of slower storage but HUGE for bigger datasets

## Submitting Jobs

You can SSH into the cluster via

```bash
ssh BLAZERID@cheaha.rc.uab.edu
```

The cluster uses the Slurm workload manager (the name originally stood for Simple Linux Utility for Resource
Management) for scheduling, distributing, and managing compute "jobs". A "job" is a general term for a specific
task, or set of tasks (specified in a script), that runs on the compute resources of the cluster.

For a complete description and tutorial on writing and executing jobs on the cluster, see Research Computing's helpful
[guide](https://docs.uabgrid.uab.edu/wiki/Slurm) on Slurm and executing compute tasks on the cluster. You can also check
out the tutorial slides below for a quick, high-level view of the cluster.
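
As a minimal sketch of what a job looks like, the snippet below writes a small batch script and shows how it might be
submitted. The file name `hello_job.sh`, the resource values, and the `express` partition name are assumptions for
illustration, not values taken from Research Computing's guide; run `sinfo` on the cluster to see which partitions
actually exist.

```bash
# Write a minimal Slurm batch script (file name and resource values are illustrative).
cat > hello_job.sh <<'EOF'
#!/bin/bash
# Name shown in the queue
#SBATCH --job-name=hello
# Output file; %j expands to the Slurm job ID
#SBATCH --output=hello_%j.out
# One task (process) using one CPU core
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=1
# Memory per core
#SBATCH --mem-per-cpu=1G
# Wall-time limit (HH:MM:SS)
#SBATCH --time=00:10:00
# ASSUMPTION: partition name; check `sinfo` for the real ones
#SBATCH --partition=express

# Everything below runs on the compute node assigned to the job.
echo "Hello from $(hostname)"
EOF

# On the cluster, submit and monitor the job with:
#   sbatch hello_job.sh      # submit; prints the job ID
#   squeue -u "$USER"        # list your queued and running jobs
#   scancel JOB_ID           # cancel a job by its ID
```

The `#SBATCH` lines are ordinary comments to the shell, so the same script can also be run directly with `bash` for a
quick local check before it is submitted.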

## Python on the Cluster

Anaconda is a free and open-source distribution of the Python and R programming languages for scientific computing that
aims to simplify package management and deployment. Package versions are managed by the conda package management system.
CGDS plans to use conda on the cluster for multiple projects involving Python.

### Conda Shortcuts for cluster

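* Finding and enabling the Conda module on the cluster  
  `module avail Anaconda` lists the available Anaconda modules; load one of them with  
  `module load MODULE_NAME`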

* Creating new Conda Environment  
  `conda create --name test_env`  
  Packages can be included within the new environment with a similar command  
  `conda create --name test_env PACKAGE_NAME`
* Listing available virtual environments  
  `conda env list`  
  The environment marked with an asterisk (\*) is the one that's currently active
* Activating conda virtual environment  
  `source activate test_env`
* Deactivating Virtual Environment  
  `source deactivate`
* Exporting a Conda virtual environment to share  
  `conda env export -n test_env > environment.yml`
* Creating Conda Virtual Environment from environment.yml  
  `conda env create -f environment.yml -n test_env`
* Deleting a Conda Virtual Environment  
  `conda remove --name test_env --all`
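
Put together, the shortcuts above might be combined as follows. This is a sketch rather than Research Computing's
recommended procedure: the function name `conda_roundtrip`, the module name `Anaconda3`, and the `numpy` package are
assumptions for illustration; check `module avail Anaconda` for the module names actually installed.

```bash
# Sketch of a typical conda session on the cluster, wrapped in a function so
# nothing runs until you call it. Module and package names are ASSUMPTIONS.
conda_roundtrip() {
    # Environment name; defaults to test_env when no argument is given.
    local env_name="${1:-test_env}"

    # Load an Anaconda module (assumed name; pick one listed by `module avail Anaconda`).
    module load Anaconda3

    # Create the environment with one example package, without prompting.
    conda create --yes --name "$env_name" numpy

    # Enter the environment, snapshot it for sharing, then leave it again.
    source activate "$env_name"
    conda env export -n "$env_name" > environment.yml
    source deactivate
}

# Usage (on the cluster):
#   conda_roundtrip my_env
```

The resulting `environment.yml` can then be passed to `conda env create -f environment.yml` by a collaborator to
rebuild the same environment.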

For a complete and up-to-date tutorial, please see
[UAB Research Computing's Anaconda Wiki](https://docs.uabgrid.uab.edu/wiki/Anaconda).

## Briefings and Highlights

![slide01](img/cheaha_101/Slide01.png)
![slide02](img/cheaha_101/Slide02.png)
![slide03](img/cheaha_101/Slide03.png)
![slide04](img/cheaha_101/Slide04.png)
![slide05](img/cheaha_101/Slide05.png)
![slide06](img/cheaha_101/Slide06.png)
![slide07](img/cheaha_101/Slide07.png)
![slide08](img/cheaha_101/Slide08.png)
![slide09](img/cheaha_101/Slide09.png)
![slide10](img/cheaha_101/Slide10.png)
![slide11](img/cheaha_101/Slide11.png)
![slide12](img/cheaha_101/Slide12.png)
![slide13](img/cheaha_101/Slide13.png)
![slide15](img/cheaha_101/Slide15.png)
![slide16](img/cheaha_101/Slide16.png)
![slide17](img/cheaha_101/Slide17.png)