Project to provision an [OpenHPC](https://openhpc.community/) + [Open OnDemand](https://openondemand.org/) cluster via Vagrant using the
CRI_XCBC (XSEDE basic cluster) Ansible provisioning framework.

The Vagrantfile takes inspiration from the [vagrantcluster](https://github.com/cluening/vagrantcluster)
project but is oriented toward deploying only a master node
and using standard OHPC tools to provision the cluster, and
therefore favors the CRI_XCBC approach to Ansible scripts just
for the master.

The Vagrantfile is stripped to the core (rather than carry all
the cruft of a vagrant init).  It leverages work from a
[pilot project](https://gitlab.rc.uab.edu/ravi89/ohpc_vagrant)
(primarily the development of an updated CentOS 7.5 image)
but prefers a clean repo slate.

## Project Setup

Clone this project recursively to get the correct version of the
CRI_XSEDE submodule used to build the OpenHPC (ohpc) and Open OnDemand (ood) nodes:
```
git clone --recursive https://gitlab.rc.uab.edu/jpr/ohpc_vagrant.git
```
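If the project was already cloned without `--recursive`, the submodule directory will be empty. A minimal sketch of fetching it afterwards with standard git commands; the helper name `init_submodules` is hypothetical and not part of this project:

```shell
# Hypothetical helper (not part of this repo): populate submodules in an
# existing clone that was made without --recursive.
init_submodules() {
    repo_dir="${1:-.}"   # path to the clone; defaults to the current directory
    git -C "$repo_dir" submodule update --init --recursive
}
```

Run it from the top of the clone as `init_submodules`, or pass the path to the clone as the first argument.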

## Cluster Setup

After setting up the project as described above, create your single-node OpenHPC
cluster with Vagrant:
```
vagrant up ohpc
```

The Ansible config will bring the master node to the point where it is
ready to ingest compute nodes via wwnodescan and will prompt you to
start a compute node.  You can create a compute node and start it with
the helper scripts:

Create node c0 (choose whatever name makes sense, c0 matches the config):
```
compute_create c0
```

When prompted, start compute node c0:
```
compute_start c0
```

If you want to stop the compute node:
```
compute_stop c0
```

If you want to get rid of the compute node VM:
```
compute_destroy c0
```

Note that the compute scripts work directly with the VirtualBox hypervisor.  The
machine created is a basic, lightweight, diskless compute node that boots
via iPXE from the OpenHPC master.  You may need to adjust the path to the
ipxe.iso in compute_create to match your local environment.
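
Before editing `compute_create`, it can help to see which ISO path the script currently references and whether a candidate path exists on your host. A small sketch, assuming the script is in the current directory; both function names are hypothetical, and the ISO path is whatever you pass in:

```shell
# Hypothetical helpers (not part of this repo).

# Print every line of the helper script that mentions ipxe.iso, with line numbers.
# Assumption: compute_create is in the current directory unless a path is given.
show_ipxe_path() {
    script="${1:-compute_create}"
    grep -n 'ipxe\.iso' "$script"
}

# Check that a candidate ISO path exists on this host.
# The path is supplied by the caller; no default location is assumed.
check_iso() {
    if [ -f "$1" ]; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}
```

For example, run `show_ipxe_path` to find the line to edit, then `check_iso` with the location of the ISO on your machine before updating the script.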

## Cluster Check

After `vagrant up ohpc` completes, you can log into the cluster with `vagrant ssh ohpc`.

To confirm the system is operational, run `sinfo`; you should see the following output:
```
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
low*         up 2-00:00:00      1   idle c0
```
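A freshly booted compute node may take a while before Slurm reports it as `idle`. The following is a minimal sketch of a polling loop to run on the ohpc node; the function name, retry count, and delay are assumptions for illustration, and `c0` matches the node name used above:

```shell
# Hypothetical helper (not part of this repo): poll sinfo until a node is idle.
# Arguments: node name (default c0), number of checks (default 30),
# seconds between checks (default 10).
wait_for_idle() {
    node="${1:-c0}"
    tries="${2:-30}"
    delay="${3:-10}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -h drops the header, -n limits output to one node, %T prints its state
        state=$(sinfo -h -n "$node" -o '%T' 2>/dev/null)
        if [ "$state" = "idle" ]; then
            echo "$node is idle"
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "$node did not reach idle after $tries checks" >&2
    return 1
}
```

Calling `wait_for_idle c0` inside `vagrant ssh ohpc` returns success once the node is ready, which makes it usable as a guard before submitting test jobs.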

You can run a test command on the compute node via slurm using:

```
srun hostname
```

This should return the name `c0`.

With these tests confirmed, you have a working OpenHPC cluster running Slurm.

## Boot the Open OnDemand node

A primary function of this project is to provide a dev/test cluster for working
with Open OnDemand.  After the cluster is up, boot the ood node with:
```
vagrant up ood
```

This provisions the node and, near the end of provisioning, prints several
sudo commands that must be run on the ohpc node to register the ood node
with the cluster so that data synchronization and Slurm work.

After the node is provisioned (or booted), you need to work around an issue
with NFS mounts by running `mount -a` on the ood node:
```
vagrant ssh ood -c "sudo mount -a"
```
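To confirm that the workaround took effect, the mount step can be followed by a listing of the NFS file systems on the ood node. This is a sketch only: the helper name is hypothetical, and it assumes `findmnt` (from util-linux) is available in the guest:

```shell
# Hypothetical helper (not part of this repo): remount, then list NFS mounts.
remount_ood_nfs() {
    # Same workaround as above; stop early if it fails.
    vagrant ssh ood -c "sudo mount -a" || return 1
    # findmnt exits non-zero when no file system of the given types is mounted.
    vagrant ssh ood -c "findmnt -t nfs,nfs4"
}
```

A non-zero exit status from `remount_ood_nfs` suggests that no NFS file system is mounted on the ood node.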

After this point you can connect to the web UI of the ood node, typically at
the following address (the port mapping may differ in your local Vagrant environment):

http://localhost:8080

The default user name and password for the web UI are both 'vagrant'.
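
If the URL above does not respond, the forwarded port may differ on your machine. The sketch below lists the forwarded ports with `vagrant port` and then probes the web UI with `curl`; the function name is hypothetical, and the default of 8080 is taken from the URL above:

```shell
# Hypothetical helper (not part of this repo): show port mappings for the
# ood VM and probe the web UI on the given host port.
check_ood_ui() {
    host_port="${1:-8080}"   # assumption: 8080, as in the URL above
    # List guest-to-host port mappings for the ood machine.
    vagrant port ood
    # Print only the HTTP status code returned by the web UI.
    curl -sS -o /dev/null -w '%{http_code}\n' "http://localhost:${host_port}/"
}
```

Any HTTP status code printed by `curl`, as opposed to a connection error, suggests the port mapping is right. If the listing shows a different host port, pass it as the first argument, for example `check_ood_ui 2200`.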