William Stonewall Monroe / horovod-environment
Commit 35977806, authored Nov 14, 2018 by William Stonewall Monroe

Update README.md to use more standard markup for command line instructions

Parent: bd04a2e8
Changes: 1 file

README.md
@@ -13,46 +13,64 @@ cd horovod-environment
# request GPU resources (one way of doing it), this needs to be done every time
```
sinteractive --ntasks=8 --time=08:00:00 --exclusive --partition=pascalnodes -N2 --gres=gpu:4
```
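Once the interactive session starts, it can help to confirm that the allocation matches the request. A minimal sketch, assuming `sinteractive` starts a regular Slurm job so that the standard `SLURM_JOB_NUM_NODES` and `SLURM_NTASKS` variables are set (the `show_allocation` helper name is made up for illustration):

```shell
# Print the size of the current Slurm allocation, or "unset" when the
# variables are missing (for example, when run on a login node).
show_allocation() {
    echo "nodes=${SLURM_JOB_NUM_NODES:-unset} tasks=${SLURM_NTASKS:-unset}"
}

show_allocation
```

For the request above, the expected values would be 2 nodes and 8 tasks.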
# load modules, this needs to be done every time
```
module load Anaconda3/5.2.0
module load cuda91
module load OpenMPI/3.1.2-gcccuda-2018b
```
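A quick way to see whether the modules took effect is to check that their commands are on `PATH`. This is only a sketch: the `check_tools` helper is hypothetical, and the command names (`conda`, `nvcc`, `mpirun`) are assumptions about what these three modules provide.

```shell
# Report any of the named commands that are not on PATH.
# Returns non-zero if at least one is missing.
check_tools() {
    missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Assumed command names: conda (Anaconda3), nvcc (cuda91), mpirun (OpenMPI).
check_tools conda nvcc mpirun || echo "load the modules above before continuing"
```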
# create anaconda environment
Download distribLearn2.yml from this repo
```
conda env create -f distribLearn2.yml --name distributedLearning
```
## source activate env, this needs to be done every time
```
source activate distributedLearning
```
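Because activation has to be repeated in every new session, a small guard can catch a forgotten step. A minimal sketch, relying on the `CONDA_DEFAULT_ENV` variable that conda sets on activation (the `env_is_active` helper name is made up for illustration):

```shell
# Succeeds only if the named conda environment is the active one.
# CONDA_DEFAULT_ENV is set by conda when an environment is activated.
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active distributedLearning; then
    echo "distributedLearning is active"
else
    echo "run: source activate distributedLearning"
fi
```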
These next 3 commands only need to be run once, to set up the env
```
conda update automat
pip uninstall horovod
pip install --no-cache-dir horovod
```
# navigate to an example
This can be downloaded from https://github.com/uber/horovod
```
cd /data/user/blazerid/horovod-master/examples/
mpirun -np 8 -bind-to none -map-by slot -mca pml ob1 -mca btl_tcp_if_include ib0 python keras_mnist.py
```
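The `-np 8` value matches the resource request: 2 nodes with 4 GPUs each, with one MPI process per GPU. A sketch of deriving it instead of hard-coding it, using a hypothetical `horovod_cmd` helper that only prints the command (a dry run) rather than launching anything:

```shell
# Build the mpirun command line for a given node count and GPUs per node.
# One MPI process is launched per GPU, so -np is the product of the two.
horovod_cmd() {
    nodes=$1
    gpus_per_node=$2
    shift 2
    np=$((nodes * gpus_per_node))
    echo "mpirun -np $np -bind-to none -map-by slot -mca pml ob1 -mca btl_tcp_if_include ib0 $*"
}

# Dry run: print the command for 2 nodes with 4 GPUs each.
horovod_cmd 2 4 python keras_mnist.py
```

Changing the first two arguments to `3 4` would print the 12-process variant of the same command.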
# or run benchmarks
```
git clone -b cnn_tf_v1.10_compatible https://github.com/tensorflow/benchmarks
cd benchmarks/
mpirun -np 8 -bind-to none -map-by slot -mca pml ob1 -mca btl_tcp_if_include ib0 python scripts/tf_cnn_benchmarks/tf_cnn_benchmarks.py --model resnet101 --batch_size 64 --variable_update horovod
```
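The benchmark reports its throughput on a `total images/sec:` summary line. A minimal sketch for pulling that number out of captured output, assuming the line has exactly that form (the `total_images_per_sec` helper name is made up for illustration):

```shell
# Extract the number from the last "total images/sec: N" summary line.
total_images_per_sec() {
    awk -F': ' '/total images\/sec/ { rate = $2 } END { print rate }'
}

# Example with a captured line rather than a live run.
printf 'total images/sec: 491.34\n' | total_images_per_sec
```

In practice the benchmark output could be piped through `tee` into a log file and then into this function.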
For the resnet101 benchmark test,
running using 4 GPUs across 1 node gives: total images/sec: 491.34
running using 8 GPUs across 2 nodes gives: total images/sec: 915.31
running using 12 GPUs across 3 nodes gives: total images/sec: 1450.00
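
One way to read these numbers is as scaling efficiency: per-GPU throughput relative to the 4-GPU, single-node run. The sketch below does that arithmetic with `awk`; the `scaling_efficiency` helper and the choice of the 4-GPU run as the baseline are assumptions made for illustration.

```shell
# Scaling efficiency (percent) relative to the 4-GPU, single-node run.
# Arguments: <gpus> <images_per_sec>
scaling_efficiency() {
    awk -v gpus="$1" -v rate="$2" -v base_gpus=4 -v base_rate=491.34 \
        'BEGIN { printf "%.1f\n", 100 * (rate / gpus) / (base_rate / base_gpus) }'
}

scaling_efficiency 8 915.31
scaling_efficiency 12 1450.00
```

With the figures above this works out to roughly 93.1% for 8 GPUs and 98.4% for 12 GPUs.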