{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Singularity Containers"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## First container\n",
    "\n",
    "Let's begin by pulling our first container from [Singularity-Hub](https://singularity-hub.org/).\n",
    "\n",
    "This Singularity image is based on [NeuroDebian](http://neuro.debian.net/).\n",
    "\n",
    "NeuroDebian provides a large collection of popular neuroscience research software for the Debian operating system as well as Ubuntu and other derivatives."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity pull shub://neurodebian/neurodebian"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have pulled the image, you should see it in your directory by simply running the `ls` command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!ls"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, we'll try to execute a command from within the container.\n",
    "The __exec__ subcommand allows you to do this. Let's list the contents of your /home/$USER directory:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec neurodebian-neurodebian-master-latest.simg ls /home/$USER"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we have pulled the container, we can run the tool we actually wanted, the reason for downloading the container in the first place: [dcm2nii](https://www.nitrc.org/projects/mricron)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec -B /data neurodebian-neurodebian-master-latest.simg dcm2nii"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## What is a container?"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings.\n",
    "\n",
    "Containers use the host system's kernel and can access its hardware more directly. When you run a process in a container, it runs on the host system and is directly visible as a process on that system. A container is virtualization at the software level, whereas a virtual machine is virtualization at the hardware level. If you are interested in more differences between a VM and a container, see this [link](https://www.electronicdesign.com/dev-tools/what-s-difference-between-containers-and-virtual-machines)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Why use a container?\n",
    "\n",
    "Containers package together a program and all of its dependencies, so that if you have the container you can use it on any Linux system with the container system software installed. It doesn't matter whether the system runs Ubuntu, RedHat or CentOS Linux - if the container system is available, the program runs identically on each, inside its container. This is great for distributing complex software with a lot of dependencies, and for ensuring you can reproduce experiments exactly. If you still have the container, you know you can reproduce your work. Also, since the container runs as a process on the host machine, it can easily be run in a [SLURM job](https://docs.uabgrid.uab.edu/wiki/Slurm)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Docker vs Singularity?\n",
    "\n",
    "[Docker](https://www.docker.com/) is the most popular and widely used container system in the industry. [Singularity](https://www.sylabs.io/singularity/), however, was built with HPC in mind, i.e. a shared environment. Singularity is designed so that you can use it within SLURM jobs without violating the security constraints of the cluster. Since Docker is very popular and many people were already packaging their software as Docker images, Singularity maintains compatibility with Docker images; we'll see this compatibility later in the notebook. Both Singularity and Docker maintain a hub where you can keep your images remotely and pull them from anywhere. Here are links to both hubs:\n",
    "\n",
    "[Singularity-Hub](https://singularity-hub.org)  \n",
    "[Docker Hub](https://hub.docker.com/)\n",
    "\n",
    "\n",
    "Singularity is already available on Cheaha. To check the Singularity modules available on Cheaha, run the cell below:\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "scrolled": true
   },
   "outputs": [],
   "source": [
    "!module avail Singularity"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As you might have noticed, we already loaded a Singularity module while starting this notebook. You can check the version of Singularity loaded below:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity --version"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Basic Singularity command line functions:\n",
    "\n",
    "To check the basic functions and command line options provided, run help on the singularity command:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity --help"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To get more information about a particular subcommand, use help in conjunction with that subcommand:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity help pull"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Path resolution within the container:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now, let's list the contents of your /data/user/$USER directory from within the container. We'll use the __exec__ subcommand for this:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec neurodebian-neurodebian-master-latest.simg ls /data/user/$USER"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Hmmm, an error. Remember that your Singularity container image doesn't know about the directories on your host machine. By default (in most containers) it binds only your HOME and /tmp directories.\n",
    "\n",
    "Now, all our raw data generally lives under /data/user/$USER, so we really need access to that location if our container is to be useful.\n",
    "\n",
    "Thankfully, you can explicitly tell Singularity to bind a host directory into your container. Singularity provides a parameter (-B) to bind a path from your host machine into the container. Try the same command again, but with the bind parameter:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec -B /data/user/$USER neurodebian-neurodebian-master-latest.simg ls /data/user/$USER"
   ]
  },
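  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As a side note on the bind syntax (a sketch; the paths shown are illustrative and exact behavior may vary with your Singularity version): `-B` also accepts a `src:dest` pair to mount a host path at a different location inside the container, and multiple binds can be given as a comma-separated list. For example:\n",
    "\n",
    "```\n",
    "# Mount host /data/user/$USER at /mnt inside the container\n",
    "singularity exec -B /data/user/$USER:/mnt neurodebian-neurodebian-master-latest.simg ls /mnt\n",
    "\n",
    "# Multiple bind paths in a single flag\n",
    "singularity exec -B /data/user/$USER,/scratch neurodebian-neurodebian-master-latest.simg ls /scratch\n",
    "```\n",
    "\n",
    "You can also set the SINGULARITY_BINDPATH environment variable to the same comma-separated list to avoid repeating `-B` on every command."
   ]
  },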
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "As mentioned earlier regarding the security considerations of Singularity in an HPC environment, all Singularity runs adhere to the user-level permissions of the host system. So I would get a permission-denied error if I tried to list the contents of William's directory."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec -B /data/user/wsmonroe neurodebian-neurodebian-master-latest.simg ls /data/user/wsmonroe"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Now that we know how to bind paths from the host machine into the container image, we can use the __dcm2nii__ tool on the raw files available in our /data/user/$USER location.\n",
    "\n",
    "\n",
    "We'll take a look at how to use it within a job script in the next section."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Job Script with containers\n",
    "\n",
    "Using a Singularity container within a [SLURM job script](https://docs.uabgrid.uab.edu/wiki/Slurm) is very easy, since containers run as processes on the host machine. You just need to load Singularity in your job script and run the singularity command. Here's an example job script:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "```\n",
    "#!/bin/bash\n",
    "#\n",
    "#SBATCH --job-name=test\n",
    "#SBATCH --output=res.out\n",
    "#SBATCH --error=res.err\n",
    "#\n",
    "# Number of tasks needed for this job. Generally used with MPI jobs\n",
    "#SBATCH --ntasks=1\n",
    "#SBATCH --partition=express\n",
    "#\n",
    "# Time format = HH:MM:SS, DD-HH:MM:SS\n",
    "#SBATCH --time=10:00\n",
    "#\n",
    "# Number of CPUs allocated to each task. \n",
    "#SBATCH --cpus-per-task=1\n",
    "#\n",
    "# Minimum memory required per allocated CPU in megabytes.\n",
    "#SBATCH --mem-per-cpu=100\n",
    "#\n",
    "# Send mail to the email address when the job fails\n",
    "#SBATCH --mail-type=FAIL\n",
    "#SBATCH --mail-user=$USER@uab.edu\n",
    "\n",
    "#Set your environment here\n",
    "module load Singularity/2.5.2-GCC-5.4.0-2.26\n",
    "\n",
    "#Run your commands here\n",
    "singularity exec -B /data/user/$USER /data/user/$USER/rc-training-sessions/neurodebian-neurodebian-master-latest.simg dcm2nii PATH_TO_YOUR_DICOM_FILES\n",
    "```"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 2:\n",
    "\n",
    "In this example we are going to pull a Singularity image from [Docker Hub](https://hub.docker.com/). This Singularity image contains the [Google Cloud SDK tools](https://cloud.google.com/sdk/).\n",
    "\n",
    "The Cloud SDK is a set of tools for Cloud Platform. It contains gcloud, gsutil, and bq command-line tools, which you can use to access Google Compute Engine, Google Cloud Storage, Google BigQuery, and other products and services from the command-line. You can run these tools interactively or in your automated scripts."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity pull docker://jess/gcloud"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec -B /data gcloud.simg gsutil"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To clean your directory of all the container images, you can run the command below"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!rm *.simg"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Example 3:\n",
    "\n",
    "[NVIDIA GPU Cloud](https://www.nvidia.com/en-us/gpu-cloud/) (NGC) offers a container registry of Docker images with over 35 HPC, HPC visualization, deep learning, and data analytics containers optimized for GPUs and delivering accelerated performance. The registry includes some of the most popular applications, including GROMACS, NAMD, ParaView, VMD, and TensorFlow.\n",
    "\n",
    "For this example you would have to start a new Jupyter Notebook session on the 'pascalnodes' partition. Most of the settings are similar to the ones you had in the [Git repo](https://gitlab.rc.uab.edu/rc-training-sessions/singularity_containers) for this session. You just need to add/modify the following things.\n",
    "\n",
    "In the Environment parameter add\n",
    "```\n",
    "module load cuda92/toolkit/9.2.88\n",
    "module load CUDA/9.2.88-GCC-7.3.0-2.30\n",
    "module load Singularity/2.5.2-GCC-5.4.0-2.26\n",
    "module load Anaconda3\n",
    "```\n",
    "\n",
    "And in the partition choose:\n",
    "```\n",
    "pascalnodes\n",
    "```\n",
    "\n",
    "You will need to create an account on NVIDIA GPU Cloud to pull down these containers."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity build vmc_gpu.simg docker://nvcr.io/hpc/vmd:cuda9-ubuntu1604-egl-1.9.4a17"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!singularity exec --nv vmc_gpu.simg /opt/vmd/bin/vmd -dispdev openglpbuffer -e hiv-simple-egloptix-test.vmd"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Building a Singularity container recipe file\n",
    "\n",
    "Though building a recipe file for Singularity containers is beyond the scope of this session, we have provided a few important links below that explain how to create a recipe file for a Singularity container and how to build a container from it.\n",
    "\n",
    "When you want to create a container for production use on the cluster, you should build a container image from a definition file. Unfortunately, building containers from a definition file requires you to be a system administrator (root) on the machine you use for building. You will need to build Singularity containers on a machine that you control.\n",
    "\n",
    "To install Singularity on your system, follow the steps outlined here:  \n",
    "http://singularity.lbl.gov/install-linux  \n",
    "http://singularity.lbl.gov/install-mac  \n",
    "http://singularity.lbl.gov/install-windows  \n",
    "\n",
    "Creating a Singularity recipe file:  \n",
    "http://singularity.lbl.gov/docs-recipes\n",
    "\n",
    "Building a container from a Singularity recipe file:  \n",
    "http://singularity.lbl.gov/docs-build-container#building-containers-from-singularity-recipe-files\n",
    "\n",
    "\n"
   ]
  },
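  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To give a rough feel for the format (a minimal sketch, not a production recipe; see the links above for the authoritative documentation), a Singularity recipe file bootstraps from a base image and lists setup commands in a %post section:\n",
    "\n",
    "```\n",
    "Bootstrap: docker\n",
    "From: ubuntu:16.04\n",
    "\n",
    "%post\n",
    "    # Commands run inside the container at build time\n",
    "    apt-get update && apt-get install -y curl\n",
    "\n",
    "%environment\n",
    "    export LC_ALL=C\n",
    "\n",
    "%runscript\n",
    "    # Executed when you run the container\n",
    "    echo \"Hello from the container\"\n",
    "```\n",
    "\n",
    "On a machine where you have root, you would then build it with something like `sudo singularity build mycontainer.simg Singularity`, where `Singularity` is the recipe file name."
   ]
  },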
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Acknowledgments / Useful Links:\n",
    "\n",
    "https://github.com/singularityhub/singularityhub.github.io/wiki  \n",
    "https://portal.biohpc.swmed.edu/content/guides/singularity-containers-biohpc/  \n",
    "https://www.docker.com/  \n",
    "https://devblogs.nvidia.com/docker-compatibility-singularity-hpc/  "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.7.2"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}