a bit of post training cleanup

pull/4/head
Tobias Lindstrøm Jensen 3 years ago
parent b442071072
commit 393a15d3dc
  1. aicloud_slurm/README.md (4)
  2. aicloud_slurm/examples_Singularity_def_files/complex/Singularity.def (0)
  3. aicloud_slurm/examples_Singularity_def_files/demo/Singularity.def (0)
  4. aicloud_slurm/training/SlurmAndSingularityTraining.md (2)
  5. aicloud_slurm/training/SlurmAndSingularityTraining.pdf (BIN)

@@ -27,7 +27,7 @@
This README contains information for getting started with the DGX-2 system at AAU, links to additional resources, and examples for daily usage. We are many users on this system, so please consult the section on [Fair usage](#fair-usage) and follow the guidelines.
# Introduction
-**It is intended that all analysis on the DGX-2 servers are run via [singularity](https://www.sylabs.io/docs/) containers which you start and manage by yourselves. It is possible to build singularity images from the NVIDIA's stock docker images, check out this [Support Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html "support matrix"). If you need something of your own taste, all software tools and their dependencies are supposed to be installed inside your containers. Furthermore, if you want to use the software stack again and again, it is a good idea to create a singularity image for that. It is highly recommended that you go through [NVIDIA Singularity Guide](https://docs.nvidia.com/ngc/ngc-user-guide/singularity.html#singularity) to get an overview about how to use Singularity, and setup your own image(s).**
+**It is intended that all analysis on the DGX-2 servers are run via [singularity](https://www.sylabs.io/docs/) containers which you start and manage by yourselves. It is possible to build singularity images from the NVIDIA's stock docker images, check out this [Support Matrix](https://docs.nvidia.com/deeplearning/dgx/support-matrix/index.html "support matrix"). If you need something of your own taste, all software tools and their dependencies are supposed to be installed inside your containers. Furthermore, if you want to use the software stack again and again, it is a good idea to create a singularity image for that. You can get more information here: [NVIDIA Singularity Guide](https://docs.nvidia.com/ngc/ngc-user-guide/singularity.html#singularity) to get an overview about how to use Singularity, and setup your own image(s).**
Resource sharing is done using [slurm](https://slurm.schedmd.com/documentation.html). The setup is such that you allocate an integer number of GPUs for you computations, e.g., 1,2,3,....
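The GPU-allocation sentence above can be made concrete; a minimal sketch, assuming a site where a plain `--gres=gpu:<n>` request is accepted (partition and account flags vary between clusters and are omitted here):

```shell
# Build an srun command line requesting a single GPU for an interactive shell.
# --gres=gpu:1 asks Slurm for one GPU; raise the count (gpu:2, gpu:3, ...) as needed.
gpus=1
cmd="srun --gres=gpu:${gpus} --pty bash -l"
printf '%s\n' "$cmd"
```

Batch jobs would put the same `--gres` request in an `sbatch` script instead of an interactive `srun`.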
@@ -236,7 +236,7 @@ First we'll pull
```console
srun singularity pull docker://nvcr.io/nvidia/tensorflow:19.03-py3
```
-The pull address of the container can be found from the [NGC catalog](https://ngc.nvidia.com/catalog/containers?orderBy=&query=&quickFilter=deep-learning&filters=).
+The pull address of the container can be found from the [NGC catalog](https://ngc.nvidia.com/catalog/all?orderBy=modifiedDESC&pageNumber=1&query=&quickFilter=&filters=).
We can then do
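The continuation after "We can then do" is cut off in this view. As a hedged sketch only (the `.sif` file name assumes `singularity pull`'s default `<name>_<tag>` naming, which differs in older Singularity versions, and `my_script.py` is a hypothetical payload), running the pulled image might look like:

```shell
# Run a Python script inside the pulled image; --nv maps the host's NVIDIA
# driver and GPUs into the container. Image name and script are assumptions.
image="tensorflow_19.03-py3.sif"
cmd="srun --gres=gpu:1 singularity exec --nv ${image} python my_script.py"
printf '%s\n' "$cmd"
```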

@@ -57,7 +57,7 @@
`ssh <emailaddress>@ai-pilot.srv.aau.dk`
-- Outside AAU network (external users or outside VPN)
+- Outside AAU network (outside VPN)
```console
# Two-step log on
```
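The two-step log-on commands are truncated in this view. As a hypothetical illustration only (the `<gateway-host>` placeholder below is an assumption, not the real AAU gateway), a `ProxyJump` entry in `~/.ssh/config` collapses a two-hop log-on into a single `ssh ai-pilot`:

```shell
# Emit a sample ~/.ssh/config entry. HostName matches the login node quoted
# above; the ProxyJump gateway is a placeholder to replace with your site's gateway.
config='Host ai-pilot
    HostName ai-pilot.srv.aau.dk
    User <emailaddress>
    ProxyJump <emailaddress>@<gateway-host>'
printf '%s\n' "$config"
```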
