- Allow students based on recommendation from staff/supervisor/researcher.
- Free---but we do observe accounting.
- Admit students based on recommendation from staff/supervisor/researcher.
- Free---but the system does attempt to balance load evenly among departments.
## Background II
- Two DGX-2 in the AI Cloud cluster
- Shared. We try to protect data but some things are not put in place roughly [levels 0 and 1](https://www.security.aau.dk/dataclassification/)
- One DGX-2 set a side for research with confidential/sensitive (levels 2 and 3) data.
- Sliced (vitual machines). There are projects, and more are coming with requirements on data protection.
- GPU system. CPU primary computations should be done somewhere else. [Cloud: Strato](https://strato.claaudia.aau.dk) or [uCloud](https://cloud.sdu.dk), possibly [VMWare](https://www.en.its.aau.dk/instructions/VMware).
- A lot of things are happening both in [DK](https://www.nyheder.aau.dk/2020/nyhed/ny-dansk-supercomputer-skaber-langt-mere-samfundsvaerdi.cid489812) and at EU level. HPC landscape is being reshaped. If you need something, then email CLAAUDIA@aau.dk for more information.
- Two NVIDIA DGX-2 in the AI Cloud cluster
- Shared. Users' data separated by ordinary file system access
restrictions. Not suitable for sensitive/secret data. Usable for
[levels 0 and 1](https://www.security.aau.dk/dataclassification/)
- One DGX-2 set aside for research with confidential/sensitive (levels
2 and 3) data.
- Sliced (virtual machines). There are projects, and more are coming
with requirements on data protection.
- GPU system. CPU-primary computations should be done somewhere
else. [Cloud: Strato](https://strato-new.claaudia.aau.dk) or
sudo sacctmgr modify user <user> set QOS+=deadline
```
Follow the guidelines on the documentation page and submit an email to support@its.aau.dk if you have a paper deadline.
@ -196,7 +207,7 @@ Some additional readings:
4. HPC-oriented
5. Users familiar with Docker might experience a slow build process.
Refs Docker vs. Singularity discussion: [ref](https://pythonspeed.com/articles/containers-filesystem-data-processing/) and [ref2](https://www.reddit.com/r/docker/comments/7y2yp2/why_is_singularity_used_as_opposed_to_docker_in/)
Refs Docker vs. Singularity discussion: [[1]](https://pythonspeed.com/articles/containers-filesystem-data-processing/) and [[2]](https://www.reddit.com/r/docker/comments/7y2yp2/why_is_singularity_used_as_opposed_to_docker_in/)
## Check built-in documentation
@ -206,7 +217,7 @@ Refs Docker vs. Singularity discussion: [ref](https://pythonspeed.com/articles/c
## Singularity build from Docker and exec command
Example: Pull a Docker image and convert to singularity image
Example: Pull a Docker image and convert to Singularity image
- View resource utilization on compute node (shh in):
- View resource utilization on compute node (ssh in):
* ```$ top -u <user>```
* ```$ smem -u -k```
* ```$ nvidia-smi -l 1 -i <IDX>``` # see scontrol -d show job <jobId>
@ -303,7 +314,7 @@ On the node:
- Data in e.g. /user/student.aau.dk/ are on a distributed file system
* Consider using /raid (SSD NVMe) on the compute node (see doc)
- If you have allocated a GPU, your job information contains ```mem=10000M```, and the job stays pending (state=PD, possible reason=resources) even though resources should be available:
* Issue: cancel and add e.g. --mem=64G to you allocation
* Issue: cancel and add e.g. `--mem=64G` to your allocation
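
A minimal sketch of the memory fix, not the cluster's documented procedure. Only `--mem=64G` comes from the slide; the helper name `resubmit_cmd`, the `--gres=gpu:1` request and the interactive `--pty bash -l` shell are assumptions chosen for illustration. The helper prints the command instead of running it, so it can be reviewed before use.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an srun command line with an explicit
# memory request. The GPU count and the interactive shell are
# illustrative assumptions; adapt them to your own job.
resubmit_cmd() {
    local mem="$1"
    # --gres=gpu:1 requests one GPU; --mem sets the memory per node.
    echo "srun --gres=gpu:1 --mem=${mem} --pty bash -l"
}

# Print the command for review (dry run); nothing is submitted here.
resubmit_cmd 64G
```

Cancel the pending job first with `scancel` and its job ID, then run the printed command.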
## Tools, tips and tricks II
@ -338,7 +349,7 @@ We see challenges towards the end of semesters (cyclic):
- More workflows
- Copying data to the local drive for higher I/O performance