add first version of training materials

pull/1/head
Vang Le-Quy 4 years ago
parent e91b5fec06
commit 79165e32ad
  1. 2
      aicloud_slurm/Makefile
  2. 235
      aicloud_slurm/SlurmAndSingularityTraining.md
  3. 160
      aicloud_slurm/examples/Singularity/complex/Singularity.def
  4. 45
      aicloud_slurm/examples/Singularity/demo/Singularity.def
  5. 4
      aicloud_slurm/images/build_input_output.svg
  6. BIN
      aicloud_slurm/refs/20181204-gpu-accelerated-multi-node-hp-cworkloads-with-singularity1.pdf
  7. BIN
      aicloud_slurm/refs/Basic_Configuration_Usage.pdf
  8. BIN
      aicloud_slurm/refs/GMKurtzer_Singularity_Keynote_Tuesday_02072017.pdf
  9. BIN
      aicloud_slurm/refs/SlurmOverview.pdf
  10. BIN
      aicloud_slurm/refs/slurm_overview_slug18.pdf
  11. BIN
      aicloud_slurm/refs/slurmsummary.pdf

@@ -0,0 +1,2 @@
slides:
	pandoc SlurmAndSingularityTraining.md -o SlurmAndSingularityTraining.pdf -t beamer --slide-level=2 --pdf-engine=xelatex --pdf-engine-opt=-shell-escape

@@ -0,0 +1,235 @@
---
title: Slurm and Singularity Training for AI Cloud
date: April 2019
author:
- Mads Boye
- Tobias Lindstrøm Jensen
- Vang Le-Quy
affiliation: CLAAUDIA, Aalborg University
theme: AAUsimple
aspectratio: 169
header-includes:
- \usepackage{graphicx}
- \usepackage{amsmath}
- \usepackage{minted}
output:
  beamer_presentation
---
## Agenda
\tableofcontents
# System design and intended uses
- Job partitioning: high priority vs. normal; high, normal, and low resource use.
- Run modes: batch and interactive.

An introduction to the system setup: how jobs are run, and how users are expected to use the resources.
# Slurm basics
## Why Slurm?
A general introduction to [Slurm](https://www.youtube.com/watch?v=5nxMLqF6Eu8)
### Resource management
### Queue system
## Query commands
- sinfo: `sinfo -p batch`, `sinfo --Node`
`sinfo -o "%D %e %E %r %T %z" -p batch`
- squeue: squeue -u $USER -i60 # query every 60 seconds
- smap - report system, job or step status
- sview
## Accounting commands
- sacct - report accounting information by individual job and job step
- sstat - report accounting information about currently running jobs and job steps (more detailed than sacct)
- sreport - report resources usage by cluster, partition, user, account, etc
```
sacct -A claaudia -u vle@its.aau.dk
sreport cluster AccountUtilizationByUser cluster=ngc account=claaudia start=4/01/19 end=4/17/20 format=Accounts,Cluster,TresCount,Login,Proper,Used
```
## Essential commands
- sbatch -- Submit job script (Batch mode)
- salloc -- Create job allocation and start a shell (Interactive Batch mode)
- srun -- Run a command within a batch allocation that was created by sbatch or salloc
- scancel -- Delete jobs from the queue
- squeue -- View the status of jobs
- sinfo -- View information on nodes and queues
- sattach - connect stdin/out/err for an existing job or job step
## Interactive jobs
- `salloc --time=<hh:mm:ss> --gres=gpu:2`
- `ssh <email>@nv-ai-03.srv.aau.dk`
- `srun --time=<hh:mm:ss> --gres=gpu:2`
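A typical interactive session might look like this (the one-hour time limit, GPU count, and `--pty` shell are illustrative; adjust them to your needs):

```bash
# Ask Slurm for an allocation with 1 GPU for one hour
salloc --time=01:00:00 --gres=gpu:1

# Or launch an interactive shell directly on an allocated node
srun --time=01:00:00 --gres=gpu:1 --pty bash -l

# Leave the shell (and release the allocation) when done
exit
```

`--pty` attaches a pseudo-terminal so the shell behaves like a normal login shell on the compute node.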
## Slurm Batch job script
```bash
#!/usr/bin/env bash
#SBATCH --job-name MySlurmJob
#SBATCH --partition batch # equivalent to PBS batch
#SBATCH --mail-type=ALL # NONE, BEGIN, END, FAIL, REQUEUE, ALL TIME_LIMIT, TIME_LIMIT_90, etc
#SBATCH --mail-user=vle@its.aau.dk
#SBATCH --dependency=aftercorr:498 # More info slurm head node: `man --pager='less -p \--dependency' sbatch`
#SBATCH --time 24:00:00 # Run 24 hours
#SBATCH --gres=gpu:2
```
## Slurm Batch job script (cont.)
```bash
sbcast --force my.prog /tmp/my.prog
srun --ntasks-per-node=2 /tmp/my.prog
```
## Control job status
- scancel - signal/cancel jobs or job steps
`scancel --user="vle@its.aau.dk" --state=pending`
- strigger - event trigger management tools
## Other useful commands
- sbcast - transfer a file to the compute nodes allocated to a job
# Slurm admin
Important readings:
- [Multifactor Priority Plugin](https://slurm.schedmd.com/priority_multifactor.html)
- [Trackable Resource](https://slurm.schedmd.com/tres.html)
- [Accounting](https://slurm.schedmd.com/accounting.html)
- [Resource Limit](https://slurm.schedmd.com/resource_limits.html)
## Slurm admin commands
- `scontrol show job <jobID>`, admin tool: `scontrol show partition`
- `scontrol show partition <partitionName>`
- `scontrol write batch_script job_id optional_filename`
- `scontrol update qos=short jobid=525`
## Slurm admin commands (cont.)
- sacctmgr - database management tool
- sprio - view factors comprising a job priority
- sshare - view current hierarchical fair-share information
- sdiag - view statistics about scheduling module operations
## sacctmgr
- sudo sacctmgr modify QOS normal set MaxTRESPerUser=gres/gpu=2
- sacctmgr show qos format=name,priority,maxtresperuser,MaxWall
- `sacctmgr show assoc format=account,user,qos,tres,maxtresperuser,grptres`
# Singularity basics
## Why singularity
To overcome Docker's drawbacks while still working well with Docker:
1. Security
- root access
- resource exposure
2. Compatibility with `slurm`
- resource policy
3. Simplicity
4. HPC-geared
## Commands to learn
`srun singularity -h`

See `singularity help <command>` for details on each subcommand.
```bash
CONTAINER USAGE COMMANDS:
exec Execute a command within container
run Launch a runscript within container
shell Run a Bourne shell within container
test Launch a testscript within container
CONTAINER MANAGEMENT COMMANDS:
apps List available apps within a container
bootstrap *Deprecated* use build instead
build Build a new Singularity container
check Perform container lint checks
inspect Display container's metadata
mount Mount a Singularity container image
pull Pull a Singularity/Docker container to $PWD
```
## Commands to learn (cont.)
1. search
2. build
3. exec
4. inspect
5. pull
6. run
7. shell
8. image.*
9. instance.*
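A quick tour of the most common of these, using a public demo image (the `.sif` filename assumes Singularity 3.x; 2.x produces `.simg`):

```bash
# Pull an image from Docker Hub into the current directory
singularity pull docker://godlovedc/lolcow

# Inspect its metadata, then execute its runscript
singularity inspect lolcow_latest.sif
singularity run lolcow_latest.sif

# Execute a single command inside the container
singularity exec lolcow_latest.sif cat /etc/os-release

# Drop into an interactive shell inside the container
singularity shell lolcow_latest.sif
```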
## build
::: columns
:::: column
![Build IO](images/build_input_output.svg "Build Input Output"){height=65%}
::::
:::: column
` sudo singularity build \
lolcow.sif \
docker://godlovedc/lolcow`
::::
:::
# Scenarios / Use cases
## Run stock docker images
```bash
srun --gres=gpu:2 bash -c 'mkdir -p $HOME/data;
source $HOME/.bashrc;
singularity run --nv -B $HOME/data:/data \
docker://nvcr.io/nvidia/tensorflow:19.03-py3 nvidia-smi'
```
## Run stock singularity images
```bash
srun --gres=gpu:1 singularity run shub://
```
## Build and run NVIDIA’s stock Docker images
- Setup env variables
- build and run
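A sketch of both steps, assuming an NGC account; the API key is a placeholder, and the image tag matches the earlier example:

```bash
# Registry credentials, read by singularity when pulling from nvcr.io
export SINGULARITY_DOCKER_USERNAME='$oauthtoken'
export SINGULARITY_DOCKER_PASSWORD=<your NGC API key>

# Build a local image from NVIDIA's registry, then run it with GPU support
singularity build tensorflow-19.03-py3.sif docker://nvcr.io/nvidia/tensorflow:19.03-py3
srun --gres=gpu:1 singularity run --nv tensorflow-19.03-py3.sif
```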
## Write Singularity definition and build
## Build and run customized Singularity images
Singularity [definition file](https://www.sylabs.io/guides/3.0/user-guide/definition_files.html)

This needs an introduction to the Singularity recipe format and the `build` command.
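A minimal definition file illustrating the basic sections described in the linked guide (the package choice is only an example); build it with `sudo singularity build myimage.sif myimage.def`:

```
Bootstrap: docker
From: ubuntu:18.04

%post
    apt-get -y update && apt-get -y install python3

%environment
    export LC_ALL=C

%runscript
    exec python3 "$@"
```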
# QnA
## Questions
- How to run distributed jobs with Singularity?

@@ -0,0 +1,160 @@
# This file was downloaded from
# https://raw.githubusercontent.com/pescobar/STPH-course-soft/master/Singularity
BootStrap: docker
From: ubuntu:16.04
%post
# install some system deps
apt-get -y update
apt-get -y install locales curl bzip2 less unzip
# this is a X11 dep for IGV
apt-get -y install libxext6
# tools to open PDF and HTML files
apt-get -y install firefox xpdf
# some extra devel libs
apt-get -y install zlib1g-dev libssl-dev
locale-gen en_US.UTF-8
apt-get clean
# download and install miniconda3
curl -sSL -O https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh -p /opt/miniconda3 -b
rm -fr Miniconda3-latest-Linux-x86_64.sh
export PATH=/opt/miniconda3/bin:$PATH
conda update -n base conda
conda config --add channels conda-forge
conda config --add channels bioconda
# install some bioinfo tools from Bioconda
conda install --yes -c bioconda samtools==1.7
conda install --yes -c bioconda bwa==0.7.17
conda install --yes -c bioconda trimmomatic==0.36
conda install --yes -c bioconda perl-findbin==1.51
conda install --yes -c bioconda fastqc==0.11.7
conda install --yes -c bioconda seqprep==1.2
conda install --yes -c bioconda gatk4==4.0.1.1
conda install --yes -c bioconda igv=2.3.98
conda install --yes -c bioconda vcftools==0.1.15
conda install --yes -c bioconda snpeff=4.3.1t-0
conda install --yes -c bioconda varscan==2.4.3
conda install --yes -c bioconda muscle==3.8.1551
conda install --yes -c bioconda mafft==7.313
conda install --yes -c bioconda raxml==8.2.10
conda install --yes -c bioconda beast==1.8.4
conda install --yes -c bioconda phylip==3.696
conda install --yes -c bioconda paml==4.9
conda install --yes -c bioconda qualimap==2.2.2a
conda install --yes -c bioconda picard==2.18.3
conda install --yes -c bioconda biopython==1.71
# install the R programming language
conda install --yes -c conda-forge r-base==3.4.1
# install some dependencies to build R packages
apt-get -y install build-essential gfortran
#conda install --yes -c conda-forge make
#conda install --yes gfortran_linux-64
#conda install --yes gxx_linux-64
#conda install --yes gcc_linux-64
# install some extra R packages
Rscript -e "source ('https://bioconductor.org/biocLite.R'); biocLite(c('ape', 'pegas', 'adegenet', 'phangorn', 'sqldf', 'ggtree', 'ggplot2', 'phytools'))"
# install the jupyter notebook
conda install --yes jupyter
# install R kernel for jupyter
Rscript -e "source ('https://bioconductor.org/biocLite.R'); biocLite(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'git2r', 'devtools', 'uuid', 'digest'))"
ln -s /bin/tar /bin/gtar
Rscript -e "devtools::install_url('https://github.com/IRkernel/IRkernel/archive/0.8.11.tar.gz')"
#Rscript -e "devtools::install_github('IRkernel/IRkernel')" # this one doesn't work
Rscript -e "IRkernel::installspec(user = FALSE)"
# install TNT
curl -sSL -O http://www.lillo.org.ar/phylogeny/tnt/tnt64.zip
unzip -p tnt64.zip tnt > /usr/local/bin/tnt
chmod +x /usr/local/bin/tnt
# download and uncompress figtree to /opt/FigTree_v1.4.3/
# also create a wrapper script in /usr/local/bin
curl -sSL -o /opt/figtree.tgz "http://tree.bio.ed.ac.uk/download.php?id=96&num=3"
tar -xvf /opt/figtree.tgz -C /opt/
chmod +x /opt/FigTree_v1.4.3/bin/figtree
cat <<EOF >>/usr/local/bin/figtree
#!/bin/sh
cd /opt/FigTree_v1.4.3/
java -Xms64m -Xmx512m -jar lib/figtree.jar $*
EOF
chmod +x /usr/local/bin/figtree
%environment
export LANG=en_US.UTF-8
export LANGUAGE=en_US:en
export LC_ALL=en_US.UTF-8
export PATH=/opt/miniconda3/bin:$PATH
export XDG_RUNTIME_DIR=""
%apprun samtools
samtools "$@"
%apprun bwa
bwa "$@"
%apprun trimmomatic
trimmomatic "$@"
%apprun fastqc
fastqc "$@"
%apprun seqprep
seqprep "$@"
%apprun gatk4
gatk-launch "$@"
%apprun vcftools
vcftools "$@"
%apprun snpeff
snpeff "$@"
%apprun varscan
varscan "$@"
%apprun muscle
muscle "$@"
%apprun mafft
mafft "$@"
%apprun raxml
raxmlHPC-PTHREADS "$@"
%apprun beast
beast "$@"
%apprun phylip
phylip "$@"
%apprun paml
codeml "$@"
%apprun picard
picard "$@"
%apprun qualimap
qualimap "$@"
%apprun R
R "$@"
%apprun jupyter
jupyter "$@"
%apprun tnt
tnt "$@"
%apprun figtree
/usr/local/bin/figtree

@@ -0,0 +1,45 @@
# Read more about this definition at https://www.sylabs.io/guides/3.0/user-guide/definition_files.html
Bootstrap: library
From: ubuntu:18.04
%setup
touch /file1
touch ${SINGULARITY_ROOTFS}/file2
%files
/file1
/file1 /opt
%environment
export LISTEN_PORT=12345
export LC_ALL=C
%post
apt-get update && apt-get install -y netcat
NOW=`date`
echo "export NOW=\"${NOW}\"" >> $SINGULARITY_ENVIRONMENT
%runscript
echo "Container was created $NOW"
echo "Arguments received: $*"
exec echo "$@"
%startscript
nc -lp $LISTEN_PORT
%test
grep -q NAME=\"Ubuntu\" /etc/os-release
if [ $? -eq 0 ]; then
echo "Container base is Ubuntu as expected."
else
echo "Container base is not Ubuntu."
fi
%labels
Author d@sylabs.io
Version v0.0.1
%help
This is a demo container used to illustrate a def file that uses all
supported sections.

File diff suppressed because one or more lines are too long
