add first version of training materials

Vang Le-Quy 4 years ago
parent e91b5fec06
commit 79165e32ad

@ -0,0 +1,2 @@
pandoc -o SlurmAndSingularityTraining.pdf -t beamer --slide-level=2 --pdf-engine=xelatex --pdf-engine-opt=-shell-escape

@ -0,0 +1,235 @@
---
title: Slurm and Singularity Training for AI cloud
date: April 2019
author:
- Mads Boye
- Tobias Lindstrøm Jensen
- Vang Le-Quy
affiliation: CLAAUDIA, Aalborg University
theme: AAUsimple
aspectratio: 169
header-includes:
- \usepackage{graphicx}
- \usepackage{amsmath}
- \usepackage{minted}
---
## Agenda
# System design and intended uses
- Job partitions: high priority vs. normal; high, normal, and low resource use
- Run modes: batch, interactive

Introduce the system setup, how jobs are run, and how users are expected to use the resources.
# Slurm basics
## Why Slurm?
General introduction about [Slurm](
### Resource management
### Queue system
## Query commands
- sinfo: `sinfo -p batch`, `sinfo --Node`
`sinfo -o "%D %e %E %r %T %z" -p batch`
- squeue: `squeue -u $USER -i60` # query every 60 seconds
- smap - report system, job or step status
- sview
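The query commands above can be combined into a quick status check; a sketch using the `sinfo -o` format string from this slide (the commands run only if Slurm is installed):

```bash
# Compose the queries first so they can be inspected before running.
SINFO_CMD='sinfo -o "%D %e %E %r %T %z" -p batch'
SQUEUE_CMD="squeue -u $USER"
echo "$SINFO_CMD"
echo "$SQUEUE_CMD"
# Guard: only execute where the Slurm client tools exist.
if command -v sinfo >/dev/null 2>&1; then
    eval "$SINFO_CMD"
    eval "$SQUEUE_CMD"
fi
```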
## Accounting commands
- sacct - report accounting information by individual job and job step
- sstat - report accounting information about currently running jobs and job steps (more detailed than sacct)
- sreport - report resources usage by cluster, partition, user, account, etc
- `sacct -A claaudia -u`
- `sreport cluster AccountUtilizationByUser cluster=ngc account=claaudia start=4/01/19 end=4/17/20 format=Accounts,Cluster,TresCount,Login,Proper,Used`
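The accounting commands above can be composed into a per-user query; a minimal sketch (the date and format fields are illustrative, and the command only executes where the Slurm client tools are installed):

```bash
# Build the query first so it can be inspected; dates/fields are examples.
SACCT_CMD="sacct -u $USER --starttime=2019-04-01 --format=JobID,JobName,Partition,Elapsed,State"
echo "$SACCT_CMD"
# Only run on a machine that actually has sacct on PATH (i.e. the cluster).
if command -v sacct >/dev/null 2>&1; then
    eval "$SACCT_CMD"
fi
```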
## Essential commands
- sbatch -- Submit job script (Batch mode)
- salloc -- Create job allocation and start a shell (Interactive Batch mode)
- srun -- Run a command within a batch allocation that was created by sbatch or salloc
- scancel -- Delete jobs from the queue
- squeue -- View the status of jobs
- sinfo -- View information on nodes and queues
- sattach -- Connect stdin/out/err for an existing job or job step
## Interactive jobs
- `salloc --time=<hh:mm:ss> --gres=gpu:2`
- `ssh <email>`
- `srun --time=<hh:mm:ss> --gres=gpu:2`
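Putting the flags above together, an interactive request can be sketched as follows (the walltime and GPU count are example values, not site defaults):

```bash
# Request 2 GPUs for one hour (illustrative values).
WALLTIME=01:00:00
NGPUS=2
ALLOC_CMD="salloc --time=$WALLTIME --gres=gpu:$NGPUS"
echo "$ALLOC_CMD"
# On the cluster this opens a shell inside the allocation.
if command -v salloc >/dev/null 2>&1; then
    $ALLOC_CMD
fi
```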
## Slurm Batch job script
```bash
#!/usr/bin/env bash
#SBATCH --job-name MySlurmJob
#SBATCH --partition batch           # equivalent to PBS batch
#SBATCH --dependency=aftercorr:498  # more info on the Slurm head node: `man --pager='less -p \--dependency' sbatch`
#SBATCH --time 24:00:00             # run for 24 hours
#SBATCH --gres=gpu:2
```
## Slurm Batch job script (cont')
```bash
sbcast --force my.prog /tmp/my.prog
srun --ntasks-per-node=2 /tmp/my.prog
```
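The directives and commands from the two slides above can be assembled into one complete, submittable script; a sketch (the job name, walltime and `my.prog` path come from the slides and are illustrative):

```bash
# Write a complete batch script to disk; submit it with: sbatch myjob.sh
cat > myjob.sh <<'EOF'
#!/usr/bin/env bash
#SBATCH --job-name MySlurmJob
#SBATCH --partition batch
#SBATCH --time 24:00:00
#SBATCH --gres=gpu:2
# Stage the program to local disk on every allocated node, then run it.
sbcast --force my.prog /tmp/my.prog
srun --ntasks-per-node=2 /tmp/my.prog
EOF
chmod +x myjob.sh
```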
## Control job status
- scancel - signal/cancel jobs or job steps
`scancel --user="" --state=pending`
- strigger - event trigger management tools
## Other useful commands
- sbcast - transfer a file to the compute nodes allocated to a job
# Slurm admin
Important readings:
- [Multifactor Priority Plugin](
- [Trackable Resource](
- [Accounting](
- [Resource Limit](
## Slurm admin commands
- `scontrol show job <jobID>`; admin tool: `scontrol show partition`
- `scontrol show partition <partitionName>`
- `scontrol write batch_script job_id optional_filename`
- `scontrol update qos=short jobid=525`
## Slurm admin commands (cont')
- sacctmgr - database management tool
- sprio - view factors comprising a job priority
- sshare - view current hierarchical fair-share information
- sdiag - view statistics about scheduling module operations
## sacctmgr
- sudo sacctmgr modify QOS normal set MaxTRESPerUser=gres/gpu=2
- sacctmgr show qos format=name,priority,maxtresperuser,MaxWall
- `sacctmgr show assoc format=account,user,qos,tres,maxtresperuser,grptres`
# Singularity basics
## Why singularity
To overcome Docker's drawbacks while still working well with Docker:
1. Security
- root access
- resource exposure
2. Compatibility with `slurm`
- resource policy
3. Simplicity
4. HPC-geared
## Commands to learn
`srun singularity -h`
See `singularity help <command>` for details on each:
exec Execute a command within container
run Launch a runscript within container
shell Run a Bourne shell within container
test Launch a testscript within container
apps List available apps within a container
bootstrap *Deprecated* use build instead
build Build a new Singularity container
check Perform container lint checks
inspect Display container's metadata
mount Mount a Singularity container image
pull Pull a Singularity/Docker container to $PWD
## Commands to learn (cont')
1. search
2. build
3. exec
4. inspect
5. pull
6. run
7. shell
8. image.*
9. instance.*
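A typical first workflow with the commands above, sketched with an example image (`docker://ubuntu:18.04` is an assumption; `pull` names the resulting file `ubuntu_18.04.sif` by default):

```bash
# Pull an image to the current directory, then run a command inside it.
PULL_CMD='singularity pull docker://ubuntu:18.04'
EXEC_CMD='singularity exec ubuntu_18.04.sif cat /etc/os-release'
echo "$PULL_CMD"
echo "$EXEC_CMD"
# Guard: only execute where Singularity is actually installed.
if command -v singularity >/dev/null 2>&1; then
    eval "$PULL_CMD"
    eval "$EXEC_CMD"
fi
```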
## build
::: columns
:::: column
![Build IO](images/build_input_output.svg "Build Input Output"){height=65%}
::::
:::: column
```bash
sudo singularity build \
  lolcow.sif \
```
::::
:::
# Scenarios/ Use cases
## Run stock docker images
```bash
srun --gres=gpu:2 bash -c 'mkdir -p $HOME/data;
source $HOME/.bashrc;
singularity run --nv -B $HOME/data:/data \
docker:// nvidia-smi'
```
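A concrete variant of the command above, with a public image standing in for the elided one (`docker://ubuntu:18.04` is only an illustration; `--nv` still exposes the host GPUs inside the container):

```bash
# Compose the command so it can be shown even off-cluster.
RUN_CMD='srun --gres=gpu:1 singularity exec --nv docker://ubuntu:18.04 nvidia-smi'
echo "$RUN_CMD"
# Guard: only execute on a node with the Slurm client tools.
if command -v srun >/dev/null 2>&1; then
    eval "$RUN_CMD"
fi
```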
## Run stock singularity images
`srun --gres=gpu:1 singularity run shub://`
## Build and run NVIDIA’s stock Docker images
- Setup env variables
- build and run
## Write Singularity definition and build
## Build and run customized Singularity images
Singularity [definition file](
This needs an introduction to the Singularity recipe format and the `build` command.
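As a starting point, a minimal recipe can be sketched like this (`demo.def` is a hypothetical filename; build it with `sudo singularity build demo.sif demo.def` on a machine where you have root):

```bash
# Write a tiny two-section definition file to disk.
cat > demo.def <<'EOF'
BootStrap: docker
From: ubuntu:16.04

%post
    # commands here run once, at build time
    apt-get -y update && apt-get -y install curl

%runscript
    # executed by `singularity run`
    echo "Arguments received: $*"
EOF
```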
# QnA
## Questions
- How to run distributed jobs with Singularity?

@ -0,0 +1,160 @@
# This file was downloaded from
BootStrap: docker
From: ubuntu:16.04

%post
# install some system deps
apt-get -y update
apt-get -y install locales curl bzip2 less unzip
# this is a X11 dep for IGV
apt-get -y install libxext6
# tools to open PDF and HTML files
apt-get -y install firefox xpdf
# some extra devel libs
apt-get -y install zlib1g-dev libssl-dev
locale-gen en_US.UTF-8
apt-get clean
# download and install miniconda3
curl -sSL -O
bash -p /opt/miniconda3 -b
rm -fr
export PATH=/opt/miniconda3/bin:$PATH
conda update -n base conda
conda config --add channels conda-forge
conda config --add channels bioconda
# install some bioinfo tools from Bioconda
conda install --yes -c bioconda samtools==1.7
conda install --yes -c bioconda bwa==0.7.17
conda install --yes -c bioconda trimmomatic==0.36
conda install --yes -c bioconda perl-findbin==1.51
conda install --yes -c bioconda fastqc==0.11.7
conda install --yes -c bioconda seqprep==1.2
conda install --yes -c bioconda gatk4==
conda install --yes -c bioconda igv=2.3.98
conda install --yes -c bioconda vcftools==0.1.15
conda install --yes -c bioconda snpeff=4.3.1t-0
conda install --yes -c bioconda varscan==2.4.3
conda install --yes -c bioconda muscle==3.8.1551
conda install --yes -c bioconda mafft==7.313
conda install --yes -c bioconda raxml==8.2.10
conda install --yes -c bioconda beast==1.8.4
conda install --yes -c bioconda phylip==3.696
conda install --yes -c bioconda paml==4.9
conda install --yes -c bioconda qualimap==2.2.2a
conda install --yes -c bioconda picard==2.18.3
conda install --yes -c bioconda biopython==1.71
# install the R programming language
conda install --yes -c conda-forge r-base==3.4.1
# install some dependencies to build R packages
apt-get -y install build-essential gfortran
#conda install --yes -c conda-forge make
#conda install --yes gfortran_linux-64
#conda install --yes gxx_linux-64
#conda install --yes gcc_linux-64
# install some extra R packages
Rscript -e "source (''); biocLite(c('ape', 'pegas', 'adegenet', 'phangorn', 'sqldf', 'ggtree', 'ggplot2', 'phytools'))"
# install the jupyter notebook
conda install --yes jupyter
# install R kernel for jupyter
Rscript -e "source (''); biocLite(c('repr', 'IRdisplay', 'evaluate', 'crayon', 'pbdZMQ', 'git2r', 'devtools', 'uuid', 'digest'))"
ln -s /bin/tar /bin/gtar
Rscript -e "devtools::install_url('')"
#Rscript -e "devtools::install_github('IRkernel/IRkernel')" # this one doesnt work
Rscript -e "IRkernel::installspec(user = FALSE)"
# install TNT
curl -sSL -O
unzip -p tnt > /usr/local/bin/tnt
chmod +x /usr/local/bin/tnt
# download and uncompress figtree to /opt/FigTree_v1.4.3/
# also create a wrapper script in /usr/local/bin
curl -sSL -o /opt/figtree.tgz ""
tar -xvf /opt/figtree.tgz -C /opt/
chmod +x /opt/FigTree_v1.4.3/bin/figtree
# quoted 'EOF' keeps $* from being expanded at build time
cat <<'EOF' >/usr/local/bin/figtree
#!/bin/bash
cd /opt/FigTree_v1.4.3/
java -Xms64m -Xmx512m -jar lib/figtree.jar $*
EOF
chmod +x /usr/local/bin/figtree

%environment
export LANG=en_US.UTF-8
export LANGUAGE=en_US:en
export LC_ALL=en_US.UTF-8
export PATH=/opt/miniconda3/bin:$PATH
%apprun samtools
samtools "$@"
%apprun bwa
bwa "$@"
%apprun trimmomatic
trimmomatic "$@"
%apprun fastqc
fastqc "$@"
%apprun seqprep
seqprep "$@"
%apprun gatk4
gatk-launch "$@"
%apprun vcftools
vcftools "$@"
%apprun snpeff
snpeff "$@"
%apprun varscan
varscan "$@"
%apprun muscle
muscle "$@"
%apprun mafft
mafft "$@"
%apprun raxml
raxmlHPC-PTHREADS "$@"
%apprun beast
beast "$@"
%apprun phylip
phylip "$@"
%apprun paml
codeml "$@"
%apprun picard
picard "$@"
%apprun qualimap
qualimap "$@"
%apprun R
R "$@"
%apprun jupyter
jupyter "$@"
%apprun tnt
tnt "$@"
%apprun figtree
figtree "$@"

@ -0,0 +1,45 @@
# Read more about this definition at
Bootstrap: library
From: ubuntu:18.04
%setup
    touch /file1

%files
    /file1 /opt

%environment
    export LISTEN_PORT=12345
    export LC_ALL=C

%post
    apt-get update && apt-get install -y netcat
    # record build time so %runscript can report it
    NOW=`date`
    echo "export NOW=\"${NOW}\"" >> $SINGULARITY_ENVIRONMENT

%runscript
    echo "Container was created $NOW"
    echo "Arguments received: $*"
    exec echo "$@"

%test
    grep -q NAME=\"Ubuntu\" /etc/os-release
    if [ $? -eq 0 ]; then
        echo "Container base is Ubuntu as expected."
    else
        echo "Container base is not Ubuntu."
        exit 1
    fi

%labels
    Version v0.0.1

%help
    This is a demo container used to illustrate a def file that uses all
    supported sections.
