Added example with Native Torch

pull/4/head
tlj 2 years ago
parent e4b23dffeb
commit 9ebffc77bf
  1. aicloud_slurm/README.md (2)
  2. aicloud_slurm/torch_amp_example/README.md (87)
  3. aicloud_slurm/torch_amp_example/job.sh (2)
  4. aicloud_slurm/torch_amp_example/torch_amp_example.py (0)

@@ -579,7 +579,7 @@ The NVIDIA V100 comes with specialized hardware for tensor operations called tensor cores
1. Faster execution
2. Lower memory footprint that allows for an increased batch size.
An example on how to adapt your PyTorch code is provided [here](https://git.its.aau.dk/CLAAUDIA/docs_aicloud/src/branch/master/aicloud_slurm/torch_apex_example). The example uses [APEX](https://nvidia.github.io/apex/) automatic multi precision [AMP](https://nvidia.github.io/apex/amp.html).
An example of how to adapt your PyTorch code is provided [here](https://git.its.aau.dk/CLAAUDIA/docs_aicloud/src/branch/master/aicloud_slurm/torch_amp_example). The example uses [APEX](https://nvidia.github.io/apex/) automatic mixed precision [AMP](https://nvidia.github.io/apex/amp.html) and native [Torch AMP](https://pytorch.org/docs/stable/amp.html), available in NGC from version 20.06.
## Tensorflow with spyder3 GUI

@@ -1,11 +1,11 @@
## PyTorch and automatic mixed precision with APEX
The following is an example of using automatic mixed precision [(AMP)](https://nvidia.github.io/apex/amp.html) for PyTorch with [APEX](https://nvidia.github.io/apex/). The benefits in general are:
The following is an example of using automatic mixed precision [(AMP)](https://nvidia.github.io/apex/amp.html) for PyTorch with [APEX](https://nvidia.github.io/apex/) and native [Torch AMP](https://pytorch.org/docs/stable/amp.html), available in NGC from version 20.06. The benefits in general are:
1. Faster computations due to the introduction of half-precision floats and tensor core operations with e.g. V100 GPUs.
2. Larger batch size as the loss, cache and gradients can be saved at a lower precision.
For more information, see the [training neural networks with tensor cores](https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf) or these [videos on mixed precision training](https://developer.nvidia.com/blog/video-mixed-precision-techniques-tensor-cores-deep-learning/).
For more information, see [training neural networks with tensor cores](https://nvlabs.github.io/eccv2020-mixed-precision-tutorial/files/dusan_stosic-training-neural-networks-with-tensor-cores.pdf), which presents the two AMP methods we use below, or these [videos on mixed precision training](https://developer.nvidia.com/blog/video-mixed-precision-techniques-tensor-cores-deep-learning/).
The following example illustrates how to approach AMP. The solution to the given problem can be computed more easily using linear least squares, and we use this to validate the results. The example is from the PyTorch [documentation](https://pytorch.org/tutorials/beginner/pytorch_with_examples.html).
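The least-squares reference `computeLS()` is called further down but its body is not shown in this diff. Below is a minimal sketch of what such a function could look like, assuming a cubic polynomial fit with NumPy and a summed squared-error loss to match the magnitudes printed later; it is an illustration, not the repository's code:
```python
import numpy as np

def computeLS():
    # Hypothetical stand-in for the least-squares reference used below:
    # fit y = sin(x) on [-pi, pi] with a cubic polynomial.
    x = np.linspace(-np.pi, np.pi, 2000)
    y = np.sin(x)
    # Coefficients in increasing order: p0 + p1*x + p2*x^2 + p3*x^3
    p = np.polynomial.polynomial.polyfit(x, y, 3)
    y_fit = np.polynomial.polynomial.polyval(x, p)
    loss = np.sum((y_fit - y) ** 2)  # summed squared error
    return list(p), loss
```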
@@ -20,7 +20,15 @@ device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('Using device:', device)
def compute(use_amp=False, iterations=5000, verbose=False):
def compute(amp_type='none', iterations=5000, verbose=False):
"""
amp_type:
'apex': use AMP from the APEX package
'native': use AMP from the Torch package
'none': do not use AMP
"""
# Create Tensors to hold input and outputs.
x = torch.linspace(-np.pi, np.pi, 2000).to(device)
y = torch.sin(x).to(device)
@@ -41,15 +49,21 @@ def compute(use_amp=False, iterations=5000, verbose=False):
# Create optimizer
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-3)
# Make model and optimizer AMP models and optimizers
if use_amp:
if amp_type == 'apex':
# Make model and optimizer AMP models and optimizers
model, optimizer = amp.initialize(model, optimizer)
elif amp_type == 'native':
scaler = torch.cuda.amp.GradScaler()
for t in range(iterations):
# Forward pass: compute predicted y by passing x to the model.
y_pred = model(xx)
loss = loss_fn(y_pred, y)
if amp_type == 'native':
with torch.cuda.amp.autocast():
y_pred = model(xx)
loss = loss_fn(y_pred, y)
else:
y_pred = model(xx)
loss = loss_fn(y_pred, y)
# Compute and print loss.
if verbose:
@@ -60,15 +74,21 @@ def compute(use_amp=False, iterations=5000, verbose=False):
# Backward pass: compute gradient of the loss with respect to model
# parameters using AMP. Substitutes loss.backward() in other models
if use_amp:
if amp_type == 'apex':
with amp.scale_loss(loss, optimizer) as scaled_loss:
scaled_loss.backward()
else:
optimizer.step()
elif amp_type == 'native':
scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
elif amp_type == 'none':
loss.backward()
# Calling the step function on an optimizer makes an update to its
# parameters
optimizer.step()
optimizer.step()
else:
print(f'No such option amp_type={amp_type}')
raise ValueError
return model[0], loss.item()
@@ -84,20 +104,23 @@ def display(model_name, loss, p):
print(f'{model_name}: MSE loss = {loss:.2e}')
print(f'{model_name}: y = {p[0]:.2e} + {p[1]:.2e} x + {p[2]:.2e} x^2 + {p[3]:.2e} x^3')
without_amp, without_amp_loss = compute(use_amp=False)
with_amp, with_amp_loss = compute(use_amp=True)
without_amp, without_amp_loss = compute(amp_type='none')
with_amp_native, with_amp_native_loss = compute(amp_type='native')
with_amp_apex, with_amp_apex_loss = compute(amp_type='apex')
ls, ls_loss = computeLS()
display("Torch with amp ", with_amp_loss, [with_amp.bias.item(), with_amp.weight[:, 0].item(),
with_amp.weight[:, 1].item(), with_amp.weight[:, 2].item()])
display("Torch without amp", without_amp_loss, [without_amp.bias.item(), without_amp.weight[:, 0].item(),
display("Torch with amp apex ", with_amp_apex_loss, [with_amp_apex.bias.item(), with_amp_apex.weight[:, 0].item(),
with_amp_apex.weight[:, 1].item(), with_amp_apex.weight[:, 2].item()])
display("Torch with amp native", with_amp_native_loss, [with_amp_native.bias.item(), with_amp_native.weight[:, 0].item(),
with_amp_native.weight[:, 1].item(), with_amp_native.weight[:, 2].item()])
display("Torch without amp ", without_amp_loss, [without_amp.bias.item(), without_amp.weight[:, 0].item(),
without_amp.weight[:, 1].item(), without_amp.weight[:, 2].item()])
display("LS model ", ls_loss, ls)
display("LS model ", ls_loss, ls)
```
Notice the import and the two lines that needs to be different to make AMP work in this example. The [PyTorch containers from NGC](https://docs.nvidia.com/deeplearning/frameworks/pytorch-release-notes/rel_20-11.html#rel_20-11) comes with APEX. If you run this, using e.g. the slurm batch script job.sh, you should obtain the following output:
Notice the changes at particular parts of the code due to the use of the different AMP approaches (and no AMP).
```console
Using device: cuda:0
@@ -136,14 +159,18 @@ Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 8.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 4.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 2.0
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 1.0
Torch with amp : MSE loss = 8.89e+00
Torch with amp : y = -5.01e-04 + 8.57e-01 x + -5.01e-04 x^2 + -9.30e-02 x^3
Torch without amp: MSE loss = 8.92e+00
Torch without amp: y = 5.00e-04 + 8.56e-01 x + 5.00e-04 x^2 + -9.38e-02 x^3
LS model : MSE loss = 8.82e+00
LS model : y = -5.91e-18 + 8.57e-01 x + 0.00e+00 x^2 + -9.33e-02 x^3
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.5
Gradient overflow. Skipping step, loss scaler 0 reducing loss scale to 0.25
Torch with amp apex : MSE loss = 8.86e+00
Torch with amp apex : y = 4.94e-04 + 8.57e-01 x + 4.99e-04 x^2 + -9.37e-02 x^3
Torch with amp native: MSE loss = 8.85e+00
Torch with amp native: y = 4.97e-04 + 8.57e-01 x + 4.98e-04 x^2 + -9.35e-02 x^3
Torch without amp : MSE loss = 8.92e+00
Torch without amp : y = 5.00e-04 + 8.57e-01 x + 5.00e-04 x^2 + -9.28e-02 x^3
LS model : MSE loss = 8.82e+00
LS model : y = -5.91e-18 + 8.57e-01 x + 0.00e+00 x^2 + -9.33e-02 x^3
```
Notice the final accuracy of Torch with and without AMP are comparable, but slightly less accurate than the exact linear least squares solution here used for validation.
Notice that the final accuracies of Torch with and without the AMP methods are comparable, but slightly less accurate than the exact linear least-squares solution used here for validation.
It is unclear whether we are actually using tensor cores in this example, but the code is now structured such that more advanced NN models can make use of tensor cores via the above recipe.
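As a condensed reference, the native Torch AMP pattern used above can be written as a self-contained sketch, with a toy model and random data standing in for a real network and data loader; this is an illustration of the recipe, not part of the repository:
```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# Toy model and data, stand-ins for a real network and data loader
model = torch.nn.Linear(16, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
loss_fn = torch.nn.MSELoss()
x = torch.randn(64, 16, device=device)
y = torch.randn(64, 1, device=device)

# GradScaler scales the loss to avoid fp16 gradient underflow
scaler = torch.cuda.amp.GradScaler(enabled=(device.type == "cuda"))

for step in range(100):
    optimizer.zero_grad()
    # Forward pass under autocast: eligible ops run in half precision
    with torch.cuda.amp.autocast(enabled=(device.type == "cuda")):
        loss = loss_fn(model(x), y)
    scaler.scale(loss).backward()  # backward on the scaled loss
    scaler.step(optimizer)         # unscales gradients, then calls optimizer.step()
    scaler.update()                # adjusts the loss scale for the next iteration
```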

@@ -3,4 +3,4 @@
#SBATCH --gres=gpu:1 #commented out
#SBATCH --qos=short # possible values: short, normal, allgpus
#SBATCH --mem=10G
srun singularity exec pytorch_20.03-py3.sif python torch_apex_example.py
srun singularity exec pytorch_20.11-py3.sif python torch_amp_example.py
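To try the example, the batch script can be submitted in the usual way; the commands below are standard Slurm usage and not part of this commit:
```console
sbatch job.sh          # submit the batch script
squeue -u $USER        # check that the job is queued or running
cat slurm-<jobid>.out  # inspect the output once the job has finished
```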