# GPUs

In [1]:
!nvidia-smi

Thu Feb 21 18:44:31 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 396.37                 Driver Version: 396.37                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|   0  Tesla M60           Off  | 00000000:00:1D.0 Off |                    0 |
| N/A   29C    P0    43W / 150W |      0MiB /  7618MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla M60           Off  | 00000000:00:1E.0 Off |                    0 |
| N/A   39C    P0    42W / 150W |      0MiB /  7618MiB |     97%      Default |
+-------------------------------+----------------------+----------------------+
                                                                            

## Computing Devices


In [2]:
import mxnet as mx
from mxnet import nd
from mxnet.gluon import nn

mx.cpu(), mx.gpu(), mx.gpu(1)

(cpu(0), gpu(0), gpu(1))

## NDArray and GPUs


In [3]:
x = nd.array([1, 2, 3])
x.context

cpu(0)

### Storage on the GPU


In [4]:
x = nd.ones((2, 3), ctx=mx.gpu())
x


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @gpu(0)>

Create an array directly on the second GPU:

In [5]:
y = nd.random.uniform(shape=(2, 3), ctx=mx.gpu(1))
y


[[0.59119    0.313164   0.76352036]
 [0.9731786  0.35454726 0.11677533]]
<NDArray 2x3 @gpu(1)>

### Copy with `copyto`

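All inputs to an operator must be on the same device. `copyto` copies an array to the target device so that it can be combined with arrays already stored there.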

![Copyto copies arrays to the target device](http://d2l.ai/_images/copyto.svg)

In [6]:
z = x.copyto(mx.gpu(1))
y + z


[[1.59119   1.313164  1.7635204]
 [1.9731786 1.3545473 1.1167753]]
<NDArray 2x3 @gpu(1)>
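Since `x` lives on `gpu(0)` and `y` on `gpu(1)`, it can help to check devices before calling an operator. The following is a minimal sketch of such a check. `common_context` is a hypothetical helper, not part of MXNet, and it assumes only that each input exposes the `context` attribute shown earlier.

```python
def common_context(*arrays):
    """Return the device shared by all inputs; raise ValueError if they differ.

    Hypothetical helper: relies only on each input having a `.context` attribute.
    """
    if not arrays:
        raise ValueError("expected at least one array")
    first = arrays[0].context
    for position, array in enumerate(arrays[1:], start=1):
        # Compare every later input against the device of the first one.
        if array.context != first:
            raise ValueError(
                "input %d is on %s, but input 0 is on %s"
                % (position, array.context, first)
            )
    return first
```

With the arrays above, `common_context(y, z)` should return the context of the second GPU, while `common_context(x, y)` should raise a `ValueError` before any operator is invoked.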

### Copy with `as_in_context`

In [7]:
z = x.as_in_context(mx.gpu(1))
z


[[1. 1. 1.]
 [1. 1. 1.]]
<NDArray 2x3 @gpu(1)>

### Subtle Difference between `copyto` and `as_in_context`

In [8]:
# Returns the input itself if the target device is the same as the source device
y.as_in_context(mx.gpu(1)) is y

True

In [9]:
# Always allocates new memory for the copy, even on the same device
y.copyto(mx.gpu(1)) is y

False
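Because `as_in_context` returns the input itself when it is already on the target device, it is the cheaper choice when the goal is only to have data on a given device, while `copyto` suits cases that need an independent copy. A minimal sketch of the first pattern follows. `to_device` is a hypothetical helper name, and it assumes its inputs provide the `as_in_context` method used above.

```python
def to_device(ctx, *arrays):
    """Return the inputs as a list of arrays that all live on `ctx`.

    Hypothetical helper: arrays already on `ctx` are returned unchanged
    (no new memory), the others are copied by `as_in_context`.
    """
    return [array.as_in_context(ctx) for array in arrays]
```

For example, `to_device(mx.gpu(1), x, y)` would copy `x` to the second GPU and return `y` itself, so an in-place change to that second result would also be visible through `y`.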

## Gluon and GPUs

In [10]:
net = nn.Sequential()
net.add(nn.Dense(1))
net.initialize(ctx=mx.gpu())

# When the input is an NDArray on the GPU,
# Gluon computes the result on that same GPU.
print(net(x))
net[0].weight.data()


[[0.04995865]
 [0.04995865]]
<NDArray 2x1 @gpu(0)>



[[0.0068339  0.01299825 0.0301265 ]]
<NDArray 1x3 @gpu(0)>
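The parameters above live on `gpu(0)`, so an input stored elsewhere has to be moved there before the forward pass. The sketch below shows one way to do that. `forward_on_param_device` is a hypothetical helper, and it assumes that the given parameter has been initialized and that Gluon's `Parameter.list_ctx()` reports the devices it was initialized on.

```python
def forward_on_param_device(net, param, x):
    """Run `net` on `x` after moving `x` to the device that holds `param`.

    Hypothetical helper: uses the first device reported by `param.list_ctx()`
    and `as_in_context`, which avoids a copy if `x` is already there.
    """
    ctx = param.list_ctx()[0]
    return net(x.as_in_context(ctx))
```

With the objects above, `forward_on_param_device(net, net[0].weight, y)` would move `y` from the second GPU to the first before applying the network, so the result would be stored on the first GPU.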