NDArray is the basic vectorized operation unit in MXNet for matrix and tensor computations.
Users can perform the usual calculations as on an R array, but with two additional features:
Multiple devices: All operations can be run on various devices including CPUs and GPUs.
Automatic parallelization: All operations are automatically executed in parallel with each other.
Let's create an NDArray on either a GPU or a CPU:
require(mxnet)
## Loading required package: mxnet
## Loading required package: methods
a <- mx.nd.zeros(c(2, 3)) # create a 2-by-3 matrix on cpu
b <- mx.nd.zeros(c(2, 3), mx.cpu()) # create a 2-by-3 matrix on cpu
# c <- mx.nd.zeros(c(2, 3), mx.gpu(0)) # create a 2-by-3 matrix on gpu 0, if you have CUDA enabled.
On CUDA-enabled machines, GPU device ids typically start at 0, which is why we pass 0 to mx.gpu.
We can initialize an NDArray object in various ways:
a <- mx.nd.ones(c(4, 4))
b <- mx.rnorm(c(4, 5))
c <- mx.nd.array(1:5)
To check the numbers in an NDArray, we can simply convert it to an R array:
a <- mx.nd.ones(c(2, 3))
b <- as.array(a)
class(b)
## [1] "matrix"
b
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
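The conversion also works in the other direction. The following is a minimal sketch, assuming the mxnet R package is installed; the matrix values and the variable names m, nd, shape, and back are illustrative and not part of the original text. It copies an ordinary R matrix into an NDArray, reads its shape, and copies the values back.

```r
library(mxnet)

# An ordinary R matrix with illustrative values.
m <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# Copy the matrix into an NDArray on the default device.
nd <- mx.nd.array(m)

# Query the shape without copying the data back.
shape <- dim(nd)

# Copy the values back into a regular R array.
back <- as.array(nd)
```

Because as.array copies the data out of the NDArray, it is best reserved for the point where the values are needed in R.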
You can perform element-wise operations on NDArray objects, as follows:
a <- mx.nd.ones(c(2, 4)) * 2
b <- mx.nd.ones(c(2, 4)) / 8
as.array(a)
## [,1] [,2] [,3] [,4]
## [1,] 2 2 2 2
## [2,] 2 2 2 2
as.array(b)
## [,1] [,2] [,3] [,4]
## [1,] 0.125 0.125 0.125 0.125
## [2,] 0.125 0.125 0.125 0.125
c <- a + b
as.array(c)
## [,1] [,2] [,3] [,4]
## [1,] 2.125 2.125 2.125 2.125
## [2,] 2.125 2.125 2.125 2.125
d <- c / a - 5
as.array(d)
## [,1] [,2] [,3] [,4]
## [1,] -3.9375 -3.9375 -3.9375 -3.9375
## [2,] -3.9375 -3.9375 -3.9375 -3.9375
If two NDArrays are located on different devices, we need to explicitly move them to the same one. For instance:
a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3), mx.gpu()) / 8
c <- mx.nd.copyto(a, mx.gpu()) * b
as.array(c)
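The example above needs a GPU. As a hedged sketch for machines without CUDA, the same pattern can be tried with mx.cpu() as the target device; the name a.on.target is introduced here for illustration only. Copying to the device an array already lives on is not needed in practice, it is done here only to show the call.

```r
library(mxnet)

a <- mx.nd.ones(c(2, 3)) * 2
b <- mx.nd.ones(c(2, 3), mx.cpu()) / 8

# Copy a to the device that holds b (the CPU in this sketch).
a.on.target <- mx.nd.copyto(a, mx.cpu())

# Both operands now live on the same device, so they can be combined.
c <- a.on.target * b
result <- as.array(c)
```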
You can save a list of NDArray objects to your disk with mx.nd.save:
a <- mx.nd.ones(c(2, 3))
mx.nd.save(list(a), "temp.ndarray")
You can load it back easily:
a <- mx.nd.load("temp.ndarray")
as.array(a[[1]])
## [,1] [,2] [,3]
## [1,] 1 1 1
## [2,] 1 1 1
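Since mx.nd.save takes a list, several arrays can go into one file. The sketch below is an illustration under assumptions: it uses a temporary file from base R's tempfile() rather than a fixed name, and the variable names path and loaded are hypothetical.

```r
library(mxnet)

# Write to a temporary location so the working directory stays clean.
path <- tempfile(fileext = ".ndarray")

a <- mx.nd.ones(c(2, 3))
b <- mx.nd.zeros(c(4, 4))

# Save both arrays in one file, then load them back as a list.
mx.nd.save(list(a, b), path)
loaded <- mx.nd.load(path)

first <- as.array(loaded[[1]])
second <- as.array(loaded[[2]])

# Remove the temporary file.
unlink(path)
```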
We can directly save data to and load it from a distributed file system, such as Amazon S3 and HDFS:
mx.nd.save(list(a), "s3://mybucket/mydata.bin")
mx.nd.save(list(a), "hdfs:///users/myname/mydata.bin")
NDArray can automatically execute operations in parallel. Automatic parallelization is useful when using multiple resources, such as CPUs, GPUs, and CPU-to-GPU memory bandwidth.
For example, if we write a <- a + 1 followed by b <- b + 1, and a is on a CPU and b is on a GPU, executing them in parallel improves efficiency. Furthermore, because copying data between CPUs and GPUs is also expensive, running the copies in parallel with other computations further increases efficiency.
It's hard to spot by eye which statements can be executed in parallel. In the following example, a <- a + 1 and c <- c * 3 can be executed in parallel, but a <- a + 1 and b <- b * 3 must be executed sequentially.
a <- mx.nd.ones(c(2,3))
b <- a
c <- mx.nd.copyto(a, mx.cpu())
a <- a + 1
b <- b * 3
c <- c * 3
Luckily, MXNet can automatically resolve the dependencies and execute operations in parallel accurately. This allows us to write our program assuming there is only a single thread. MXNet will automatically dispatch the program to multiple devices.
MXNet achieves this with lazy evaluation. Each operation is issued to an internal engine, and the call then returns. For example, if we run a <- a + 1, it returns immediately after pushing the plus operator to the engine. This asynchronous processing allows us to push more operators to the engine. The engine determines the read and write dependencies and the best way to execute the operators in parallel.
The actual computations are only guaranteed to be finished once we copy the results someplace else, for example with as.array(a) or mx.nd.save(a, "temp.dat"). To write highly parallelized code, we only need to postpone asking for the results.
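The idea of postponing result retrieval can be sketched as follows. This is an illustration, not a benchmark: it assumes the mxnet R package is installed, and the loop count and the names a.values and b.values are made up for the example. The two update chains are independent, so they are candidates for parallel execution by the engine, and nothing waits on them until as.array is called.

```r
library(mxnet)

a <- mx.nd.ones(c(2, 3))
b <- mx.nd.ones(c(2, 3))

# Each statement only pushes an operator to the engine and returns.
# The chain on a and the chain on b do not depend on each other.
for (i in 1:10) {
  a <- a + 1
  b <- b * 2
}

# Copying into R is where we wait for the queued work to finish.
a.values <- as.array(a)
b.values <- as.array(b)
```

Calling as.array inside the loop instead would force a wait on every iteration, which is the pattern to avoid.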