Fabien Campagne fac2003

  • New York, NY, USA
while (iterator.hasNext() && shouldWork.get()) {
    DataSet smth = null;
    if (useWorkspace) {
        try (MemoryWorkspace ws = workspace.notifyScopeEntered()) {
            smth = iterator.next();
            if (callback != null)
                callback.call(smth);
        }
package org.campagnelab.dl.framework.mixup;
import cern.jet.random.Beta;
import cern.jet.random.engine.RandomEngine;
import it.unimi.dsi.util.XorShift1024StarRandom;
import org.nd4j.linalg.api.ndarray.INDArray;
import org.nd4j.linalg.dataset.api.MultiDataSet;
import org.nd4j.linalg.dataset.api.MultiDataSetPreProcessor;
import org.nd4j.linalg.factory.Nd4j;
java -Xmx60g -cp target/dl4j-cuda-specials-0.4-rc0-SNAPSHOT-bin.jar org.deeplearning4j.examples.multigpu.MultiGpuLenetMnistExample
Peer access [0] -> [4] isn't possible
Peer access [0] -> [5] isn't possible
Peer access [0] -> [6] isn't possible
Peer access [0] -> [7] isn't possible
Peer access [1] -> [4] isn't possible
Peer access [1] -> [5] isn't possible
Peer access [1] -> [6] isn't possible
Peer access [1] -> [7] isn't possible
Peer access [2] -> [4] isn't possible
o.d.e.m.MultiGpuLenetMnistExample - Build model....
o.d.e.m.MultiGpuLenetMnistExample - Train model....
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:5907 code=77(<unknown>) "cudaStreamSynchronize(*stream)"
o.d.p.ParallelWrapper - Averaged score: 2.079001024173573
Exception in thread "main" java.lang.RuntimeException: Can't allocate [HOST] memory: 32; threadId: 1
at org.nd4j.jita.memory.impl.CudaDirectProvider.malloc(CudaDirectProvider.java:59)
at org.nd4j.jita.memory.impl.CudaCachingZeroProvider.malloc(CudaCachingZeroProvider.java:113)
at org.nd4j.jita.memory.impl.CudaFullCachingProvider.malloc(CudaFullCachingProvider.java:77)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:218)
at org.nd4j.jita.handler.impl.CudaZeroHandler.alloc(CudaZeroHandler.java:239)
checkout on 53f098b231f819f51a5109476b52c446567a1856
diff --git a/pom.xml b/pom.xml
index 532a3ad..ce08171 100644
--- a/pom.xml
+++ b/pom.xml
@@ -127,7 +127,8 @@
<dependency>
<groupId>org.nd4j</groupId>
[Stage 4:===> (2 + 30) / 32]
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:2831 code=77(<unknown>) "cudaStreamSynchronize(*stream)"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3921 code=77(<unknown>) "cudaStreamSynchronize(*pStream)"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3921 code=77(<unknown>) "cudaStreamSynchronize(*pStream)"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3947 code=77(<unknown>) "result"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3947 code=77(<unknown>) "result"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3921 code=77(<unknown>) "cudaStreamSynchronize(*pStream)"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3947 code=77(<unknown>) "result"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3921 code=77(<unknown>) "cudaStreamSynchronize(*pStream)"
CUDA error at /skymind/libnd4j/blas/cuda/NativeOps.cu:3947 code=77(<unknown>) "result"
CUDA error at /skymind/libnd
nvidia-smi
Mon Jul 11 13:53:00 2016
+------------------------------------------------------+
| NVIDIA-SMI 352.79     Driver Version: 352.79         |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla K80           Off  | 0000:04:00.0     Off |                  Off |
| N/A   38C    P0    59W / 149W |    207MiB / 12287MiB |      0%      Default |
java -Xmx10g -cp target/dl4j-spark-cdh5-examples-1.0-SNAPSHOT.jar org.deeplearning4j.examples.rnn.GravesLSTMCharModellingExample 2>&1 |tee GPU-benchmark-0.4.0-3.txt
o.n.n.NativeOps - Number of threads used for linear algebra 32
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
head -1000 GPU-benchmark-0.4.0-2.txt
o.n.n.NativeOps - Number of threads used for linear algebra 32
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
fac2003 / gist:7d948f7687194e03a3f066bf0965ef99
Created July 11, 2016 14:06
running out of memory with dl4j multi-GPU
[mas2182@node007 dl4j-spark-cdh5-examples]$ java -Xmx10g -cp target/dl4j-spark-cdh5-examples-1.0-SNAPSHOT.jar org.deeplearning4j.examples.rnn.GravesLSTMCharModellingExample |tee GPU-benchmark-0.4.0-1.txt
o.n.n.NativeOps - Number of threads used for linear algebra 32
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...
o.n.j.a.c.i.BasicContextPool - Creating new stream for thread: [1], device: [0]...