This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-11-10
@gigasquid Thanks for the cats and dogs example, it looks very interesting
I tried running it but I get this:
> (train)
...
ExceptionInfo Batch size is not commensurate with epoch size clojure.core/ex-info (core.clj:4725)
cats-dogs-cortex-redux.core> *e
#error {
:cause "Batch size is not commensurate with epoch size"
:data {:epoch-size 21250, :batch-size 32}
:via
[{:type clojure.lang.ExceptionInfo
:message "Batch size is not commensurate with epoch size"
:data {:epoch-size 21250, :batch-size 32}
:at [clojure.core$ex_info invokeStatic "core.clj" 4725]}]
:trace
[[clojure.core$ex_info invokeStatic "core.clj" 4725]
[clojure.core$ex_info invoke "core.clj" 4725]
[cats_dogs_cortex_redux.core$train_ds invokeStatic "core.clj" 256]
a batch size of 50 results in a different error
wait, looks like my data folder is empty!
ignore me please, I’ll redo the first steps and get back to you, not sure what happened with the files 🙂
many thanks
@gigasquid made some progress (ran out of memory), but calling (train)
as is will always cause an exception because (not= 0 (rem 21250 32))
(the defaults!)
checked on line 254
I thought I had regened the uberjar and run it after I changed it, but obviously not
it’s your code that checks for this btw
yes - I took that bit directly from the resnet-retrain example in the cortex project
same exception because it’s not an exact division, but I could just define a batch-size of 21250, right?
#error {
:cause "Batch size is not commensurate with epoch size"
:data {:epoch-size 21250, :batch-size 21248}
:via
[{:type clojure.lang.ExceptionInfo
:message "Batch size is not commensurate with epoch size"
:data {:epoch-size 21250, :batch-size 21248}
:at [clojure.core$ex_info invokeStatic "core.clj" 4725]}]
:trace
oh sorry 🙂
yep, should have realized that’s what you meant
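For reference, here is the arithmetic behind the check discussed above. 21250 = 2 × 5⁴ × 17, so 32 leaves a remainder of 2 (21250 = 664 × 32 + 2, which is also where 21248 comes from), while 50 divides it evenly (425 batches). This is a minimal REPL sketch that assumes the rule is exactly "the batch size must divide the epoch size". `commensurate?`, `valid-batch-sizes` and `trimmed-epoch-size` are hypothetical helpers, not functions from cortex or from the cats-dogs example.

```clojure
;; Hypothetical helpers for exploring the "commensurate" rule at the REPL.
;; They are NOT part of cortex or cats-dogs-cortex-redux.

(defn commensurate?
  "True when batch-size divides epoch-size with no remainder."
  [epoch-size batch-size]
  (zero? (rem epoch-size batch-size)))

(defn valid-batch-sizes
  "Every batch size from 1 to max-batch (inclusive) that divides epoch-size."
  [epoch-size max-batch]
  (filter #(commensurate? epoch-size %) (range 1 (inc max-batch))))

(defn trimmed-epoch-size
  "Largest multiple of batch-size that fits in epoch-size, i.e. the epoch
  size you would get by dropping the leftover samples."
  [epoch-size batch-size]
  (* batch-size (quot epoch-size batch-size)))

(commensurate? 21250 32)       ;; => false
(valid-batch-sizes 21250 128)  ;; => (1 2 5 10 17 25 34 50 85 125)
(trimmed-epoch-size 21250 32)  ;; => 21248
```

The two ways out are either to pick a batch size from the divisor list or to trim the epoch to a multiple of the batch size. Trimming discards a couple of images per epoch, which is usually harmless, but whether the example's training loop tolerates a changed epoch size is an assumption worth checking in the code.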
not sure if it’s significant that I’m running this on the CPU instead of CUDA, but I got this:
IllegalArgumentException No implementation of method: :->view-impl of protocol: #'think.datatype.base/PView found for class: clojure.lang.ExceptionInfo clojure.core/-cache-protocol-fn (core_deftype.clj:583)
cats-dogs-cortex-redux.core> *e
#error {
:cause "No implementation of method: :->view-impl of protocol: #'think.datatype.base/PView found for class: clojure.lang.ExceptionInfo"
:via
[{:type java.lang.RuntimeException
:message "Error during queued sequence execution:"
:at [think.parallel.core$queued_sequence$fn__33350 invoke "core.clj" 229]}
{:type java.lang.IllegalArgumentException
:message "No implementation of method: :->view-impl of protocol: #'think.datatype.base/PView found for class: clojure.lang.ExceptionInfo"
:at [clojure.core$_cache_protocol_fn invokeStatic "core_deftype.clj" 583]}]
:trace
[[clojure.core$_cache_protocol_fn invokeStatic "core_deftype.clj" 583]
[clojure.core$_cache_protocol_fn invoke "core_deftype.clj" 575]
[think.datatype.base$eval13577$fn__13578$G__13568__13587 invoke "base.cljc" 170]
[think.datatype.base$__GT_view invokeStatic "base.cljc" 180]
[think.datatype.base$__GT_view invokePrim "base.cljc" -1]
[think.datatype.base$__GT_view invokeStatic "base.cljc" 182]
[think.datatype.base$__GT_view invoke "base.cljc" 174]
[think.datatype.base$make_view invokeStatic "base.cljc" 187]
[think.datatype.base$make_view invoke "base.cljc" 185]
[think.datatype.core$make_view invokeStatic "core.clj" 56]
[think.datatype.core$make_view invoke "core.clj" 54]
[cortex.compute.cpu.driver$eval33762$fn__33767 invoke "driver.clj" 259]
[cortex.compute.driver$eval14282$fn__14318$G__14271__14327 invoke "driver.clj" 74]
[cortex.compute.driver$allocate_device_buffer invokeStatic "driver.clj" 159]
[cortex.compute.driver$allocate_device_buffer doInvoke "driver.clj" 156]
[clojure.lang.RestFn invoke "RestFn.java" 464]
[cortex.tensor$new_tensor invokeStatic "tensor.clj" 456]
[cortex.tensor$new_tensor doInvoke "tensor.clj" 448]
[clojure.lang.RestFn invoke "RestFn.java" 410]
[cats_dogs_cortex_redux.core$src_ds_item__GT_net_input invokeStatic "core.clj" 200]
[cats_dogs_cortex_redux.core$src_ds_item__GT_net_input invoke "core.clj" 170]
[clojure.lang.AFn applyToHelper "AFn.java" 154]
[clojure.lang.AFn applyTo "AFn.java" 144]
[clojure.core$apply invokeStatic "core.clj" 657]
[clojure.core$apply invoke "core.clj" 652]
[think.parallel.core$wrap_thread_bindings$fn__33318 doInvoke "core.clj" 120]
[clojure.lang.RestFn applyTo "RestFn.java" 137]
[clojure.core$apply invokeStatic "core.clj" 657]
[clojure.core$apply invoke "core.clj" 652]
[think.parallel.core$queued_sequence$process_fn__33336$fn__33337 invoke "core.clj" 215]
[think.parallel.core$queued_sequence$process_fn__33336 invoke "core.clj" 209]
[clojure.lang.AFn call "AFn.java" 18]
[java.util.concurrent.ForkJoinTask$AdaptedCallable exec "ForkJoinTask.java" 1424]
[java.util.concurrent.ForkJoinTask doExec "ForkJoinTask.java" 289]
[java.util.concurrent.ForkJoinPool$WorkQueue runTask "ForkJoinPool.java" 1056]
[java.util.concurrent.ForkJoinPool runWorker "ForkJoinPool.java" 1689]
[java.util.concurrent.ForkJoinWorkerThread run "ForkJoinWorkerThread.java" 157]]}
sorry for the huge paste
no, REPL
no problem!
there may be something wrong with my setup, but I get some weirdness with uberjar too:
➜ cats-dogs-cortex-redux lein uberjar
Warning: specified :main without including it in :aot.
Implicit AOT of :main will be removed in Leiningen 3.0.0.
If you only need AOT for your uberjar, consider adding :aot :all into your
:uberjar profile instead.
Compiling cats-dogs-cortex-redux.core
Nov 10, 2017 6:30:44 PM com.github.fommil.jni.JniLoader liberalLoad
INFO: successfully loaded /var/folders/f_/0_rfxz496k520hd35q8v68940000gn/T/jniloader2330751718951129222netlib-native_system-osx-x86_64.jnilib
Reflection warning, cognitect/transit.clj:142:19 - call to static method writer on com.cognitect.transit.TransitFactory can't be resolved (argument types: unknown, java.io.OutputStream, unknown).
Created /Users/sideris/devel/cats-dogs-cortex-redux/target/cats-dogs-cortex-redux-0.1.0-SNAPSHOT.jar
Created /Users/sideris/devel/cats-dogs-cortex-redux/target/cats-dogs-cortex-redux.jar
➜ cats-dogs-cortex-redux java -jar target/cats-dogs-cortex-redux-0.1.0-SNAPSHOT.jar
Exception in thread "main" java.lang.NoClassDefFoundError: clojure/lang/Var
at cats_dogs_cortex_redux.core.<clinit>(Unknown Source)
Caused by: java.lang.ClassNotFoundException: clojure.lang.Var
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:331)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
... 1 more
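A note on the `NoClassDefFoundError: clojure/lang/Var` above: `lein uberjar` writes two jars. The plain `-0.1.0-SNAPSHOT.jar` contains only the project's own classes and not Clojure itself, so running it with `java -jar` fails in exactly this way. The standalone jar is the one to run, here `java -jar target/cats-dogs-cortex-redux.jar`. Given that output name, the project appears to set `:uberjar-name`, but that is an inference. The AOT warning at the top can be silenced by following its own suggestion. The fragment below is a sketch of the relevant `project.clj` keys only, not the project's actual file:

```clojure
;; project.clj (sketch): only :main, :uberjar-name and :profiles are shown.
(defproject cats-dogs-cortex-redux "0.1.0-SNAPSHOT"
  ;; ... keep the existing :dependencies and other keys here ...
  :main cats-dogs-cortex-redux.core
  :uberjar-name "cats-dogs-cortex-redux.jar"
  ;; AOT-compile only when building the uberjar, as the Leiningen warning suggests
  :profiles {:uberjar {:aot :all}})
```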
ok, pulled and running again
I don’t run out of memory but I get the IllegalArgumentException No implementation of method: :->view-impl of protocol: #'think.datatype.base/PView
exception again
did you run it with the CUDA backend?
because in my case I see this:
training using batch size of 32
CUDA backend creation failed, reverting to CPU
IllegalArgumentException No implementation of method: :->view-impl of protocol: #'think.datatype.base/PView
is one that I think happens when there are memory problems..
the stacktrace contains this line which makes me think that it may be CPU-specific:
[cortex.compute.cpu.driver$eval33762$fn__33767 invoke "driver.clj" 259]
As an experiment, you might want to try decreasing the batch size to see if it helps
oh ok, I’m brand new to all of this (your blog motivated me to look into it!)
ok, 128 it is!
also you could check out the MNIST example in the cortex project and make sure you can run that ok first
If you are serious about doing some big data stuff, you can try getting an AWS P2 compute instance with an NVIDIA GPU
trying with 8. Yeah that would be a good sanity check for my setup I guess
that’s the recommended setup in that course you linked
batch size 8 results in the same problem, but I’m setting up a local machine with a decent NVIDIA GPU, so I’ll see if that works better and maybe even try the P2 compute instance at some point
in any case, thanks for making all of this a bit more approachable!
I just tried it without CUDA and got the same error when trying to allocate the device buffer for the layers. My guess is that the ResNet50 network is just too big to run CPU-only
Thanks a lot for trying! So we know it’s not just me :) I’ll give it a try on my desktop GPU after I install Windows and let you know how it goes