#data-science
2022-06-06
Bhougland12:06:06

I am trying to create datasets from an xlsx workbook and am running into this error: Execution error (IllegalArgumentException) at tech.v3.libs.poi$wrap_cell$reify__16610/value (poi.clj:69). No matching field found: getLocalDateTimeCellValue for class org.apache.poi.xssf.usermodel.XSSFCell

Bhougland12:06:54

Sorry, I should have said: I am using the tech.ml.dataset library and its workbook->datasets function

chrisn13:06:50

Hmm - What version of POI?
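
One way to answer the version question from the REPL is to ask the JVM which jar a POI class was actually loaded from; the jar file name normally carries the version. This is a minimal sketch using only standard JDK calls. The helper name `jar-of` is made up for illustration, and the commented usage assumes POI is on the classpath. Running `lein deps :tree` or `clj -Stree` shows the same information from the build tool's side.

```clojure
(defn jar-of
  "Returns the URL of the jar (or directory) a class was loaded from,
  or nil when the JVM does not report one, as for core JDK classes."
  [klass]
  (some-> klass .getProtectionDomain .getCodeSource .getLocation))

(comment
  ;; Assumes POI is on the classpath. XSSFCell is the class named in the
  ;; error message, so this shows which POI jar is really in use.
  (jar-of org.apache.poi.xssf.usermodel.XSSFCell))
```

If the reported jar is older than the POI version tech.ml.dataset expects, another dependency is probably pulling it in.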

Bhougland13:06:22

Your question got me thinking that maybe there is a dependency issue, because I had included some other libraries that use POI. I removed those and ran the function again, and I got the following error: Execution error (NullPointerException) at tech.v3.dataset.io.spreadsheet/sheet->dataset (spreadsheet.clj:59). null

Bhougland13:06:01

The only library I have included in this test project is tech.ml.dataset "6.087"

Bhougland13:06:01

The workbook has around 8 worksheets (tabs)

jumar13:06:37

It reminds me of https://clojurians.slack.com/archives/C053AK3F9/p1654262783523279. Anyway, you should be able to see the whole stacktrace to locate the exact source of the NPE.

Bhougland14:06:35

Execution error (NullPointerException) at tech.v3.dataset.io.spreadsheet/sheet->dataset (spreadsheet.clj:59).
null
Syntax error macroexpanding at (core.clj:1:149).
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3707)
java.lang.NullPointerException
	at clojure.lang.RT.longCast(RT.java:1284)
	at tech.v3.dataset.io.spreadsheet$sheet__GT_dataset.invokeStatic(spreadsheet.clj:59)
	at tech.v3.dataset.io.spreadsheet$sheet__GT_dataset.invoke(spreadsheet.clj:14)
	at tech.v3.libs.poi$workbook__GT_datasets$fn__16636.invoke(poi.clj:153)
	at clojure.core$mapv$fn__8445.invoke(core.clj:6912)
	at clojure.core.protocols$iter_reduce.invokeStatic(protocols.clj:49)
	at clojure.core.protocols$fn__8140.invokeStatic(protocols.clj:75)
	at clojure.core.protocols$fn__8140.invoke(protocols.clj:75)
	at clojure.core.protocols$fn__8088$G__8083__8101.invoke(protocols.clj:13)
	at clojure.core$reduce.invokeStatic(core.clj:6828)
	at clojure.core$mapv.invokeStatic(core.clj:6903)
	at clojure.core$mapv.invoke(core.clj:6903)
	at tech.v3.libs.poi$workbook__GT_datasets.invokeStatic(poi.clj:153)
	at tech.v3.libs.poi$workbook__GT_datasets.invoke(poi.clj:138)
	at clojure.lang.AFn.applyToHelper(AFn.java:156)
	at clojure.lang.AFn.applyTo(AFn.java:144)
	at clojure.lang.Compiler$InvokeExpr.eval(Compiler.java:3702)
	at clojure.lang.Compiler$DefExpr.eval(Compiler.java:457)
	at clojure.lang.Compiler.eval(Compiler.java:7182)
	at clojure.lang.Compiler.eval(Compiler.java:7132)
	at clojure.core$eval.invokeStatic(core.clj:3214)
	at clojure.core$eval.invoke(core.clj:3210)
	at clojure.main$repl$read_eval_print__9086$fn__9089.invoke(main.clj:437)
	at clojure.main$repl$read_eval_print__9086.invoke(main.clj:437)
	at clojure.main$repl$fn__9095.invoke(main.clj:458)
	at clojure.main$repl.invokeStatic(main.clj:458)
	at clojure.main$repl.doInvoke(main.clj:368)
	at clojure.lang.RestFn.invoke(RestFn.java:1523)
	at nrepl.middleware.interruptible_eval$evaluate.invokeStatic(interruptible_eval.clj:79)
	at nrepl.middleware.interruptible_eval$evaluate.invoke(interruptible_eval.clj:55)
	at nrepl.middleware.interruptible_eval$interruptible_eval$fn__17428$fn__17432.invoke(interruptible_eval.clj:142)
	at clojure.lang.AFn.run(AFn.java:22)
	at nrepl.middleware.session$session_exec$main_loop__17529$fn__17533.invoke(session.clj:171)
	at nrepl.middleware.session$session_exec$main_loop__17529.invoke(session.clj:170)
	at clojure.lang.AFn.run(AFn.java:22)
	at java.base/java.lang.Thread.run(Thread.java:829)

Bhougland17:06:33

So, the first error was on a Windows machine. I did not receive this error on my Linux machine. However, I did run into a Java heap space error (probably because of the size of the worksheets). Is there any way around this?

jumar17:06:04

If it's legitimate, then increase the max heap size via -Xmx
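
For reference, a minimal sketch of where -Xmx usually goes and how to confirm it took effect. The 8g value and the `:big-heap` alias name are placeholders, not recommendations, and `max-heap-mb` is a hypothetical helper name.

```clojure
;; Placeholder settings, use whichever matches your build tool:
;;   Leiningen, in project.clj:  :jvm-opts ["-Xmx8g"]
;;   Clojure CLI, in deps.edn:   {:aliases {:big-heap {:jvm-opts ["-Xmx8g"]}}}
;;                               then start the REPL with: clj -M:big-heap

(defn max-heap-mb
  "Returns the maximum heap the running JVM will try to use, in MiB."
  []
  (quot (.maxMemory (Runtime/getRuntime)) (* 1024 1024)))

;; Evaluate this in the REPL after restarting to check the new limit.
(max-heap-mb)
```

The setting only applies to a freshly started JVM, so the REPL has to be restarted before the check shows a new value.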

chrisn22:06:57

Can you file an issue with a very small spreadsheet attached that exhibits the load failure? The OOM issue is next. The dataset representation is far smaller than the spreadsheet, so one thought I have is to process each sheet one at a time, making sure to release any references to the workbook after you create the sequence of datasets
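
The one-sheet-at-a-time idea can be sketched generically: convert each item, hand the result to a consumer, and keep nothing around afterwards. `process-each!` is a hypothetical helper, not part of tech.ml.dataset. The commented usage assumes that `tech.v3.libs.poi/input->workbook` exists, that the workbook it returns is closeable and can be iterated as sheets, and that `save-dataset!` is a function you write yourself; check the library docs before relying on any of that.

```clojure
(defn process-each!
  "Converts items one at a time with `convert` and passes each result to
  `consume!` (for example, a function that writes a dataset to disk).
  Results are not accumulated, so each one can be garbage collected as
  soon as `consume!` returns."
  [convert consume! items]
  (run! (fn [item] (consume! (convert item))) items))

(comment
  ;; Assumed usage, see the caveats above:
  ;; (require '[tech.v3.libs.poi :as poi]
  ;;          '[tech.v3.dataset.io.spreadsheet :as ss])
  ;; (with-open [wb (poi/input->workbook "big.xlsx")]
  ;;   (process-each! ss/sheet->dataset save-dataset! wb))
  )
```

Note that POI may still hold the whole workbook in memory while it is open, so this mainly avoids keeping every dataset and the workbook alive at the same time.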

Bhougland14:06:44

Sure, I can provide two versions of the spreadsheet. One version will have an Excel add-in that will probably give you a lot of login prompts, because the data is pulled from an API using this add-in. The other won't have the add-in activated. I just want to make sure I provide you with a realistic example of my environment.

chrisn15:06:03

That add-in - does it work with POI?