Has anybody tried using tablecloth to create a dataset from an xlsx file? The doc (assuming https://scicloj.github.io/tablecloth/ is it) is really scant on how to do so. https://scicloj.github.io/tablecloth/#dataset-creation mentions that xlsx is supported (`file types: raw/gzipped csv/tsv, json, xls(x) taken from local file system or URL`), so I've been trying (tc/dataset "path-to-xlsx-file"), to no avail. Initially I had a multi-sheet xlsx, then I tried a single-sheet one, then I tried putting the xlsx file in the same directory where Noj is running (that's how I am using tablecloth). I still keep getting "Unrecognized read file type: xlsx".
I usually use the poi namespace from the underlying tmd library https://techascent.github.io/tech.ml.dataset/tech.v3.libs.poi.html
I've used tech.v3.libs.fastexcel successfully in the past. (And the name of that java library is accurate) If I remember correctly I did have to take a peek at the tmd wrapper code to figure out how to refer to exact sheets by name and things like that, but I could have misunderstood the API the wrapper exposes as well.
I might be misremembering but one of the other solutions I had tried, possibly poi, was loading too much stuff I didn't need in memory by default, where I was just after a particularly named sheet in a file.
> (And the name of that java library is accurate)
As in it is fast? Otherwise I'm not getting what you mean.
You have to add additional libraries to make it possible.
https://techascent.github.io/tech.ml.dataset/tech.v3.libs.poi.html
Ooops. Sorry for the duplicate. I didn't realize that it's already answered
could somebody perhaps help me out with requiring fastexcel? I've got techascent/tech.ml.dataset {:mvn/version "7.067"} in my deps.edn as per https://clojars.org/techascent/tech.ml.dataset... and then I figured maybe [tech.v3.libs.fastexcel] in (:require) in my clj file (https://techascent.github.io/tech.ml.dataset/100-walkthrough.html has user> (require '[tech.v3.libs.fastexcel])) would work... but I'm getting issues
deps.edn
{:deps {...
        techascent/tech.ml.dataset {:mvn/version "7.067"}}}
a.clj
(ns a
  (:require [tablecloth.api :as tc]
            [tech.v3.dataset :as ds]
            [tech.v3.libs.fastexcel :as fe]))
Error:
Execution error (ClassNotFoundException) at java.net.URLClassLoader/findClass (URLClassLoader.java:445).
org.dhatim.fastexcel.reader.ReadableWorkbook
The org.dhatim in the error message reminds me of https://techascent.github.io/tech.ml.dataset/tech.v3.libs.fastexcel.html:
Required Dependencies:
[org.dhatim/fastexcel-reader "0.12.8" :exclusions [org.apache.poi/poi-ooxml]]
but I'm not sure what to make of it. If I try requiring [org.dhatim/fastexcel-reader :as fe] instead in my ns form, I get the following:
Syntax error macroexpanding clojure.core/ns at (a.clj:1:1).
((:require [tablecloth.api :as tc] [tech.v3.dataset :as ds] [org.dhatim/fastexcel-reader :as fe])) - failed: Extra input spec: :clojure.core.specs.alpha/ns-form
Re the problem with requiring fastexcel: I think I have it now. Getting something like this working shouldn't be a goose chase, but I found https://github.com/techascent/tech.ml.dataset/issues/405, which gave me a clue that fastexcel is perhaps a separate dependency. Putting that into deps.edn resolved it.
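For anyone landing here later, the fix described above might look roughly like this in deps.edn. This is a sketch, not the poster's actual file: it translates the Leiningen-style vector quoted from the fastexcel doc page into deps.edn map syntax, and the versions are simply the ones quoted earlier in this thread, so check them against the tmd release you use. The ns form keeps requiring tech.v3.libs.fastexcel; the org.dhatim artifact is a Maven coordinate, not a Clojure namespace, so it only goes in deps.edn.

```clojure
;; deps.edn (fragment; other deps omitted)
;; Versions are taken from the messages above and may be out of date.
{:deps {techascent/tech.ml.dataset  {:mvn/version "7.067"}
        ;; The Java library that tech.v3.libs.fastexcel wraps.
        org.dhatim/fastexcel-reader {:mvn/version "0.12.8"
                                     :exclusions [org.apache.poi/poi-ooxml]}}}
```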
I just noticed https://github.com/techascent/tech.ml.dataset/commit/b1cb8d058d085ae01e4c694695feb499bdcc2ba5 (via https://github.com/techascent/tech.ml.dataset/issues/283), where the comment is made that ... poi is more robust. Hm.
I use poi for most of my needs. Occasionally I'll get a spreadsheet so messy that I'll save what I want out to csv (I could possibly wrestle it down, but sometimes the hacky way is shorter and less error prone)
I also use input->workbook (https://techascent.github.io/tech.ml.dataset/tech.v3.libs.poi.html#var-input-.3Eworkbook) if some of the sheets are a mess, but the one I want is OK and I don't want to parse the messy ones.
gotcha. Thanks for the tip!
sorry, a few more questions on poi:
1. do I need to declare any deps?
2. do I need any require in my ns form?
3. I just tried (ds/->dataset "test.xlsx"), and I am getting an error: Execution error at
you'll need poi in your deps
and you'll need to require the tmd poi NS
(sorry for the drive by help, I'm pulled in different directions today)
no problem. You're still putting in effort to help and I appreciate that!
(mostly for my own records, but good for whoever else might be having an issue) I had to have both poi and poi-ooxml in deps:
deps.edn:
{:deps {...
        org.apache.poi/poi {:mvn/version "5.5.0"}
        org.apache.poi/poi-ooxml {:mvn/version "5.5.0"}
        ...}}
code.clj:
(ns code
  (:require ...
            [tech.v3.dataset :as ds]
            [tech.v3.libs.poi :as poi]
            ...))
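With those deps and requires in place, reading the file might look like the sketch below. It assumes poi/workbook->datasets accepts a file name and returns one dataset per sheet, named after the sheet, which is how I read the tmd poi docs linked above. "test.xlsx" is the file name from the earlier question and "Sheet1" is a hypothetical sheet name. Calling workbook->datasets directly also avoids depending on which file extensions are registered with ds/->dataset.

```clojure
(ns code
  (:require [tech.v3.dataset :as ds]
            [tech.v3.libs.poi :as poi]))

;; One dataset per sheet in the workbook (assumed behaviour, see note above).
(def sheets (poi/workbook->datasets "test.xlsx"))

;; List the sheet names to see what was loaded.
(map ds/dataset-name sheets)

;; Pick a single sheet by name; "Sheet1" is a placeholder.
(def sheet1
  (first (filter #(= "Sheet1" (ds/dataset-name %)) sheets)))
```

Note that this still parses every sheet before picking one, so for workbooks with messy sheets the input->workbook route mentioned earlier is probably the better fit.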