This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-03-26
Channels
- # announcements (3)
- # architecture (53)
- # babashka (6)
- # beginners (101)
- # bitcoin (3)
- # calva (4)
- # cider (3)
- # clara (7)
- # cljdoc (2)
- # cljsrn (14)
- # clojure (104)
- # clojure-europe (96)
- # clojure-germany (21)
- # clojure-nl (6)
- # clojure-serbia (3)
- # clojure-spain (1)
- # clojure-uk (13)
- # clojuredesign-podcast (4)
- # clojurescript (14)
- # cursive (5)
- # data-science (19)
- # datomic (16)
- # emacs (15)
- # fulcro (33)
- # graalvm (5)
- # honeysql (3)
- # instaparse (2)
- # jobs (3)
- # lsp (82)
- # malli (2)
- # off-topic (11)
- # pedestal (4)
- # polylith (62)
- # practicalli (4)
- # shadow-cljs (56)
- # tools-deps (53)
- # vim (17)
- # xtdb (53)
👋 I have a question about
and hope this is the channel to use 😄 I can't figure out how to tell the ->dataset
to parse Excel date cells as dates (and not numerics). Which cells that are date cells is unknown when parsing, is there a way to default to dates? Currently I get a column with a numeric value (the Excel way of calculating dates: "days since 1900 1 jan").
Example - a column is recognized as :datatype :float64
with values like 44258.0
@UDRJMEFSN can you help here?
Could I get an example document? Do you have a safe, small one you can send? There is no 'date' type in POI and fastexcel's cell type. You can do it by supplying a custom parse function to convert from days-since-jan-1 to a local date type but I would want to see some data and I can get you a test parse line. Something like
where convert-to-local-date
is a function from a number to your date type. Be wary of nils or strings; excel can really put anything in any cell.
Thanks! I think I succeeded with doing that, there was a similar example in the readme of the repo, I think.
Your function can return :http://tech.ml.dataset.parse/missing or :http://tech.ml.dataset.parse/parse-failure to denote a missing value or a value you expect to parse but it failed. In that case the value will be in the metadata of the column.
If the cell containing date is unknown, is there a way to figure out what column is a date, in the meta data or something similar?
Aha, do you mean that my parse function could do that - if parsing failed -> not a date?
I haven't found a way to do that yet. I think there may be but it is through the POI and fastexcel API's.
Also, if you want to build/edit spreadsheets there is a new library I think is really nice named https://github.com/zero-one-group/fxl.
Thanks! I'll have a look!
In our scenario, we guess the contents of an excel file and present it in a UI as suggestion for later parsing.
I have solved the Excel date thing, by reading the file with ->dataset
as ususal. Then I take samples of columns from the dataset with type float64
to make an educated guess if it is a reasonable date, by using the Apache POI DateUtil/getLocalDateTime
function.
I collect the column names that are candidates for excel dates and build a parser-fn
map and run the ->dataset
again, with the parser functions.
There's an extra roundtrip, but could work 😄 Let's see what my colleagues think of it 😅
I like that approach honestly. It probably parses a file more or less instantly and you get the entire dataset to run your type heuristics on.
You can re-parse a column though: https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.column.html#var-parse-column Then you can just re-parse the columns using same syntax as your parser-fn.
I should have directed you towards https://techascent.github.io/tech.ml.dataset/tech.v3.dataset.html#var-column-cast - that is more general and not specific towards string columns.
I read about the column-cast and will give that one a try too. I think I may have misunderstood the api of it yesterday, but it looks like a good approach. Thank you!