Fork me on GitHub
David Vujic16:03:54

👋 I have a question about .dataset and hope this is the channel to use 😄 I can't figure out how to tell the ->dataset to parse Excel date cells as dates (and not numerics). Which cells that are date cells is unknown when parsing, is there a way to default to dates? Currently I get a column with a numeric value (the Excel way of calculating dates: "days since 1900 1 jan").

David Vujic16:03:41

Example - a column is recognized as :datatype :float64 with values like 44258.0


@UDRJMEFSN can you help here?


Could I get an example document? Do you have a safe, small one you can send? There is no 'date' type in POI and fastexcel's cell type. You can do it by supplying a custom parse function to convert from days-since-jan-1 to a local date type but I would want to see some data and I can get you a test parse line. Something like


(->dataset "data.xls" {:parser-fn {"date" [:local-date convert-to-local-date]}})


where convert-to-local-date is a function from a number to your date type. Be wary of nils or strings; excel can really put anything in any cell.

David Vujic17:03:06

Thanks! I think I succeeded with doing that, there was a similar example in the readme of the repo, I think.


Your function can return : or : to denote a missing value or a value you expect to parse but it failed. In that case the value will be in the metadata of the column.


OK, great 🙂.

David Vujic17:03:15

If the cell containing date is unknown, is there a way to figure out what column is a date, in the meta data or something similar?

David Vujic17:03:51

Aha, do you mean that my parse function could do that - if parsing failed -> not a date?


I haven't found a way to do that yet. I think there may be but it is through the POI and fastexcel API's.


Also, if you want to build/edit spreadsheets there is a new library I think is really nice named

👍 3
David Vujic17:03:21

Thanks! I'll have a look!

David Vujic12:03:03

In our scenario, we guess the contents of an excel file and present it in a UI as suggestion for later parsing. I have solved the Excel date thing, by reading the file with ->dataset as ususal. Then I take samples of columns from the dataset with type float64 to make an educated guess if it is a reasonable date, by using the Apache POI DateUtil/getLocalDateTime function. I collect the column names that are candidates for excel dates and build a parser-fn map and run the ->dataset again, with the parser functions. There's an extra roundtrip, but could work 😄 Let's see what my colleagues think of it 😅


I like that approach honestly. It probably parses a file more or less instantly and you get the entire dataset to run your type heuristics on.


You can re-parse a column though: Then you can just re-parse the columns using same syntax as your parser-fn.


I should have directed you towards - that is more general and not specific towards string columns.

David Vujic13:03:29

I read about the column-cast and will give that one a try too. I think I may have misunderstood the api of it yesterday, but it looks like a good approach. Thank you!

👍 3