Hello 👋
I have a question as a tech.v3.dataset novice.
I am importing a CSV dataset where one of the columns is an integer representing a Unix timestamp with nanosecond precision.
How do I convert this into a date or an instant upon loading the dataset?
I tried
(d/->dataset "myfile.csv" {:parser-fn {"start" :epoch-nanoseconds}})
however, I get this error:
; Execution error at ham_fisted.Casts/longCast (Casts.java:85).
; Object cannot be casted to long: 1740459600000000000
I tried also with :instant, then I get
; Execution error at tech.v3.dataset.io.column_parsers.FixedTypeParser/addValue (column_parsers.clj:233).
; Failed to parse value 1740459600000000000 as datatype :instant on row 0
Here is a sample of the file:
exp,start
ABC,1738558800000000000
Thanks in advance for your help!
First thoughts:
user> (require '[tech.v3.dataset :as ds])
nil
user> (slurp "t.csv")
"exp,start\nABC,1738558800000000000"
user> (def ds (ds/->dataset "t.csv"))
#'user/ds
user> ds
t.csv [1 2]:
| exp | start |
|-----|--------------------:|
| ABC | 1738558800000000000 |
user> (map meta (ds/columns ds))
({:categorical? true, :name "exp", :datatype :string, :n-elems 1}
{:name "start", :datatype :int64, :n-elems 1})
user> (ds/row-map ds (fn [{:strs [start]}]
                       (let [i (java.time.Instant/ofEpochSecond (quot start 1000000000)
                                                                (rem start 1000000000))]
                         {"inst" i
                          "date" (java.util.Date/from i)})))
t.csv [1 4]:
| exp | start | inst | date |
|-----|--------------------:|----------------------|------------------------------|
| ABC | 1738558800000000000 | 2025-02-03T05:00:00Z | Sun Feb 02 22:00:00 MST 2025 |
Thanks Harold!
Since the column is read natively as an :int64, I was hoping to leverage one of the packing formats to avoid touching the data at all. Not sure if that's possible.
You're welcome. My gut is that'd be very unlikely to be worth it. More flexible this way, and if performance became a concern (e.g., you have 10B+ rows of this every day) then there'd be bigger wins elsewhere (switching serialization formats, maybe).
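For anyone who wants the conversion to happen at load time, as in the original question, here is a minimal sketch. The helper name nanos->instant is made up for illustration. The (comment ...) part assumes that :parser-fn accepts a [datatype parse-fn] tuple and hands the parse function the raw CSV string; I believe that is how tech.v3.dataset documents it, but check against your version.

```clojure
(import '(java.time Instant))

(defn nanos->instant
  "Converts epoch nanoseconds (a long) to a java.time.Instant.
  Hypothetical helper, not part of tech.v3.dataset."
  [^long n]
  ;; quot/rem stay in long arithmetic; `/` can return a Ratio when n is not
  ;; an exact multiple of 1e9, and Instant/ofEpochSecond would reject that.
  (Instant/ofEpochSecond (quot n 1000000000) (rem n 1000000000)))

(comment
  ;; Assumption: tech.v3.dataset is on the classpath and the tuple form of
  ;; :parser-fn passes each cell to the function as a String.
  (require '[tech.v3.dataset :as ds])
  (ds/->dataset "t.csv"
                {:parser-fn {"start" [:instant
                                      (fn [^String s]
                                        (nanos->instant (Long/parseLong s)))]}}))
```

With the sample row, (nanos->instant 1738558800000000000) gives 2025-02-03T05:00:00Z, matching the "inst" column in the row-map output above.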