This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-08-07
The dataset I am using happens to represent missing values as "". I’d like to treat these values as missing. Is there a way to do this? What I’m currently attempting is to use row-map, changing the value to :tech.ml.dataset.parse/missing when it is "". I don’t believe this is doing what I intend, however.
Hmmm... when I create a dataset by hand, "" is treated as missing:
(def ds (tc/dataset {:a ["a" "" " " "b"]}))
ds
;; => _unnamed [4 1]:
;; | :a |
;; |----|
;; | a  |
;; |    |
;; |    |
;; | b  |
(tc/info ds)
;; => _unnamed: descriptive-stats [1 7]:
;; | :col-name | :datatype | :n-valid | :n-missing | :mode | :first | :last |
;; |-----------|-----------|---------:|-----------:|-------|--------|-------|
;; | :a        | :string   |        3 |          1 | a     | a      | b     |
(tc/replace-missing ds :a :value "it was a missing value")
;; => _unnamed [4 1]:
;; | :a                     |
;; |------------------------|
;; | a                      |
;; | it was a missing value |
;; |                        |
;; | b                      |
Oh, probably it's a different path then. @UDRJMEFSN can you look at this?
If I pass a special parse-method for the column, it will mark it as missing. e.g.,
;; Assumes these aliases: (require '[tech.v3.dataset :as ds] '[clojure.string :as str])
(ds/->dataset "data.parquet"
              {:parser-fn
               {"product_to_region_code"
                [:string (fn [s]
                           (if (str/blank? s)
                             :tech.v3.dataset/missing
                             s))]}})
For parquet we use the file's missing indicators, so it isn't parsed like a CSV. Is it acceptable to use column-map after load and return nil if the string is ""?
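A minimal sketch of the mapping function that suggestion would need, in plain Clojure. The name blank->nil is hypothetical and not part of tech.ml.dataset. It uses str/blank?, as in the parser-fn above, so whitespace-only strings also become nil. Whether a nil returned from column-map ends up recorded as missing is an assumption worth checking on the resulting column (for example with tc/info).

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper, not a library function.
;; Blank strings become nil; every other value is returned unchanged.
(defn blank->nil
  [v]
  (if (and (string? v) (str/blank? v))
    nil
    v))

(mapv blank->nil ["a" "" " " "b"])
;; => ["a" nil nil "b"]
```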
One thing that’s still a bit frustrating is that, even with the columns marking "" as missing, missing values are passed to row-map as nil rather than the column’s key being elided from the row map entirely. Any idea if there’s a way to have the row passed to map-fn elide columns whose value is missing?
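One possible workaround, sketched in plain Clojure rather than as a dataset option: strip the nil-valued entries as the first step inside the function given to row-map. The name elide-missing is hypothetical, and the sketch assumes the row behaves like an ordinary Clojure map whose missing values arrive as nil, as described above.

```clojure
;; Hypothetical helper, not part of tech.ml.dataset or tablecloth.
;; Returns a plain map containing only the entries whose value is non-nil.
(defn elide-missing
  [row]
  (into {} (remove (fn [[_ v]] (nil? v))) row))

(elide-missing {:a "a" :b nil :c 0})
;; => {:a "a", :c 0}
```

Only nil is dropped, so values such as false or 0 are kept. A map-fn could then begin with (let [row (elide-missing row)] ...) and use contains? to tell present columns from missing ones.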