#data-science
2024-04-17
emilaasa 08:04:51

I'm loading a CSV where one column, properties, is a JSON object (well, a string that can be parsed into a JSON object). In pandas I would use a combination of json.loads and pandas.json_normalize to get each top-level key of the JSON as a new column. Can I do something similar with tablecloth?

genmeblog 08:04:30

I've never tried that, but there is an option :parser-fn where you can pass a parser for a specific column. Something like (tc/dataset csv-file {:parser-fn {"properties" your-json-parser}})

emilaasa 09:04:25

Thanks @U1EP3BZ3Q - unfortunately that does not seem to do anything for me. To clarify, this is what I want. From:

(-> (tc/dataset [{:x 1, :y 2 :properties "{\"a\": 1, \"b\": 1}"}
                 {:x 2, :y 3 :properties "{\"a\": 2, \"b\": 2}"}]))


; | :x | :y |      :properties |
; |---:|---:|------------------|
; |  1 |  2 | {"a": 1, "b": 1} |
; |  2 |  3 | {"a": 2, "b": 2} |
; 
To:
; | :x | :y | :a | :b |
; |---:|---:|---:|---:|
; |  1 |  2 |  1 |  1 |
; |  2 |  3 |  2 |  2 |

genmeblog 09:04:59

doable, wait a second (preparing an example)

genmeblog 09:04:44

(require '[clojure.data.json :as json])

(def ds-with-json (tc/dataset [{:x 1, :y 2 :properties "{\"a\": 1, \"b\": 1}"}
                               {:x 2, :y 3 :properties "{\"a\": 2, \"b\": 2}"}]))


;; parse each JSON string, build a dataset from the resulting maps,
;; and append it to the original dataset with the raw :properties column dropped
(->> (:properties ds-with-json)
     (map json/read-str)
     (tc/dataset)
     (tc/append (tc/drop-columns ds-with-json :properties)))

;; => _unnamed [2 4]:
;;    | :x | :y | a | b |
;;    |---:|---:|--:|--:|
;;    |  1 |  2 | 1 | 1 |
;;    |  2 |  3 | 2 | 2 |

genmeblog 09:04:28

I think json/read-str can be passed to :parser-fn when creating the dataset from the CSV file, like I described in my first comment.

emilaasa 11:04:51

(-> (tc/dataset [{:x 1, :y 2 :properties "{\"a\": 1, \"b\": 1}"}
                 {:x 2, :y 3 :properties "{\"a\": 2, \"b\": 2}"}])
    (tc/write-csv! "/tmp/out.csv"))

(-> (tc/dataset "/tmp/out.csv"))
; /tmp/out.csv [2 3]:
; 
; | x | y |       properties |
; |--:|--:|------------------|
; | 1 | 2 | {"a": 1, "b": 1} |
; | 2 | 3 | {"a": 2, "b": 2} |
; 
(-> (tc/dataset "/tmp/out.csv"
                {:parser-fn {"properties" json/parse-string}}))
; /tmp/out.csv [2 3]:
; 
; | x | y |       properties |
; |--:|--:|------------------|
; | 1 | 2 | {"a": 1, "b": 1} |
; | 2 | 3 | {"a": 2, "b": 2} |

emilaasa 11:04:03

If I follow the docstring in tc/dataset I can tweak it into this shape:

(-> (tc/dataset "/tmp/out.csv"
                {:parser-fn {"properties" [:map json/parse-string]}}))
; /tmp/out.csv [2 3]:
; 
; | x | y |     properties |
; |--:|--:|----------------|
; | 1 | 2 | {"a" 1, "b" 1} |
; | 2 | 3 | {"a" 2, "b" 2} |

genmeblog 11:04:54

Great. Then apply the rest of the operations from my example.

emilaasa 11:04:29

Thanks so much for the help!

👍 1
emilaasa 11:04:13

For completeness, this works well for me:

(let [ds-with-json (tc/dataset "/tmp/out.csv"
                               {:key-fn clojurize
                                :parser-fn {:properties [:map #(json/parse-string % clojurize)]}})]
  (-> (:properties ds-with-json)
      (tc/dataset)
      (tc/append (tc/drop-columns ds-with-json :properties))))
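
(clojurize is a small helper of mine for converting the string keys; it's not shown here. A rough stand-in, just as an illustration, would be something like:)

;; hypothetical stand-in for the clojurize helper used above:
;; turn a string key (column name or JSON key) into a keyword
(defn clojurize [k]
  (keyword k))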

genmeblog 11:04:13

Looks good. For completeness, there are also functions like unroll or separate-column for turning sequence or string columns into new rows or columns (this is not the case here, but...).
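
For example, a quick untested sketch on made-up data (the :id and :range columns are hypothetical):

(require '[clojure.string :as str])

;; separate-column splits one column into several; here a splitting
;; function breaks a "from-to" string into two new string columns
(-> (tc/dataset {:id [1 2] :range ["1-10" "2-20"]})
    (tc/separate-column :range [:from :to] #(str/split % #"-")))
;; expected columns: :id, :from, :to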

emilaasa 11:04:44

This is a beautiful library overall, works so well!

genmeblog 11:04:11

Happy to hear that! 😁