#data-science
2019-05-31
metasoarous 04:05:48

^ FWIW, we did this in semantic-csv, and it seems to work well: one ns for the transducer versions, one for the classic functions (the lib predates transducers, after all).
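A minimal sketch of that split, assuming nothing about semantic-csv's real function names: the transducer is the primitive, and the classic collection function is a thin wrapper that applies it. `parse-ints-xf` and `parse-ints` are hypothetical names; in a library they would sit in two namespaces, but they share one here so the example is self-contained.

```clojure
;; Hypothetical example of the "transducer ns + classic ns" layout.
;; In a real lib, parse-ints-xf would live in e.g. mylib.transducers and
;; parse-ints in mylib.core; both are kept together here for brevity.

(defn parse-ints-xf
  "Transducer: parse the string values under `ks` in each row map as longs."
  [ks]
  (map (fn [row]
         (reduce (fn [r k] (update r k #(Long/parseLong %)))
                 row
                 ks))))

(defn parse-ints
  "Classic collection version: a thin wrapper that applies the transducer."
  [ks rows]
  (sequence (parse-ints-xf ks) rows))
```

The classic version suits one-off calls, while the transducer composes with other steps via `comp` and `into`/`transduce` without building intermediate sequences.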

otfrom 10:05:02

I've been having a lot of fun with cljplot lately

genmeblog 21:05:51

Cool 🙂 I'll resume work on cljplot soon.

otfrom 11:06:54

@tsulej happy to pitch in too as I bump into bits that I need to change/fix. I'll also post what I've been doing with them in the next few weeks. It's mostly copypasta from your examples, but I've made some tweaks to handle palettes and keep them consistent across multiple plots.

otfrom 11:06:20

@tsulej thx for actually building the thing. It really has been fun to use. 😄

genmeblog 11:06:23

Cool, happy to incorporate any changes and tweaks you found more comfortable.

genmeblog 16:06:15

@U0525KG62 continuing: could you share how you've cleaned up palettes/color configurations?

otfrom 16:06:40

just a tick. lemme clean something up...

otfrom 16:06:11

@tsulej

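;; Assumed aliases (the ns form is not shown in the original); clean-record
;; and csv->maps are the author's own helpers and are not shown either.
(require '[clojure2d.color :as color] '[net.cgrand.xforms :as x]
         '[cljplot.build :as plotb] '[cljplot.render :as plotr]
         '[cljplot.core :as plot])
(def population-colors
    (merge
     (zipmap ["A1" "A2" "A3" "A4"]
             (reverse (color/palette-presets :purples-9)))
     (zipmap ["B1" "B2" "C1" "C2" "C3" "C4" "C5" "C6"]
             (reverse (color/palette-presets :blues-9)))
     ;; NOTE: "F1" is listed twice; zipmap keeps the later pairing, so one
     ;; color is skipped (possibly a typo for another code)
     (zipmap ["D1" "E1" "E2" "F1" "F1" "G1" "G2" "G3" "H1" "H2" "H3" "I4" "J1"]
             (into (color/palette-presets :green-orange-12) (reverse (color/palette-presets :reds-3))))))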

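  ;; turn a seq of row maps like {"date" d "population" p "A1" a1 ...} into a
  ;; sorted map of column -> [[date value] ...], the shape the stacked area
  ;; and line charts want; every column except "date" becomes a series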
  (defn stacked-area-data [xf maps]
    (into (sorted-map)
          (comp (map clean-record)
                (mapcat (fn [r]
                          (let [ks (remove #(= "date" %) (keys r))
                                fs (map (fn [k]
                                          (fn [m]
                                            {:date (get m "date")
                                             :category k
                                             :value (get m k)})) ks)]
                            ((apply juxt fs) r))))
                (x/by-key :category (x/into []))
                ;; sort each category's records by date before projecting to
                ;; [date value] pairs: on the vectors, :date would look up nil
                (map (fn [[k v]] [k (->> v
                                         (sort-by :date)
                                         (mapv (juxt :date :value)))]))
                xf)
          maps))

  (defn stacked-area
    ([title domain colors data]
     (let [palette (map colors domain)
           legend-spec (reverse (map
                                 (fn [d p]
                                   [:rect d {:color p}])
                                 domain palette))]
       (-> (plotb/series [:grid]
                         [:sarea data {:palette palette}])
           (plotb/preprocess-series)
           (plotb/update-scale :y :fmt int)
           (plotb/add-label :top title {:font-size 24 :font "Open Sans Bold" :margin 36})
           (plotb/add-axes :bottom {:ticks {:text-angle 45 :text-align :left :font "Open Sans" :font-size 12}})
           (plotb/add-axes :left {:ticks {:font "Open Sans" :font-size 12}})
           (plotb/add-label :bottom "Date" {:font-size 24 :margin 48 :font "Open Sans"})
           (plotb/add-label :left "Population" {:font-size 24 :margin 36 :font "Open Sans"})
           (plotb/add-legend "" legend-spec)
           (plotr/render-lattice {:width 1024 :height 768 :background :white}))))
    ([title colors data]
     (stacked-area title (-> data keys) colors data)))

  (plot/show (stacked-area "Historic Analysis of A and B codes - absolute numbers"
                           population-colors
                           (stacked-area-data (filter (fn [[k _]] (#{"A1" "A2" "B1" "B2"} k)))
                                              (csv->maps "./data/population_counts.csv"))))
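Sharing one category-to-color map across every chart is what keeps each code the same color in every plot. One catch with building it from `zipmap` is that a repeated category (like the doubled "F1" in `population-colors`) silently overwrites an earlier pairing. A sketch of a stricter builder, with placeholder keywords standing in for clojure2d colors; `category-colors` is a hypothetical helper, not part of cljplot or clojure2d:

```clojure
;; Hypothetical helper: pair categories with colors, but fail loudly on
;; duplicate category names or a palette that is too short, instead of
;; silently dropping pairings the way plain zipmap does.
(defn category-colors [categories palette]
  (let [dups (for [[c n] (frequencies categories) :when (> n 1)] c)]
    (when (seq dups)
      (throw (ex-info "Duplicate categories" {:duplicates (vec dups)})))
    (when (< (count palette) (count categories))
      (throw (ex-info "Palette too short"
                      {:categories (count categories) :colors (count palette)})))
    (zipmap categories palette)))

;; Usage: extra palette entries are ignored, as with zipmap
(category-colors ["A1" "A2"] [:purple-dark :purple-mid :purple-light])
```

Each group's result could then be combined with `merge` exactly as in the original.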

otfrom 16:06:33

all very first-ish draft really, but got the job done
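For readers without xforms, the reshape done by `stacked-area-data` can be sketched with plain `group-by`. This is an illustrative equivalent, not the code above: `wide->series` is a hypothetical name, the rows are assumed to be already cleaned (the job `clean-record` does in the original), there is no extra filtering transducer, and dates are assumed to sort correctly as strings (e.g. ISO `yyyy-MM`).

```clojure
;; Illustrative reshape using only clojure.core:
;; wide rows {"date" d, "A1" n1, "A2" n2, ...} become a sorted map
;; {"A1" [[d n1] ...], "A2" [[d n2] ...]} with each series sorted by date.
(defn wide->series [rows]
  (let [long-rows (for [r rows
                        [k v] r
                        :when (not= "date" k)]
                    {:date (get r "date") :category k :value v})]
    (into (sorted-map)
          (map (fn [[k rs]]
                 ;; sort the maps first, then project to [date value] pairs
                 [k (->> rs (sort-by :date) (mapv (juxt :date :value)))]))
          (group-by :category long-rows))))

(wide->series [{"date" "2019-02" "A1" 5 "A2" 7}
               {"date" "2019-01" "A1" 3 "A2" 4}])
;; => {"A1" [["2019-01" 3] ["2019-02" 5]], "A2" [["2019-01" 4] ["2019-02" 7]]}
```

The xforms version avoids materialising the intermediate long rows, which matters more for large CSVs; for small data the two are interchangeable.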

otfrom 16:06:07

most of the code is ripped off and tweaked from your examples. The rest is because maps and seqs are good. 🙂
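The `let` at the top of `stacked-area` derives both the series palette and the legend rows from the same domain, so the two cannot drift apart. A pure sketch of that step, pulled out so it can be checked without rendering; `palette+legend` is a hypothetical name, and the reversal is presumably there so the legend reads top-to-bottom in the same order as the stacked bands:

```clojure
;; Hypothetical pure helper mirroring the let-block in stacked-area:
;; look up each domain entry's color, and build legend rows in reverse
;; domain order. A category missing from `colors` yields a nil color.
(defn palette+legend [domain colors]
  (let [palette (mapv colors domain)]
    {:palette palette
     :legend  (vec (reverse (map (fn [d p] [:rect d {:color p}])
                                 domain palette)))}))

(palette+legend ["A1" "A2"] {"A1" :purple "A2" :blue})
```

Because `stacked-area-data` returns a sorted map, the 3-arity `stacked-area` gets a stable, alphabetical domain from `(keys data)`, so palette order never depends on hash ordering.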

otfrom 14:06:17

@tsulej looking at my upcoming work, 2 things I'll have to figure out how to do are categorical heatmaps (basically showing co-occurrence of a with a, a with b, etc.) and Sankey charts. I can use my existing ways of doing them in the meantime.
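The data behind such a categorical heatmap is a table of pair counts. A minimal sketch, assuming each record is a set of category labels (the record shape, `co-occurrence` and `co-occurrence-grid` are all hypothetical); rendering is left out, since cljplot had no categorical heatmap at this point:

```clojure
;; Count how often each ordered pair of categories appears in the same
;; record. Diagonal cells [a a] count the records that contain a at all;
;; the table is symmetric, so [a b] and [b a] always agree.
(defn co-occurrence [records]
  (frequencies
    (for [r records
          a r
          b r]
      [a b])))

;; Lay the counts out as rows of a square matrix in `labels` order,
;; filling pairs that never co-occur with 0.
(defn co-occurrence-grid [labels counts]
  (vec (for [a labels]
         (mapv #(get counts [a %] 0) labels))))

(co-occurrence-grid [:a :b] (co-occurrence [#{:a :b} #{:a} #{:a :b}]))
;; => [[3 2] [2 2]]
```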

genmeblog 14:06:18

Categorical heatmaps are not done yet. Sankey... this is not an easy one, so I left it for later. A parallel plot will probably come first.

otfrom 14:06:57

@tsulej yeah, I know those aren't there atm. Like I said, I've got other ways of doing them for now, but that is what is holding me back from doing it all in cljplot. Everything else has been great so far tho. thx!

genmeblog 14:06:30

ah, ok 🙂 categorical x categorical plot is at the very beginning of my pipeline.

otfrom 14:06:51

that would be super helpful, and might mean that I can get away w/o needing the Sankey, as I don't need multiple stages for my datavis, and I can use a lattice if I do.

stathissideris 07:06:28

what’s the motivation for cljplot? What’s the advantage over solutions that bridge Clojure with Vega(-Lite)?

stathissideris 07:06:07

like Oz for example

stathissideris 07:06:11

oh, sorry, I just saw the “why?” section

otfrom 08:06:56

Yeah, I need something that does static charts in batches well
