#data-science
2015-11-10
mikera03:11:55

Hey dtsao, gyptis looks awesome, thanks for sharing!

mikera03:11:35

I have one constructive criticism: I think the use of dynamic vars / binding is a bad idea

mikera03:11:07

Better to use option maps IMHO

mikera04:11:42

I.e. make such parameters explicit and functional rather than pulling them from a mutable environment

dtsao04:11:32

thanks for the comment @mikera! Yes, I was debating this issue with myself; I think you're right that explicit optional parameters are a lot friendlier

mikera04:11:58

Yeah, stuff like testing, functional composition etc. becomes much easier
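
A minimal sketch of the contrast being discussed, using only clojure.core. Every name here (`*width*`, `bar-spec-dynamic`, `bar-spec`, `default-opts`) is hypothetical and is not gyptis's actual API. The lazy-sequence pitfall is one well-known example of how dynamic bindings can complicate testing and composition; it is an illustration, not something raised in the conversation.

```clojure
;; Hypothetical names throughout; this is not gyptis's API.

;; Style 1: configuration pulled from a dynamic var.
(def ^:dynamic *width* 400)

(defn bar-spec-dynamic [data]
  {:width *width* :data data})

;; Style 2: configuration passed explicitly as an option map.
(def default-opts {:width 400})

(defn bar-spec [data opts]
  (merge default-opts opts {:data data}))

;; The override is only visible while the binding form is executing.
;; map is lazy, so these specs are realised after the binding has exited
;; and they see the root value of *width* rather than 800.
(def lazy-specs
  (binding [*width* 800]
    (map bar-spec-dynamic [[1 2] [3 4]])))

(:width (first lazy-specs))             ; => 400

;; The explicit version depends only on its arguments.
(:width (bar-spec [1 2] {:width 800}))  ; => 800
```

Because `bar-spec` is a pure function of its arguments, it can be tested and composed without setting up any surrounding binding context.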

dtsao04:11:25

Right. I do think there might be a place for dynamic vars if the option maps become too unwieldy. (Gyptis isn't at that place yet).

dtsao04:11:14

A lot of plotting libraries have >20 parameters like font-size, color, etc. that can be customized.

dtsao04:11:06

Applying a 'theme' could be a bit easier to manage

dtsao04:11:39

So are you envisioning an api that looks more like: (dodged-bar [{:a 1 :b 10} {:a 2 :b 20}] :x :a :y :b) ?

mikera04:11:31

I prefer passing a map rather than keyword parameters. Keyword parameters get unwieldy quite quickly, and it is nice to use things like merge to build themes etc.
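
As a rough sketch of what building themes with `merge` could look like: the option keys, theme names, and the `plot-opts` helper below are all assumptions made up for illustration, not part of gyptis or Vega.

```clojure
;; Hypothetical option keys and theme names, for illustration only.
(def base-theme {:font-size 12 :color "steelblue" :width 400})

;; A derived theme is just the base theme with some keys replaced or added.
(def dark-theme (merge base-theme {:color "white" :background "black"}))

(defn plot-opts
  "Combine a theme with per-call overrides. In merge, later maps win,
  so the overrides take precedence over the theme."
  [theme overrides]
  (merge theme overrides))

(plot-opts dark-theme {:x :a :y :b :font-size 14})
;; => {:font-size 14, :color "white", :width 400,
;;     :background "black", :x :a, :y :b}
```

Since themes are plain maps, they can be stored, inspected, and combined with the same functions as any other data.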

mikera04:11:26

Hmmmm looking at your syntax I think you could make use of some of the core.matrix dataset functionality as well

dtsao04:11:20

ah you're the author for that, right?

dtsao04:11:04

i used core.matrix recently for a quick-and-dirty numerical simulation I was doing. it was really awesome! thanks!

dtsao04:11:07

I considered using core.matrix's dataset or the dataset in incanter. As an experiment, I wanted to see if using SQL to perform all aggregations, statistics, joining, and sorting was easier.

mikera04:11:16

yup that's right. Jeff Rose is working on a ClojureScript port too

mikera04:11:44

Incanter is going to just use the core.matrix dataset functionality in the future

mikera04:11:59

So here's the "Big Idea": if we can get everyone to adopt the core.matrix abstractions, we can get very good composability across different numerical / data science libraries. Because it is all protocol backed, any front end library would be able to use any back-end matrix implementation etc.

dtsao04:11:49

that would be wonderful

dtsao04:11:02

Ideally, gyptis will be as agnostic as possible to whether the data is represented as a core.matrix dataset or as a vector of hashmaps

dtsao04:11:28

However, one of the main goals I have for gyptis is to be as thin a layer over vega as possible. And, vega being a js library, data representations that are closer to JSON are preferable.

mikera04:11:10

Well a core.matrix dataset is pretty much analogous to a 2D array with labelled columns (in future we may support labelled rows as well)

mikera04:11:30

Which is isomorphic with a vector of keyword-labelled maps

dtsao04:11:24

Yep, I made sure that gyptis operates fine with 2D arrays (which I should really write a testcase for...eek).

dtsao04:11:56

Namely, by using get instead of assuming that I could use a keyword as an accessor function
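
A small sketch of the two points above, in plain clojure.core: why `get` works for both row shapes, and how a vector of rows maps to column storage and back. The helpers `rows->columns` and `columns->rows` are hypothetical names written for this illustration; they are not core.matrix or gyptis functions, and `columns->rows` assumes at least one column.

```clojure
;; A row may be a map keyed by column name or a vector indexed by position.
(def map-row {:a 1 :b 10})
(def vec-row [1 10])

(get map-row :a)  ; => 1
(get vec-row 0)   ; => 1
;; (:a map-row) also works, but (0 vec-row) would throw,
;; because a number cannot be called as a function.

;; Hypothetical helper: rows (maps or vectors) to column storage.
(defn rows->columns [column-names rows]
  {:column-names column-names
   :columns (mapv (fn [k] (mapv #(get % k) rows)) column-names)})

;; Hypothetical helper: column storage back to a vector of maps.
;; Assumes there is at least one column.
(defn columns->rows [{:keys [column-names columns]}]
  (apply mapv (fn [& vals] (zipmap column-names vals)) columns))

(rows->columns [:a :b] [{:a 1 :b 2} {:a 2 :b 4}])
;; => {:column-names [:a :b], :columns [[1 2] [2 4]]}

(rows->columns [0 1] [[1 2] [3 4]])
;; => {:column-names [0 1], :columns [[1 3] [2 4]]}

(columns->rows {:column-names [:a :b] :columns [[1 2] [2 4]]})
;; => [{:a 1, :b 2} {:a 2, :b 4}]
```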

dtsao04:11:34

I do have to make sure that vega is able to work with a 2D array as well. If not, gyptis will have to convert the dataset internally to hashmaps in order to support core.matrix dataset

dtsao04:11:54

Anyway, I think you're right that it's important for any plotting lib to support the language's de-facto 'dataset' data structure....I'll definitely be looking into this.

mikera04:11:05

You can use functions like c.c.m/mget and c.c.m/slices... which should do the right thing whatever array representation you have, e.g. a 2D array, a dataset, a vectorz matrix, etc.

mikera04:11:54

also clojure.core.matrix.dataset/dataset should be able to coerce any of these types into an appropriate dataset representation

dtsao05:11:00

oh that's awesome!

mikera05:11:32

` (dataset [{:a 1 😛 2} {:a 2 😛 4}])

mikera05:11:52

WTF smileys....

mikera05:11:13

(dataset [{:a 1 :b 2} {:a 2 :b 4}])

mikera05:11:24

=> {:column-names [:a :b], :columns [[1 2] [2 4]]}

mikera05:11:22

i.e. the last thing is a dataset with column storage (more efficient than nested maps if you have a lot of data)

mikera05:11:04

same thing with unlabelled 2D arrays

dtsao05:11:06

ok i'm pretty sold 😃

mikera05:11:09

(dataset [[1 2] [3 4]])

mikera05:11:24

=> {:column-names [0 1], :columns [[1 3] [2 4]]}

mikera05:11:21

Disclaimer: this all needs a bit of testing, not sure how much people have used this in anger yet

dtsao05:11:02

Just as an aside, (not trying to be argumentative, this is just something I've been thinking about), usually if performance of data is an issue, you are doing plotting wrong. The human eye can't distinguish more than ~100,000 objects on a screen anyway, so you are very likely overplotting.

mikera05:11:48

True if it is charts! Not so true if you are doing other types of visualisation (point clouds, geospatial renderings etc.)

dtsao05:11:20

oh i haven't done much with geospatial renderings

dtsao05:11:54

with point clouds---i would estimate the upper bound to be maybe 1E7 or 1E8?

dtsao05:11:13

1 million pixels, with 10 or 100 gradations of color/intensity per pixel
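
The back-of-the-envelope estimate above, written out. The names `pixels` and `gradations` are just labels for the two figures quoted in the message.

```clojure
;; Roughly one million pixels on screen, each able to show
;; 10 to 100 distinguishable gradations of colour/intensity.
(def pixels 1e6)
(def gradations [10 100])

(map #(* pixels %) gradations)
;; => (1.0E7 1.0E8)
```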

dtsao05:11:25

(sorry I trained as a physicist so I enjoy these types of calculations)

mikera05:11:07

I guess... though normally if I was doing something that big I would pre-process into a bitmap or something first. This is what I did with a Singapore mobile data visualisation: https://github.com/mikera/singa-viz

mikera05:11:26

I think that was in the 1e8 order of magnitude for the raw data

dtsao05:11:56

that's cool!

dtsao05:11:52

This is making me want to stress test my browser to see how many points it can handle in a canvas element

dtsao05:11:42

I should get going, but thanks so much for the helpful suggestions!

mikera05:11:39

No worries @dtsao great to see some new innovation in the Clojure visualisation space!

aaelony18:11:27

curious what folks think about TensorFlow (https://github.com/tensorflow/tensorflow) that Google open-sourced yesterday.... Would be nice to access from Clojure instead of C++/Python ....

mikera22:11:50

It is pretty cool. I've actually been working on something a bit similar for Clojure