#data-science
2019-10-21
sakalli06:10:03

Hi there! Here is a blog about some of the stuff re Clojure and data science that @daslu , @konrad.kuehne and myself discussed in Helsinki after ClojuTRE, kindly summarised by Daniel. We would love to hear your thoughts about this! https://scicloj.github.io/posts/2019-10-18-data-wishes/

👍 4
chrisn13:10:58

@neo2551 I have not tried pyodide. I would like to see the jvm implement wasm such that it performed as it should.

chrisn13:10:59

With Rust it is relatively easy to add a C-layer that we can use via JNA. Lucet may already have that layer in fact as it is designed to be embedded.

chrisn17:10:04

2. TVM now has an IR layer that supports autodifferentiation: https://docs.tvm.ai/dev/relay_intro.html

chrisn17:10:29

New version of TVM works with latest tech.datatype, tech.ml.dataset, etc.

pizzaspin 8
💯 4
➕ 4
David Pham17:10:51

What is TVM?

David Pham17:10:32

Never mind :)

chrisn17:10:34

It is my favorite toy 🙂.

David Pham17:10:49

I just want to have pandas in CLJS

David Pham17:10:08

I can’t find any library that provides the data alignment functionality

chrisn17:10:46

Is that the align function you want?

David Pham17:10:51

For timeseries: you want to concat two ts, but they might not have the same time stamps. So you would need to make an outer join

David Pham17:10:08

Yeah that one.

David Pham17:10:56

When you have a sorted index, I can’t solve the problem with an efficient algorithm.

chrisn17:10:39

Do you mean you 'can' solve the problem with sorted index?

David Pham17:10:00

Well, what I am doing now is I am representing my timeseries as sorted-map

David Pham17:10:17

And have date->vector map

David Pham17:10:29

I get super fast slicing thanks to subseq

David Pham17:10:18

But this representation is inefficient for computation (so I would need to go for date->index and manage my slicing accordingly and keep the data as a matrix).
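A minimal sketch of the date->index idea described above, assuming the dates are kept in a sorted list and the data in a parallel list of rows (standing in for a matrix). The name `slice_rows` and the sample data are hypothetical, chosen only for illustration. Binary search gives the same kind of range slicing that `subseq` provides on a sorted-map, while the values stay in one contiguous structure:

```python
from bisect import bisect_left, bisect_right
from datetime import date


def slice_rows(index, rows, start, end):
    """Return (dates, rows) for start <= date <= end.

    `index` must be sorted ascending and parallel to `rows`.
    """
    lo = bisect_left(index, start)   # first position with date >= start
    hi = bisect_right(index, end)    # first position with date > end
    return index[lo:hi], rows[lo:hi]


# Hypothetical sample data: one row of values per date.
index = [date(2019, 10, 18), date(2019, 10, 21), date(2019, 10, 22), date(2019, 10, 24)]
rows = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0]]

dates, values = slice_rows(index, rows, date(2019, 10, 19), date(2019, 10, 22))
print(dates, values)
```

Each slice costs two binary searches plus the copy of the selected range, so the lookup itself is logarithmic in the number of dates.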

David Pham17:10:09

The biggest trick is whenever I have two timeseries that might potentially have different time index

chrisn17:10:29

But some of the times align and some don't

David Pham17:10:31

Then I would need to merge them (the issue with merge is I can’t do outer join)
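One way to sketch the outer join being discussed, assuming both time series are lists of `(timestamp, value)` pairs already sorted by timestamp with no repeated timestamps. The function name `align_outer` and the `fill` parameter are hypothetical, and this is not how pandas implements `align`; it is just the classic two-pointer merge, which runs in time linear in the combined length:

```python
def align_outer(ts_a, ts_b, fill=None):
    """Outer-join two sorted time series on their timestamps.

    Returns a list of (timestamp, value_a, value_b); a side that has
    no observation at a given timestamp gets `fill`.
    """
    out = []
    i = j = 0
    while i < len(ts_a) and j < len(ts_b):
        ta, va = ts_a[i]
        tb, vb = ts_b[j]
        if ta == tb:            # timestamp present in both series
            out.append((ta, va, vb))
            i += 1
            j += 1
        elif ta < tb:           # only series A has this timestamp
            out.append((ta, va, fill))
            i += 1
        else:                   # only series B has this timestamp
            out.append((tb, fill, vb))
            j += 1
    # At most one of the two series still has entries left.
    out.extend((t, v, fill) for t, v in ts_a[i:])
    out.extend((t, fill, v) for t, v in ts_b[j:])
    return out


# Hypothetical sample data with partially overlapping timestamps.
a = [(1, "a1"), (2, "a2"), (4, "a4")]
b = [(2, "b2"), (3, "b3"), (5, "b5")]
aligned = align_outer(a, b)
print(aligned)
```

The same loop works with dates or instants as timestamps, as long as they are comparable and both inputs are sorted.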

David Pham17:10:23

IMO, the biggest feature of pandas is to solve this problem extremely efficiently.

David Pham17:10:50

And all the handling of time as well.

David Pham17:10:10

tablesaw does not even care about that.

chrisn17:10:20

Tablesaw (and tech.ml.dataset) does not have the concept of an index across the table. You could create brand new tables that look the same as the two tables in the documentation, though they would not share backing store data.

chrisn17:10:46

Views are doable in tech.ml.dataset but I would first get the functionality and tests working correctly for the 'align' function and then worry about views when someone runs out of RAM.

David Pham17:10:49

But then I would need it on CLJS xD

David Pham17:10:30

I am at the point of thinking to use tensorflow-js for performing my linear algebra operations.

David Pham17:10:52

You could rely on WebGL whenever available xD

chrisn17:10:40

lol, you are better off with tensorflow-js than waiting for me to port the tech platform to js. Why CLJS? Just for kicks?

David Pham17:10:28

My company forbids me from using Clojure (more precisely, from importing jar files), whereas they have no restrictions on JS files

chrisn17:10:47

haha 🙂

David Pham17:10:48

So I code my tools outside the company networks and download the JS files from Github

David Pham17:10:35

Plus my velocity in developing UI has been amazing the last year (I am a UI newbie)

David Pham17:10:46

So they let me play with it xD

David Pham17:10:45

Actually, they have been so amazed that we probably are going to stick with ClojureScript for any official web interface

David Pham17:10:15

I hope I can hack Clojure in the backend soon. I want to play with Neanderthal xD

David Pham17:10:50

I also suspect that tfjs WebGL is faster than most of our internal data science tools (we use Matlab and R)

chrisn17:10:52

Maybe. The internal stuff should use system blas libraries so it depends on which system blas they have installed. They could have mkl installed or something like that.

David Pham17:10:44

Yeah, that would be tricky to beat

David Pham17:10:15

That being said I could mimic tech.ml.dataset with tfjs

👍 4
jsa-aerial21:10:26

If it weren't for this (bizarre restriction) you could mix CLJS with things like Neanderthal and Panthera via Saite. Neanderthal is available now, and I am now looking to add Panthera. Then you could have your Pandas and Numpy as well from Cljs

👍 12
David Pham21:10:15

You have my attention now :) Guess I have to check Saite now

David Pham21:10:23

One of my best shots would still be to get them to use GraalVM for the R<->Java interop and hope I can sneak in all my Clojure dependencies as well