#data-science
2020-04-30
aaelony21:04:00

maybe a github repo that folks could commit/push links to current ds projects might be in order? or any other mechanism. A long, long, long time ago clojure toolbox was just this kind of thing. I'm not sure if it still is updated (https://www.clojure-toolbox.com/)

phronmophobic21:04:24

it’s still being updated. I had a repo added within the last month

👍 8
aaelony21:04:03

I don't see DS categories there though. Perhaps that's one idea

aaelony21:04:08

e.g. Python Integration lists pickler but no libpython-clj

phronmophobic21:04:37

those seem like good additions

aaelony21:04:17

I don't see tech.ml.dataset either

phronmophobic21:04:25

I was able to get my library added by making a pull request, https://github.com/weavejester/clojure-toolbox.com/pulls

aaelony21:04:56

perhaps the word can get out to the lib creators to register there if desired

phronmophobic21:04:05

I think anyone can make a pull request

aaelony21:04:06

Oz, Saite, etc..

phronmophobic21:04:28

Oz is under data exploration

metasoarous21:04:05

I did put this together a long time ago; the idea was supposed to be the same as far as PRing libraries goes: https://github.com/metasoarous/clojure-datascience

aaelony21:04:07

well, it used to be the case that a few libs fell under >1 category

metasoarous21:04:44

Yeah, it would be nice to have a little database of these things, which can be submitted and approved dynamically

aaelony21:04:54

maybe that's cleaned up, but still possible to have multiple uses

metasoarous21:04:56

Searchable UI; blahblah

aaelony21:04:14

yep, exactly that, metasoarous

aaelony21:04:50

BERT enabled search would be awesome 😉

🙂 4
Daniel Slutsky22:04:58

@aaelony @metasoarous @phronmophobic Hi. :) We maintain some relevant lists at the scicloj website:
https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/libraries.md
https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/reading.md
https://github.com/scicloj/scicloj/blob/master/resources/templates/md/pages/chat_streams.md
If anyone wants to have push permissions -- please tell us. Our current method is that people can push changes to a draft branch, and one person (the "editor") merges them to master and tidies up. Any thoughts?

parrot 4
Daniel Slutsky22:04:11

@aaelony @metasoarous Searchable UI is a great idea. We have been thinking for some time about migrating the website from Cryogen to something hiccup-driven such as Oz. Then (I think) it will be more fun to create some interactive views.

metasoarous22:04:59

Yes! That would be awesome. I've wanted to do that; Build viz tools for library discovery/evaluation.

👍 4
Daniel Slutsky22:04:20

On a related thread, @U3X7174KS is exploring some ideas of knowledge organization using Roam, which should be exportable to some convenient data format.

metasoarous22:04:13

Roam seems to be everywhere these days doesn't it...

Daniel Slutsky22:04:34

Your tweet about it was inspiring.

Daniel Slutsky22:04:50

Regarding the scicloj website -- it gets too little attention these days. If anyone wants to take over and make it fresh and beautiful, that would be more than welcome of course. 🙃

phronmophobic22:04:56

fyi, the links to vega and vega lite in libraries.md are broken

phronmophobic22:04:59

is there a library for exploring a giant edn blob? like if you want to just open up a visualizer for your app state or a response from an API?

phronmophobic22:04:56

i guess giant is the wrong adjective. anyway, there are lots of tools for wide data sets (e.g. a db table with a million rows), but I’m interested in a generic data viewer for data sets that are 5-20 levels deep and maybe up to 100s of entries wide under certain branches

phronmophobic22:04:20

REBL is in the same concept space, but I’m wondering if there are others

Daniel Slutsky22:04:49

Maybe @vlaaad can comment on Reveal, a REBL-related UI. https://github.com/vlaaad/reveal

phronmophobic22:04:09

oh yea, i’ve been meaning to look into that

vlaaad12:05:27

Oh hey, I'm building such a thing I guess, it's work in progress

vlaaad12:05:38

I sometimes have very wide data structures that are clunky to look at if you just pprint them. This is sort of the next thing I'm working on: adding support for custom visualizations, such as tables with limited height, that make it much more convenient to look at the current level, for example

👍 4
phronmophobic16:05:58

for what I’m interested in, there are two aspects:
1. exploration - given some data, can I navigate through it and look at all its parts
2. summary - given some data, what can I know about it without looking at every single detail it has to offer
I think exploration is important, but more straightforward. The problem of summarizing medium-size, heterogeneous data is much more interesting to me. We have lots of ways to summarize large sequences of numbers (mean, median, mode, histograms, etc). We have fewer tools for summarizing something like a json blob that comes from an API call.

👍 4
teodorlu14:05:40

@phronmophobic I've been using malli.provider/provide for exploring JSON structure. See #malli

teodorlu14:05:56

Caveat: you'll have to figure out where your sequences are, then you can run provide on those.
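
To make the "summary" idea concrete, here is a rough sketch of a structural summary for a nested JSON-like value. It is written in plain Python purely for illustration; it is not malli's API and does not reproduce what `provide` returns. The `summarize` function and the `response` sample are hypothetical names and data invented for this example. List indices are collapsed to `[]`, so every element of a sequence is counted under the same path.

```python
from collections import Counter


def summarize(value, path="$", counts=None):
    """Count how often each (path, type name) pair occurs in a nested value.

    Hypothetical helper for illustration only. List and tuple indices are
    collapsed to "[]" so that all elements of a sequence share one path.
    """
    if counts is None:
        counts = Counter()
    counts[(path, type(value).__name__)] += 1
    if isinstance(value, dict):
        for key, child in value.items():
            summarize(child, f"{path}.{key}", counts)
    elif isinstance(value, (list, tuple)):
        for child in value:
            summarize(child, f"{path}[]", counts)
    return counts


# Made-up sample standing in for a response from an API call.
response = {
    "user": {"id": 7, "tags": ["a", "b"]},
    "items": [{"price": 1.5}, {"price": None}, {"price": 3}],
}

# One line per (path, type) pair, with the number of occurrences.
for (path, type_name), n in sorted(summarize(response).items()):
    print(f"{path:20} {type_name:10} {n}")
```

A path that shows up with more than one type (here `$.items[].price`) points at the heterogeneous spots without reading every value.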

teodorlu15:05:19

@U8STBJZU0 - perhaps you're interested in this 🙂

aaelony15:05:06

@phronmophobic your points (1) and (2) sound like automl to me, e.g. https://arxiv.org/pdf/1810.13306.pdf

phronmophobic17:05:43

@aaelony, I’m not sure I follow. It looks like automl is meant to take people out of the loop, rather than to be a tool that helps people quickly form an intuition about some data

aaelony19:05:55

depends on the intuition, I suppose.

phronmophobic19:05:00

using something like automl sounds cool. I just have zero experience with machine learning so I’m not sure how I would use it

aaelony21:05:13

it's an active topic with aspirational goals riffing off of exploration and summary as you had mentioned (esp for predictive models)