This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-05-17
Channels
- # ai (1)
- # announcements (1)
- # aws (38)
- # babashka (25)
- # beginners (84)
- # biff (11)
- # calva (58)
- # clerk (14)
- # clj-kondo (14)
- # cljdoc (9)
- # cljs-dev (2)
- # clojars (2)
- # clojure (93)
- # clojure-czech (2)
- # clojure-dev (13)
- # clojure-europe (19)
- # clojure-nl (1)
- # clojure-spec (13)
- # clojure-uk (2)
- # clojurescript (6)
- # conjure (1)
- # core-async (9)
- # cursive (12)
- # data-science (7)
- # datahike (47)
- # datalevin (10)
- # datalog (3)
- # datomic (35)
- # emacs (3)
- # events (4)
- # fulcro (49)
- # gratitude (7)
- # humbleui (1)
- # hyperfiddle (42)
- # jobs-discuss (19)
- # kaocha (5)
- # lsp (20)
- # malli (3)
- # meander (2)
- # membrane (2)
- # off-topic (22)
- # pathom (2)
- # polylith (14)
- # practicalli (1)
- # rdf (3)
- # reitit (2)
- # shadow-cljs (11)
- # squint (3)
- # tools-deps (32)
- # vim (9)
- # xtdb (16)
Here comes a (potentially dumb and premature) thought that struck me today: Now that Datomic is free to use, perhaps it could be “bundled up and served” to data scientists/researchers in a user-friendly way. Backed by files on disk when working with data locally, or blob storage in the cloud, via libraries like Tablecloth. Sort of like “a high-level tidy table storage/data version control system” with minimal setup. One problem we have when working with machine learning models, for example, is keeping track of changing datasets for training, testing, cross-validation, etc. alongside our code. Datomic offers "infinite time travel" and other goodies for free. Users could of course use the Datomic libraries directly, but perhaps a higher-level interface via tidy data frames would be smoother. :thinking_face:
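To make the "infinite time travel" idea concrete: Datomic stores an append-only log of facts and lets you query the database "as of" any past transaction. The sketch below mimics that model in plain Python (it is purely illustrative — the class and method names are made up here and are not Datomic's actual API) to show how such a log gives you dataset versioning for free.

```python
from dataclasses import dataclass
from typing import Any

# Illustrative sketch only: an append-only log of
# (entity, attribute, value, tx, added?) facts, queried "as of" a
# transaction id -- loosely how Datomic models history. None of these
# names are Datomic's real API.

@dataclass(frozen=True)
class Datom:
    e: str       # entity id, e.g. a dataset name
    a: str       # attribute, e.g. "rows" or "checksum"
    v: Any       # value
    tx: int      # transaction id (monotonically increasing)
    added: bool  # True = assertion, False = retraction

class FactLog:
    def __init__(self):
        self.log: list[Datom] = []
        self._tx = 0

    def transact(self, facts):
        """Append (e, a, v, added) tuples as one new transaction."""
        self._tx += 1
        for e, a, v, added in facts:
            self.log.append(Datom(e, a, v, self._tx, added))
        return self._tx

    def as_of(self, tx):
        """Return the {(e, a): v} view of the world at transaction tx."""
        view = {}
        for d in self.log:  # log is already in tx order
            if d.tx > tx:
                break
            if d.added:
                view[(d.e, d.a)] = d.v
            else:
                view.pop((d.e, d.a), None)
        return view

# Track two versions of a training set; old views stay queryable:
db = FactLog()
t1 = db.transact([("train-set", "rows", 10_000, True)])
t2 = db.transact([("train-set", "rows", 10_000, False),
                  ("train-set", "rows", 12_500, True)])

assert db.as_of(t1)[("train-set", "rows")] == 10_000
assert db.as_of(t2)[("train-set", "rows")] == 12_500
```

Nothing is ever overwritten, so "which exact training set produced this model?" reduces to recording one transaction id alongside the experiment — which is the bookkeeping the real Datomic handles for you.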
I've experimented with similar ideas in the past and I think there's definitely huge potential for what you suggest, particularly for data science
I am highly interested in this; I find myself reinventing a reproducibility wheel with each data science project I undertake, and something that works "out of the box" would save me a lot of time and mental overhead. Datomic basically provides all the same capabilities as the bespoke "ML lifecycle" solutions, but with all the flexibility of Datalog (especially as compared with a fairly rudimentary wrapper over a DB, like MLflow)
The real "killer app", IMO, would be Python integration - Datomic as a kind of "Pandas backend" would do wonders for adoption and interest
There are a couple of abandoned prior art Python/Datomic projects but it looks like they wrap the REST API - I'd imagine that for data projects the Trino driver offers more leverage and would be a better basis for a Python/Datomic bridge
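Since Datomic's analytics support exposes databases over a Presto/Trino-compatible SQL endpoint, the standard `trino` Python client (DB-API 2.0) could in principle serve as that bridge. A hedged sketch, not a working integration — host, port, catalog, and schema below are placeholders, and `query_datomic` assumes a running analytics endpoint:

```python
# Hedged sketch of a Python/Datomic bridge via the Trino DB-API client.
# Connection parameters are placeholders; see the Datomic analytics docs.

def rows_to_dicts(rows, description):
    """Turn DB-API rows plus cursor.description into plain dicts.

    Per DB-API 2.0, each description entry's first element is the
    column name.
    """
    cols = [c[0] for c in description]
    return [dict(zip(cols, row)) for row in rows]

def query_datomic(sql):
    # `pip install trino`; imported lazily so the helper above
    # stays dependency-free.
    import trino
    conn = trino.dbapi.connect(
        host="localhost", port=8989, user="analytics",
        catalog="datomic", schema="my-db",  # placeholder values
    )
    cur = conn.cursor()
    cur.execute(sql)
    return rows_to_dicts(cur.fetchall(), cur.description)
```

From there, feeding the list of dicts into `pandas.DataFrame(...)` would give the "Pandas backend" feel mentioned above without wrapping the REST API.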