#data-science
2023-05-17
leifericf 11:05:27

Here comes a (potentially dumb and premature) thought that struck me today: Now that Datomic is free to use, perhaps it could be “bundled up and served” to data scientists/researchers in a user-friendly way. It could be backed by files on disk when working with data locally, or by blob storage in the cloud, and exposed via libraries like Tablecloth. Sort of like “a high-level tidy table storage/data version control system” with minimal setup. One problem we have when working with machine learning models, for example, is keeping track of changing datasets for training, testing, cross-validation, etc. alongside our code. Datomic offers "infinite time travel" and other goodies for free. Users could of course use the Datomic libraries directly, but perhaps a higher-level interface via tidy data frames would be smoother. 🤔

💚 6
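
To make the "time travel" idea above concrete, here is a minimal sketch in plain Python of an append-only fact log that can rebuild a dataset as of any earlier transaction. It is a toy model for illustration only and does not use or mirror Datomic's actual API; the names `Datom`, `FactLog`, `transact`, `as_of`, and `training_rows`, and the sample rows, are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Datom:
    """One immutable fact: (entity, attribute, value) plus when it was recorded."""
    entity: str
    attribute: str
    value: Any
    tx: int
    added: bool  # True = assertion, False = retraction


class FactLog:
    """Toy append-only log (hypothetical, not Datomic): nothing is overwritten."""

    def __init__(self) -> None:
        self._datoms: list[Datom] = []
        self._tx = 0

    def transact(self, assertions=(), retractions=()) -> int:
        """Record a batch of changes and return its transaction number."""
        self._tx += 1
        for e, a, v in retractions:
            self._datoms.append(Datom(e, a, v, self._tx, False))
        for e, a, v in assertions:
            self._datoms.append(Datom(e, a, v, self._tx, True))
        return self._tx

    def as_of(self, tx: int) -> dict[str, dict[str, Any]]:
        """Rebuild the table as it looked right after transaction `tx`."""
        rows: dict[str, dict[str, Any]] = {}
        for d in self._datoms:
            if d.tx > tx:
                break  # the log is ordered by transaction number
            if d.added:
                # Single-valued attributes: a new assertion replaces the old value.
                rows.setdefault(d.entity, {})[d.attribute] = d.value
            else:
                attrs = rows.get(d.entity, {})
                if attrs.get(d.attribute) == d.value:
                    del attrs[d.attribute]
                if not attrs:
                    rows.pop(d.entity, None)
        return rows


def training_rows(snapshot: dict[str, dict[str, Any]]) -> list[str]:
    """Ids of the rows assigned to the training split in a snapshot."""
    return sorted(e for e, attrs in snapshot.items() if attrs.get("split") == "train")


log = FactLog()
v1 = log.transact(assertions=[
    ("row-1", "label", "cat"), ("row-1", "split", "train"),
    ("row-2", "label", "dog"), ("row-2", "split", "train"),
])
v2 = log.transact(assertions=[("row-2", "split", "test")])  # move row-2 to the test split
v3 = log.transact(retractions=[("row-1", "label", "cat"), ("row-1", "split", "train")])

print(training_rows(log.as_of(v1)))
print(training_rows(log.as_of(v2)))
print(training_rows(log.as_of(v3)))
```

Because every change is appended rather than applied in place, the exact training set behind a given model run can be recovered from its transaction number alone, which is the property a tidy data frame interface would want to expose.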
octahedrion 14:05:23

I've experimented with similar ideas in the past and I think there's definitely huge potential for what you suggest, particularly for data science

👍 2
respatialized 15:05:47

I am highly interested in this; I find myself reinventing a reproducibility wheel with each data science project I undertake, and something that works "out of the box" would save me a lot of time and mental overhead. Datomic basically provides all the same capabilities as the bespoke "ML lifecycle" solutions but with all the flexibility of Datalog (especially compared with something like MLflow, which is a fairly rudimentary wrapper over a DB).

respatialized 15:05:52

The real "killer app", IMO, would be Python integration - Datomic as a kind of "Pandas backend" would do wonders for adoption and interest

💯 2
respatialized 15:05:01

There are a couple of abandoned prior-art Python/Datomic projects, but it looks like they wrap the REST API - I'd imagine that for data projects the Trino driver offers more leverage and would be a better basis for a Python/Datomic bridge