Is there a Datomic setup where I could dump the weekly dewey data (https://github.com/phronmophobic/dewey) into a public, read-only Datomic db that anyone could query? Ideally, it would be cheap to host. I was thinking of something like https://blog.phronemophobic.com/dewey-sql.html, but using Datalog + Datomic with the data hosted somewhere cheap like S3.
@smith.adriane did you ever find a stable solution to this? Looking at this issue w.r.t. creating a Datomic storage backend in Parquet to enable partial reads of a Datomic store.
@rjsheperd Not yet! Maybe some day.
Gotcha.
This is a nice reminder that we never got connected. I'd love to chat any time next week; happy to meet 1-2:30 PST (4-5:30 EST). I still want to chat about the upcoming feature to make sure it meets your needs.
@jaret is this something that could serve as a drop-in replacement for the SQLite/Postgres backend?
@smith.adriane Would love to have a call with you. I think we may be working on a feature that would fit your needs, and it might be releasing in the near term. (Sorry if that's lame to say, but I can't talk about it directly until we're design- and code-complete.) Let me know if you have some time, because running this feature by you would help us in development.
@jaret, that would be great. My schedule is pretty flexible next week if you want to suggest a PST-friendly time.
That would also support a cheap-to-host web interface like https://cloogle.phronemophobic.com/name-search.html?q=read-cond&tables=keywords#
Does that host a datomic db?
My question is more whether I can create a Datomic setup where all the data is public and anyone can start a REPL and run queries against the data.
I don't mind doing the devops work.
yes, it can host a datomic db (behind the app), so not exactly what you're asking
but it's the closest thing I can think of
Maybe put the data in datomic, and then just make a backup on S3. Then anyone could load that and do whatever they like.
Another reason for having users run queries on their own hardware is security. Presumably, it's not safe to run untrusted queries.
Public S3 bucket hosting is ~$0.03/GB/month, plus some transfer/API costs that this project can probably ignore.
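A quick back-of-the-envelope check of that figure, as a minimal Python sketch. It assumes the ~$0.03/GB/month rate quoted above and deliberately ignores request and transfer charges. The sizes are hypothetical stand-ins; the thread's own numbers are a ~1 GB compressed SQL dump and a guess that a Datomic backup would be a few times larger.

```python
# Storage-only S3 cost estimate. The default rate is the rough
# ~$0.03/GB/month figure quoted in this thread (an assumption, not taken
# from AWS's price list); request and transfer charges are ignored.
def monthly_storage_cost(size_gb: float, usd_per_gb_month: float = 0.03) -> float:
    return size_gb * usd_per_gb_month


# Hypothetical backup sizes, from "about the size of the dump" to "a few times larger".
for size_gb in (1.0, 3.0, 10.0):
    print(f"{size_gb:>5.1f} GB -> ${monthly_storage_cost(size_gb):.2f}/month")
```

Even at a hypothetical 10 GB, storage alone comes to about $0.30 a month; if many people download a public backup, transfer charges, which this sketch leaves out, would be the part worth checking.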
I don't think that would support a web interface that would allow anyone to run queries.
having them do it on their own datomic is safest. they can be a peer (to their own restored backup)
Yeah, if there were a way to set things up so folks could just point their Datomic client at my S3 bucket, I would be happy to try hosting that. Maybe I'll try the backup thing at some point, but offering a public web interface would be much more accessible. It makes it much easier to share queries with other people via a URL.
> if there were a way to set things up so folks could just point their datomic client at my s3 bucket

That's how datomic s3 backups work (anyone with IAM permission to read can restore the backup)
Maybe I'm understanding incorrectly, but I thought restoring a backup would require copying the whole db and reindexing. In principle, it seems like you could run a query against an index by downloading substantially less.
restoring a backup is just copying the db, indexes and all; it takes time, but it's only data transfer time
https://docs.datomic.com/indexes/index-model.html#efficient-accumulation
wherever you see "segments", that's what's in storage, and that's what's in the backup. Restoring a backup just copies segments
the end result is a fully usable db, no additional computation
> In principle, it seems like you could run a query against an index by downloading substantially less.

Yes, this is correct, but not because of reindexing; it's because you only needed a subset of segments to answer your query
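A toy Python model of that point, with loudly stated simplifications: this is not how Datomic actually lays out or looks up segments. Here an "index" is just a sorted list of (entity, attribute, value) tuples cut into fixed-size segments. Restoring a backup copies every segment, while a range query only has to fetch the segments whose key range overlaps the query. The names `build_segments` and `segments_for_range` are hypothetical.

```python
import bisect
from typing import List, Tuple

# Toy model only (not Datomic's real segment format or lookup): a datom is an
# (entity, attribute, value) tuple, and the index is those tuples in sorted order.
Datom = Tuple[int, str, str]


def build_segments(datoms: List[Datom], segment_size: int) -> List[List[Datom]]:
    """Split the sorted index into fixed-size segments (what storage holds)."""
    ordered = sorted(datoms)
    return [ordered[i:i + segment_size] for i in range(0, len(ordered), segment_size)]


def segments_for_range(segments: List[List[Datom]], lo: tuple, hi: tuple) -> List[int]:
    """Indices of the segments holding datoms d with lo <= d < hi."""
    firsts = [seg[0] for seg in segments]
    lasts = [seg[-1] for seg in segments]
    start = bisect.bisect_left(lasts, lo)  # first segment whose last datom is >= lo
    end = bisect.bisect_left(firsts, hi)   # segments starting at or after hi are skipped
    return list(range(start, end))


datoms = [(e, ":repo/name", f"repo-{e}") for e in range(10)]
datoms += [(e, ":repo/sha", f"sha-{e}") for e in range(10)]
segments = build_segments(datoms, 4)

# A restore copies all of `segments`; a lookup of entity 5 needs only these:
print(segments_for_range(segments, (5,), (6,)))
```

With 20 datoms in segments of 4, a restore moves all 5 segments, while the lookup for entity 5 touches a single one. Real Datomic indexes are trees of segments with caching on top, so treat this only as an illustration of why partial reads can beat a full restore.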
I already have an option to download a sqlite db. It's about 1 GB compressed. My guess is that a Datomic backup would be a few times larger? Downloading a few GB isn't so bad. I may try that approach in the future. I'm still trying to find a way to let folks run arbitrary queries against the data from a web interface and/or from the REPL with minimal setup on their end.
the sqlite db is a datomic db?
i.e., is it the contents of Datomic storage, ready to go?
the sqlite db is just a sqlite db with the data.
I don't currently offer a format that is ready to be directly ingested by datomic.
I would expect most equivalent sqlite dbs to be smaller, but it's not a given; depends on schema, history, indexes, etc
It's not actually a sqlite db; it's the output of sqlite's .dump command. Something like:
```
~ adrian$ zless '/Users/adrian/Downloads/dewey.sqlite3.sql.gz' | head
PRAGMA foreign_keys=OFF;
BEGIN TRANSACTION;
CREATE TABLE IF NOT EXISTS "basis"
("id" INT NOT NULL, "repo" VARCHAR(256) NOT NULL, "sha" VARCHAR(256) NOT NULL, "basis" VARCHAR(256) NOT NULL);
INSERT INTO basis VALUES(0,'tonsky/FiraCode','9caf6ebcfd34e3ada08534bb1e6b9b71391653fd','/deps.edn');
INSERT INTO basis VALUES(1,'tonsky/datascript','100ab864f55e056df5837e77d44dfd0f8a447983','/deps.edn');
INSERT INTO basis VALUES(2,'tonsky/datascript','100ab864f55e056df5837e77d44dfd0f8a447983','/project.clj');
INSERT INTO basis VALUES(3,'tonsky/uberdeps','82af51ef14e6d6f56b440950fd068231889e825a','/deps.edn');
INSERT INTO basis VALUES(4,'tonsky/uberdeps','82af51ef14e6d6f56b440950fd068231889e825a','/project.clj');
INSERT INTO basis VALUES(5,'tonsky/uberdeps','82af51ef14e6d6f56b440950fd068231889e825a','/test_projects/just/deps.edn');
```
I assume that sort of thing compresses well, but my estimates are just guesses. Once loaded (which does take a while), the sqlite db file is ~6.5 GB.
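For anyone who wants to try the SQL dump locally, here is a minimal Python sketch (standard library only) that replays a gzipped `.dump` script into a SQLite file without reading the whole multi-GB script into memory. It assumes the dump is UTF-8 text in which no line holds two statements (statements may span several lines, as the CREATE TABLE in the excerpt does); the function name `load_sql_dump` and the file paths are hypothetical.

```python
import gzip
import sqlite3


def load_sql_dump(dump_path: str, db_path: str) -> sqlite3.Connection:
    """Replay a gzipped sqlite `.dump` script into the database at db_path.

    Streams the dump line by line and executes each statement as soon as
    sqlite3.complete_statement reports that the buffered text is complete
    (it accounts for semicolons inside string literals).
    """
    # isolation_level=None: Python won't open implicit transactions, so the
    # dump's own BEGIN TRANSACTION / COMMIT wrap all the INSERTs in one transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    buffer = []
    with gzip.open(dump_path, "rt", encoding="utf-8") as f:
        for line in f:
            buffer.append(line)
            statement = "".join(buffer)
            if sqlite3.complete_statement(statement):
                conn.execute(statement)  # exactly one statement per call
                buffer.clear()
    return conn


if __name__ == "__main__":
    # Hypothetical local paths.
    conn = load_sql_dump("dewey.sqlite3.sql.gz", "dewey.sqlite3")
    print(conn.execute("SELECT COUNT(*) FROM basis").fetchone())
```

Piping the decompressed dump into the `sqlite3` command-line shell (e.g. `gunzip -c dewey.sqlite3.sql.gz | sqlite3 dewey.sqlite3`) does the same job; the Python version is mainly useful when the load is part of a larger script.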
Once the "read only" datomic lands, you could have an S3 storage proxy https://clojurians.slack.com/archives/C03RZMDSH/p1771204710082289
Regarding the hosting: if you want more flexibility (like a VPS), https://www.hetzner.com/cloud/ is cheap at $5 a month, which should cover the traffic and reasonable use?
@jaret I'm available at that time on Thursday, April 9th
DM me your email and I'll send you an invite
datomic now supports read-only connections
Very exciting
Oh that's fantastic! Was eagerly awaiting this