#datomic
2022-04-22
plexus07:04:22

It seems datomic peer still ships with presto 348 judging by the changelog. Are there concrete plans to upgrade and/or migrate to trino?

plexus07:04:38

Context: we have built a BI solution for a customer based on datomic analytics and metabase. Metabase used to implement a custom presto connector (using http directly). In the latest release they have replaced that with a jdbc based connector, but in the process also migrated to trino, so we are currently held back from upgrading metabase.

emccue12:04:29

What is trino and what is presto 348?

favila13:04:00

This is about the datomic analytics product. PrestoSQL is a SQL query engine; datomic has a connector for it so you can SQL-query a datomic db. PrestoSQL renamed itself to Trino because of a competing fork of the engine called prestodb (https://trino.io/blog/2020/12/27/announcing-trino.html). 348 is a PrestoSQL version number (released Dec 14 2020) from before its rename to Trino.

👍 1
Kris C13:04:57

I am trying to find reverse keys for an entity ("foreign" ref attributes that point to this entity). I have found the following "trick" via google:

=> (.touch ent)
=> (keys (.cache ent))
but it doesn't seem to work. Is there any other way to achieve that?

favila13:04:07

What do you mean “id doesn’t seem to work”?

Kris C13:04:49

I do not get the reverse keys. Only the attributes of the entity

Kris C13:04:58

Ah, it was a typo "it doesn't seem to work"...

favila13:04:38

Get the entity db out using d/entity-db then query [_ ?attr ?e] or (d/datoms db :vaet e)
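(A minimal sketch of the approach favila describes, assuming the Peer API and an entity value ent already in hand; the attribute/var names are illustrative:)

(require '[datomic.api :as d])

(let [db (d/entity-db ent)          ; the db value the entity was obtained from
      e  (:db/id ent)]
  ;; reverse refs via the VAET index: attrs of datoms whose value is this entity
  (->> (d/datoms db :vaet e)
       (map #(d/ident db (:a %)))
       distinct))

;; or the same idea with datalog: which entities point at ent, and through which attr?
(d/q '[:find ?attr ?other
       :in $ ?e
       :where [?other ?a ?e]
              [?a :db/ident ?attr]]
     (d/entity-db ent) (:db/id ent))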

Kris C13:04:46

ack, any idea why the "trick" is not working?

favila13:04:43

I think touch used to realize EAVT and VAET, but now only does EAVT. This is a hack anyway: entity is designed for when you know your attributes in code already. d/touch is for dev-time printing of entity maps and such

Kris C13:04:18

ah ok, thank you so much, @U09R86PA4!

devn15:04:40

@jarrodctaylor saw https://max-datom.com/ on the front page of HN. Nice job!

Jarrod Taylor (Clojure team)15:04:36

Thus begins our march to Datomic world domination!!

partyparrot 6
🚀 7
catjam 4
Elliot Block17:04:06

Apologies if this is a little nutty/premature, but if we have #clojuredart apps running natively on the desktop, is it a crazy idea to want to try to put a Datomic client into them so that apps can talk directly to DB / have direct access to the information model? I assume this is a ton of work, but curious if it makes sense as an architectural model in the first place. Curious if this could cut a lot of the intermediate infrastructure out of a frontend app. Any thoughts greatly appreciated!

favila17:04:09

This is theoretically possible already with clojurescript, but no one has wanted it enough to write a clojurescript client api library, so I’m not sure dart changes that.

favila18:04:55

I would say this makes sense as a debugging tool or replacement for the (not very good) datomic console, but architecturally you very quickly need more layers to enforce policy, and then the direct connection stops making sense because you need to reduce its power in some way, or allow for interception and transformation.

favila18:04:57

So I don't see "datomic client in a dart desktop app" as a game changer; you will need an intermediate with more control before very long, and then you're back to an intermediate framework or library (of which there are already many good ones, e.g. fulcro or re-frame)

Elliot Block18:04:09

yeah fascinating thank you! I was kind of hoping perhaps something like Datomic database functions had evolved to be, for example a higher-level intermediary/domain model for transactions against the DB but it sounds like that’s not really it/there yet. very interesting! will start looking into the intermediates, thanks!

favila18:04:46

the datomic cloud answer to that need is ions

👍 2
favila18:04:39

By analogy with the SQL world, many sql dbs do have features which look like they might be enough to make them application platforms, ie authentication, stored procedures, and procedure+table/row/column-level access control

favila18:04:21

but I rarely see someone say, “lets just put a sql client in our desktop app”

favila18:04:21

so even if datomic did grow similar features, I’m not sure it would be a good or popular choice to use the datomic client api as the client application’s primary api to interact with data

Elliot Block18:04:05

right — there's usually some kind of application-level API in between the client and the DB. It's interesting to me because sometimes the application API ends up being REST/RPC/GraphQL that looks basically almost just like the DB, but not the DB. Like it ends up being some kind of higher domain model, with auth, not exactly the low-level data model, but related to it…

favila18:04:33

yeah. but those "not exactly like" bits are what kill you

Elliot Block18:04:10

haha indeed =/

favila18:04:59

even as a backend interface, using datomic directly is becoming a problem for us in some cases. sometimes we need to preserve an attribute with its meaning but not its implementation

Elliot Block18:04:00

okay, so if one needs that anyway, might as well put that on a server and then have a client talk to that server

favila18:04:05

datomic has a really great attribute-centric data model, but it is still, at the end of the day, an implementation specification, not a data model

favila18:04:10

I’m looking at pathom3 very seriously as something that has the attribute model of datomic but more flexibility around evaluation and implementation. And yes, the vast majority of attributes just pass through to datomic

Elliot Block19:04:06

yeah very cool — structurally pathom looks kinda like a federated GraphQL gateway (e.g. https://apollographql.com) except with a logic-programming/datalog/prolog-y query engine instead of a GraphQL-nested-map-join engine. Both those, plus the upcoming HTTP QUERY method, make me think the API intermediary clients want is an "application-level set of API functions, but with the ability to express queries more sophisticated than 'GET resource'"

JohnJ15:04:43

What does it mean to preserve an attribute with its meaning but not its implementation? to preserve the attribute meaning outside datomic?

favila15:04:03

To avoid having to rewrite all the code that uses it

favila15:04:26

Concrete examples: we needed for operational reasons to drop fulltext from some attributes. Datomic doesn't let you do that: you need to make a new attr. At our data size this involves a migration (the db is using two attrs at once for the same data for a time)

favila15:04:05

It would have been really nice to hide all this from the code and let it keep using the same attr. It would have saved weeks of dev time

favila15:04:40

Another example: datomic can (but really should not) store strings larger than a kB or two. The recommendation is to store a key to some other system. For latency, we end up with a hybrid encoding where the value lives in datomic if it's short enough. Now the same data is across two concrete attrs. Again, a migration was involved. Even worse, this introduces n+1 problems with the other store unless you resort to ugly contortions.
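(A minimal sketch of that kind of hybrid read path; the attribute names :doc/body and :doc/body-ref and the fetch-blob function are hypothetical, not what Shortcut actually uses:)

(require '[datomic.api :as d])

;; short strings live in :doc/body; long ones live in an external blob store,
;; keyed by :doc/body-ref. fetch-blob stands in for that store's client.
(defn doc-body [db fetch-blob eid]
  (let [{:keys [doc/body doc/body-ref]} (d/pull db [:doc/body :doc/body-ref] eid)]
    (or body
        (some-> body-ref fetch-blob))))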

favila15:04:29

Another example: we store stat aggregates (e.g. counts of x for y) and want them available as attrs, without being forced to have them be concrete attrs in datomic all the time

favila15:04:53

All of these come down to: d/entity and d/pull have a fixed implementation that maps an attribute to a datomic attr, and if we want to use attrs as stable interfaces, we need some implementation flexibility that these don’t provide

JohnJ16:04:00

got it thx, I guess this comes down to how much logic you want to keep in the db vs the app

JohnJ16:04:37

like being at the mercy of the DB vs writing a bunch of application code

favila16:04:37

I don't think that's quite right. Being at the mercy of the db can mean (re)writing a bunch more application code than you started with

Elliot Block16:04:34

At the risk of taking the thread in a circle, does that mean it's possibly a reasonable thing to want to put an abstract datalog interface in the client, whose persistent storage backend is an implementation detail? But there is a data-layer interface with auth directly in the client? (where the abstract interface looks datomic/datalog-like, but may or may not be directly implemented against datomic?)

favila16:04:39

Sure, that’s a possibility

JohnJ16:04:56

but it's true for most database systems out there, no?

Elliot Block16:04:14

(okay awesome, that line of thought is coming together, thanks! Totally makes sense that the DB implementation itself is often useful to put behind an abstraction, e.g. for auth, policy, facading instead of migrating, abstracting over sharding, etc.)

Elliot Block16:04:06

(Reminds me of this old pattern from long ago: https://en.wikipedia.org/wiki/Data_access_object)

favila16:04:09

@U01KZDMJ411 yeah that's my point. Datomic is not magic. Its implementation is fixed within certain boundaries, like any db.

favila16:04:59

attributes, pull exprs and datalog are great for data model expression, but d/pull, d/query, and d/entity are not data models but implementations of them that map to a datomic storage engine in a fixed way

👍 1
favila16:04:19

It's such a lower-friction abstraction, with such great sympathy with how Clojure models data, that it's easy to make the mistake of thinking the datomic attrs and the data model are exactly the same

👍 1
favila16:04:37

No one would make that mistake with sql!

👍 1
favila16:04:37

Also you can go a very long way before you hit a painful bit where you realize you need a little indirection

favila16:04:31

But you’ve already written a bunch of “boundaryless” code by that time and backfilling the abstraction layer you need becomes hard

JohnJ16:04:13

yeah, the attractiveness of datomic with clojure is how you can keep using the same data model in both, but you make a clear point about how the implementation can limit the benefits of EAV flexibility

Elliot Block16:04:52

This is probably horrifying, but theoretically if the Datomic client interface supported either pre/post hooks / CLOS metaobject-style extension / interceptor middleware / multimethod-or-protocol dispatch, then you could keep the calling code the same but add general-or-casewise behavior modifications. Otherwise it seems like all the code needs to be written to an abstract interface/indirection just in case future behavior extension is needed, and otherwise it's just an empty pass-through layer.

favila16:04:27

Again I have high hopes for pathom in this respect

JohnJ16:04:38

don't know much about pathom, but it would be something like having SQL Views?

JohnJ16:04:56

on top of EAV of course

favila16:04:34

Sort of. It's only an attribute/pull-expr model (no datalog). You define "resolvers" which declare what they need as input attrs on an entity and what attrs they provide for that entity. You then query it by seeding with what data you have and a pull expr, and you get a map of the same shape filled out with what you asked for
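(A minimal sketch of that resolver model using pathom3; the attribute names are made up for illustration:)

(require '[com.wsscode.pathom3.connect.operation :as pco]
         '[com.wsscode.pathom3.connect.indexes :as pci]
         '[com.wsscode.pathom3.interface.eql :as p.eql])

;; declares: given :user/first-name and :user/last-name on an entity,
;; this resolver can provide :user/full-name
(pco/defresolver full-name [{:user/keys [first-name last-name]}]
  {:user/full-name (str first-name " " last-name)})

(def env (pci/register [full-name]))

;; seed with the data you have, ask with a pull-expr-shaped query,
;; and get back a map of the same shape
(p.eql/process env
               {:user/first-name "Rich" :user/last-name "Hickey"}
               [:user/full-name])
;; => {:user/full-name "Rich Hickey"}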

JohnJ16:04:13

FWIW, "boundaryless" is what keeps me using datomic for personal stuff, like "look at all the stuff I don't have to write", but I can see how that could become a problem at scale

👍 1
favila16:04:46

It also has a “foreign interface” where you get an entire query subtree extracted for you (eg, all the datomic attrs that map 1-1) and you just need to return a map in the right shape. This makes it really easy to have the “fall through” cases, and is also a handy way to avoid n+1 problems across process boundaries

👌 1
JoeA20:04:39

Interesting, so the benefits of not having impedance mismatch are somewhat erased by the implementation?

favila20:04:57

I’m not sure I follow?

JoeA20:04:13

Maybe I'm misunderstanding, but I'm assuming there's no impedance mismatch between clojure and datomic in the data model, which makes it sound like everything is going to be smooth sailing, but the database restrictions don't make it so

JoeA20:04:48

when you say no one would make this mistake in SQL, is it because you are forced there to write some abstraction layer? to isolate the application layer from the DB.

favila20:04:17

So in many cases you can represent your data model in datomic without much translation. The result of a d/pull is exactly what your domain models would have looked like.

favila20:04:33

but that is basically never true in SQL

favila20:04:26

so if one day your data model is not exactly like datomic, you probably didn’t write a layer of indirection in between your domain objects and your d/pulls already, so now you have to retrofit it in.

favila20:04:45

but in SQL world, the natural mode of expression in SQL is so different that you almost certainly have that layer built already

JoeA21:04:29

understood, thx. do you still prefer datomic's data model despite the implementation's lack of features / restrictions?

favila21:04:45

prefer it to what? sql?

JoeA21:04:17

yes, to traditional RDBMS like postgres

favila21:04:50

oh god, a million times yes. not sure how you could have gotten another impression 🙂

favila21:04:41

the only things I sometimes want from RDBMSes are specific operational characteristics

favila21:04:47

I never ever want its data model

JoeA21:04:59

yeah, operational characteristics

JoeA21:04:39

are important though, like the string limit and rigidity of attrs in datomic look jarring

favila21:04:38

yeah something like TOAST for large values is a curious omission. But I don’t follow on the “rigidity of attrs”. In every way attrs seem more flexible than columns and tables

JoeA21:04:00

I'm just checking out datomic, haven't used it in anger

JoeA21:04:41

by rigidity of attrs I mean the implementation not the data model, like you can't disable fulltext

favila21:04:26

yeah, fulltext is also a bit of a quiet curse.

JoeA21:04:50

which is what you alluded to before

favila21:04:49

yep. and if datomic could do these things those sources of indirection-need would have been gone. but there’s still stuff like maintaining aggregates, maintaining computed/derived values (materialized or not), etc, that I’m not sure I can reasonably ask datomic to take care of.

favila21:04:05

There’s also an inherent cost to keeping all transacted data--some stuff really is just ephemeral and high-volume and storing it in datomic forever becomes a chore, and it’s a shame you need to give up the attribute model to do it.

favila21:04:48

these all speak to an occasional need for some indirection without faulting datomic for being the kind of db it is and not another one.

favila21:04:45

and there are dbs that support an attribute model but are quite different from datomic: datalevin, xtdb, and many flavors of datascript storage backend

JoeA21:04:41

yeah, had a little look at them. xtdb looks more like a document store, a different data model than datomic, and the other ones don't look too serious/ready for production use, but I can't tell

JoeA21:04:28

datomic is a tough choice, there's the operational overhead also (more processes)

JoeA21:04:05

I guess for anything serious, like public webapps, a moment will come when you just have to run another DB too, besides datomic

JoeA21:04:10

so maybe just having to deal with tables doesn't start to look that bad

favila21:04:44

Bah, I disagree. tables never again

favila21:04:37

FWIW at Shortcut we use one datomic database as our primary store with dynamo as the backing store, and additional dynamo tables and s3 for stuff that isn’t appropriate for datomic (high write volume, ephemeral data, large blobs)

JoeA21:04:56

sounds good, less devops headache but maybe too pricey?

favila21:04:00

and I love the peer model--scaling query load with peers is way easier than administering a cluster

JoeA21:04:19

so no clients?

favila21:04:09

we have a peer-server around, but we don’t use it for sustained load. Again, it’s an indirection problem: d/entity and d/pull can’t transparently be replaced for peer vs client

favila21:04:18

with something like pathom in the middle, it could

favila21:04:16

our biggest headache honestly is dynamodb + memcached. We have serious envy for the cloud’s 3-tier storage and wish on-prem had it too

JoeA21:04:44

gotcha. Curious if you can share: a nubank article scared me saying they run more than 2400 transactors. Shortcut looks really cool, how many transactors are running?

JoeA21:04:26

oh impressive, so read heavy workload

favila21:04:51

yes; that’s definitely what datomic is for

JoeA21:04:12

What's the issue with dynamodb? setting the correct read/writes?

favila21:04:46

it has really high latency variance, and is expensive

JoeA21:04:15

I would have thought that something like shortcut would be write heavy

JoeA21:04:21

but no idea really

JoeA21:04:10

so dynamo's claim everything is done in single-digit milliseconds is not true?

favila21:04:07

hah, no. to be fair, datomic is using dynamo as a blob store. for typical item sizes people use dynamo for (a few kb at most), dynamo may indeed have better variance.

favila21:04:15

but there are plenty of products that are dynamo/cassandra-like and exist pretty much only to guarantee lower latency and variance, e.g. https://www.scylladb.com/

JoeA21:04:56

Ok, have you tested SQL storage on a fast disk as an option for shortcut?

favila21:04:28

not for shortcut, but I’ve used mysql as the storage on moderately sized datomic dbs in the past (3+ years ago). It was fine

JoeA21:04:36

interesting, can datomic be made to work with those? scylladb for example

favila21:04:17

I don’t think so because you need to use their client

favila21:04:23

datomic uses the aws client directly

favila21:04:35

maybe it’s wire-compatible and there’s some way to make that work

favila21:04:06

either with the dynamo or the cassandra backend

JoeA21:04:20

cool (about mysql), I have set up datomic with postgres for now. since it's a single table, I'm wondering if datomic can really max out postgres, I guess it would require a lot of peers

JoeA21:04:35

anyway, if shortcut can run with one transactor which is impressive, then I think I'm going to be ok 😉

JoeA21:04:23

so any SQL storage should work?

JoeA21:04:19

the docs indicate it should, curious how they abstract that, it uses some lowest common denominator standard SQL?

favila21:04:42

well, it has sql to build the tables

favila21:04:51

again, it’s used as a key-value blob store

favila21:04:23

any sql that can store moderately sized binary blobs efficiently will do (a few kb to <2mb)
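(A sketch of what pointing a peer at SQL storage looks like, using the datomic:sql URI shape from the docs; the db name, host, and credentials are placeholders, and the JDBC driver for your storage needs to be on the peer's classpath:)

(require '[datomic.api :as d])

;; datomic:sql://<db-name>?<jdbc-url> -- the jdbc url carries its own params
(def uri
  (str "datomic:sql://my-db"
       "?jdbc:postgresql://localhost:5432/datomic"
       "?user=datomic&password=datomic"))

(d/create-database uri)
(def conn (d/connect uri))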

JoeA21:04:24

yeah, but the queries

favila21:04:33

what queries?

favila21:04:48

select, insert, update, delete

favila21:04:50

that’s it

JoeA21:04:14

yeah those, pretty basic, I guess the SQL dialect of those doesn't change between DBs for the very basics

favila21:04:59

yeah they are very simple sql statements. I’ve used it with sqlite without issue

JoeA21:04:42

oh neat, sqlite, one machine, fewer processes. is the java driver solid? It feels like the java world favors java-based stuff like h2 more than sqlite

JoeA21:04:41

the h2 included in datomic is very old

favila21:04:14

I’ve used the xerial driver, it’s fine https://github.com/xerial/sqlite-jdbc

favila21:04:36

it won't be network-addressable like h2 though

favila21:04:07

so the peers need to be on the same instance. That’s fine for bulk workloads but not much else

JoeA21:04:36

gotcha, the transactor does run embedded correct?

favila21:04:14

no? the transactor and peer are always separate processes. You just won’t have an extra storage process

JoeA21:04:51

I mean, the transactor uses h2 in embedded mode

JoeA21:04:46

In embedded mode I/O operations can be performed by application's threads that execute a SQL command. The application may not interrupt these threads, it can lead to database corruption, because JVM closes I/O handle during thread interruption.

JoeA21:04:20

do you know if datomic handles that?

favila21:04:40

probably? h2 is only used by dev storage, which is special because the transactor itself exposes an additional port as the storage port (I believe using sql). And you won’t use dev in production anyway.

favila21:04:55

peers do not access the h2 file directly

favila21:04:58

it may use server mode honestly

favila21:04:25

because it also exposes the console on yet another port

JoeA21:04:52

if it uses server mode only and not mixed mode, an h2 process should be visible, no?

JoeA21:04:37

but I only see the transactor and peer processes

favila21:04:40

it’s probably mixed mode then

JoeA21:04:54

Was thinking that for light load it might be ok but the h2 version is very old, could try upgrading or use sqlite

JoeA22:04:48

anyway, thx for the chat and insights

lwhorton21:07:20

this conversation was lovely. thank you both, i learned a lot.