#datomic
2022-05-11
lispers-anonymous12:05:25

Does the datomic cloud client api cache connections? We have a poorly implemented caching mechanism for our datomic connections and I'm wondering if it's even necessary for us to hold onto these objects.

Joe Lane12:05:25

What problem do you think caching the connections is solving?

lispers-anonymous12:05:35

We have a multi tenant architecture. In production, a single server can connect to one of 350+ datomic databases when fulfilling an http request. I assume the people who implemented the connection cache wanted to make sure that getting a connection was as fast as possible

lispers-anonymous12:05:23

I'm making guesses though. The people who implemented the connection cache (a la memoize) are no longer with the company, commit history has nothing, chat history was lost in an acquisition.

Joe Lane12:05:41

Once a connection is acquired, it can be used indefinitely without perf penalty. Connections are also thread safe.

lispers-anonymous13:05:34

So they are safe to cache, I figured that much since we've been doing it for years without much issue. I'm guessing the datomic client api doesn't cache them then, and if we want them cached we should continue managing that ourselves.

Joe Lane13:05:50

That's correct, the client does not cache them.

lispers-anonymous13:05:56

Right on, thank you Joe. I'll be changing the code to use something like core.cache instead of memoize to manage these things.

Joe Lane13:05:32

An atom may be sufficient as well. FWIW, although it may be quick to acquire a new connection, that does not mean the compute-group the connection object routes the request to has the DB spun up, and there may be a small delay while that DB is loaded. If you wish to avoid this, you can create query-groups specific to a DB or group of DBs to ensure they always remain loaded (e.g. by querying that DB occasionally). The tradeoff here is that all DBs in that compute-node compete for resources.
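A minimal sketch of the atom-based cache suggested above, assuming connections are keyed by database name. The names `conn-cache`, `cached-connect`, and `evict!` are hypothetical, and `connect-fn` stands in for `datomic.client.api/connect` so the sketch runs without a Datomic client on the classpath.

```clojure
;; Hypothetical cache: db-name -> connection. Not part of the Datomic API.
(defonce conn-cache (atom {}))

(defn cached-connect
  "Returns the cached connection for db-name, connecting on a miss.
  connect-fn is assumed to have the shape of datomic.client.api/connect:
  (connect-fn client {:db-name db-name})."
  [connect-fn client db-name]
  (or (get @conn-cache db-name)
      (let [conn (connect-fn client {:db-name db-name})]
        ;; If another thread won the race, keep its connection and drop ours.
        (-> (swap! conn-cache
                   (fn [m] (if (contains? m db-name) m (assoc m db-name conn))))
            (get db-name)))))

(defn evict!
  "Drops a cached connection so the next cached-connect call reconnects."
  [db-name]
  (swap! conn-cache dissoc db-name)
  nil)
```

With the real client this would be called as `(cached-connect d/connect client "tenant-42")`, where `d` aliases `datomic.client.api` and `"tenant-42"` is a made-up database name. Two threads that miss at the same time may both connect; only one connection is kept.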

lispers-anonymous13:05:28

Yeah, we think we have observed this behavior (delay from the query group loading the DB). Right now all our databases use the same query group. We're also working on changes that will allow us to spread our database connections across a number of query groups instead of funneling them into one. There is a good bit of work we have to do to make that happen but it is underway.

Jake Shelby00:06:40

@UDVJE9RE3 the last point under connections here indicates that connections are indeed cached and creating them is inexpensive https://docs.datomic.com/cloud/client/client-api.html#connection

lispers-anonymous18:06:58

Yeah, that documentation contradicts the behavior I've seen though, and what was said in this thread. The connection objects returned are for sure not cached. The instances are not identical across calls to d/connect.

lispers-anonymous18:06:36

It also doesn't feel inexpensive. There are network calls being made. I see a significant delay when I call d/connect. Sometimes as much as 1 full second.

neilprosser15:05:40

I'm going to ask this as a separate question but it relates to the question above about connection caching. Hopefully this all makes sense. We cache the connections in our system but I've found today that when I used d/sync on my cached connection using a dodgy value for t (in my case I mistakenly put a tx in there, which is a bigger int) datomic.client.impl.shared/advance-t* has changed the t value of the connection state but the stale connection checking which is done in datomic.client.impl.shared/recent-db will never correct that state because the dodgy value is always greater than the status received from the remote call. I get Database does not yet have t={...} and it will keep giving me that error until I flush that connection from the cache. So, my question is, is there a way that connection can be brought back to life or is this just something we need to be very careful about when caching connections?

Joe Lane16:05:10

@neilprosser Passing a future t to d/sync is UB, be sure to avoid doing that.

ghadi16:05:12

@neilprosser be sure not to pass t 's from untrusted clients, like browsers

☝️ 1
ghadi16:05:40

One simple technique I've used when interacting with untrusted clients is to make an authenticated 'cookie': the server knows some secret, and cookie = HMAC(some_secret, t)
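A rough sketch of that cookie idea using the JDK's `javax.crypto.Mac`, assuming HMAC-SHA256 and a hex-encoded signature. The function names `sign-t` and `valid-t?` are hypothetical, and how the secret is stored and rotated is left out.

```clojure
(import '(javax.crypto Mac)
        '(javax.crypto.spec SecretKeySpec)
        '(java.security MessageDigest))

(defn sign-t
  "Returns a hex HMAC-SHA256 of t under secret. Hand this to the client with t."
  [^String secret t]
  (let [mac (doto (Mac/getInstance "HmacSHA256")
              (.init (SecretKeySpec. (.getBytes secret "UTF-8") "HmacSHA256")))
        raw (.doFinal mac (.getBytes (str t) "UTF-8"))]
    (apply str (map #(format "%02x" %) raw))))

(defn valid-t?
  "True when cookie is the signature this server would issue for t.
  MessageDigest/isEqual is used to avoid an early-exit string comparison."
  [^String secret t cookie]
  (MessageDigest/isEqual (.getBytes ^String (sign-t secret t) "UTF-8")
                         (.getBytes (str cookie) "UTF-8")))
```

The server would only pass a client-supplied `t` on to Datomic when `(valid-t? secret t cookie)` holds, so a client cannot invent a `t` the server never handed out.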

Joe Lane16:05:23

One might even call it a zookie 🙂

neilprosser16:05:19

Thanks @joe.lane and @ghadi. I figured it might just have to be a 'be more careful next time and don't do that'. I hadn't realised that it was a big problem until I broke it today.

favila16:05:58

Is there an officially-supported way to get a T from a TX in the client api?

favila16:05:35

I heard a scary thing that cloud’s entity id structure isn’t guaranteed, so masking out the partition bits like on-prem d/tx->t doesn’t work. So what’s the alternative?

lispers-anonymous14:05:49

(bit-and eid 0x3ffffffffff)

lispers-anonymous14:05:23

That is what we use in cloud. Cognitect told us about it
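As a sketch, the mask above can be wrapped in a small function. This assumes the low 42 bits of a transaction entity id hold the t, which is the on-prem layout; as noted in this thread, that layout is not a documented guarantee for Cloud, so treat it as a debugging aid. The name `tx->t` mirrors the on-prem function but is defined locally here.

```clojure
(defn tx->t
  "Keeps the low 42 bits of a tx entity id. Unofficial for Cloud:
  relies on an entity id layout that is an assumption, not a documented contract."
  [tx]
  (bit-and tx 0x3ffffffffff))

;; Example with an id shaped like an on-prem tx id (partition 3 shifted left 42 bits):
(tx->t (+ 13194139533312 1000)) ; => 1000
```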

favila14:05:53

so cloud’s entity id structure is still the same?

favila14:05:59

or at least the-same enough?

lispers-anonymous14:05:28

It has not failed us so far, but we do not use this in production code. Only when debugging or auditing our databases

nottmey17:05:26

@ghadi Oh, why is that? I wanted to build a consistency mechanism in my application by returning t to my client and then using it for the following queries which need to be in a consistent state with the data the client has sent before. (I assumed t is exactly meant for that purpose 😅)

favila17:05:01

The “why” is that this is a DoS vector.

favila17:05:35

they could supply a T far in the future, or discover and exploit some bug in datomic with invalid values

favila17:05:08

We do this too (on-prem api though), but 1) we parse the T. 2) we limit its range 3) we check that it isn’t too far “ahead” of the current T 4) we put a timeout on the deref to wait for the sync to complete.
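The four checks listed above might look roughly like this. It is a sketch, not favila's actual code: `parse-t`, `checked-t`, and `await-sync` are hypothetical names, `max-ahead` is an assumed tuning knob, and `await-sync` assumes the sync result is a future-like value that supports `deref` with a timeout, as described for the on-prem API.

```clojure
(defn parse-t
  "1) Parse: returns a long, or nil when the input is not an integer."
  [raw]
  (try (Long/parseLong (str raw))
       (catch NumberFormatException _ nil)))

(defn checked-t
  "2) and 3) Range checks: t must be non-negative and at most
  max-ahead beyond the t the server currently knows about."
  [raw current-t max-ahead]
  (when-let [t (parse-t raw)]
    (when (<= 0 t (+ current-t max-ahead))
      t)))

(defn await-sync
  "4) Timeout: waits at most timeout-ms for a derefable sync result,
  returning :timed-out instead of blocking forever."
  [sync-result timeout-ms]
  (deref sync-result timeout-ms :timed-out))
```

A request handler would reject the request when `checked-t` returns nil and treat `:timed-out` as a retryable failure.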

nottmey18:05:10

I see, thank you for pointing that out. I didn’t even think about higher values 🤦‍♂️ Ok, I imagine that’s manageable. -> Wanted to go for a low timeout retry mechanism (with exponential backoff) anyway.
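A retry loop with exponential backoff along those lines could be sketched as follows. `retry-with-backoff` and its option keys are hypothetical, and `f` stands in for whatever call performs the sync and query with a low timeout.

```clojure
(defn retry-with-backoff
  "Calls f until it returns without throwing, sleeping base-ms, 2*base-ms,
  4*base-ms, ... between attempts. Rethrows the last error after max-attempts."
  [f {:keys [max-attempts base-ms]}]
  (loop [attempt 1]
    (let [result (try {:ok (f)}
                      (catch Exception e {:error e}))]
      (cond
        (contains? result :ok) (:ok result)
        (>= attempt max-attempts) (throw (:error result))
        :else (do (Thread/sleep (long (* base-ms (Math/pow 2 (dec attempt)))))
                  (recur (inc attempt)))))))
```

For example, `(retry-with-backoff #(query-at t) {:max-attempts 4 :base-ms 50})` would wait 50, 100, and 200 ms between attempts, where `query-at` is a made-up application function.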

zakkor18:05:29

Hi guys! I am trying to create a datomic schema for a data structure that looks something like this: (some fields omitted, but not important)

{:url url
 :negotiable? ""
 :agency? false
 :raw {:title "asdasd"
       :description "asdasd"}}
I am having trouble figuring out how I am supposed to represent the nested map. Assuming my top-level entity is called a "posting", should I have something like...
:posting/url
:posting/negotiable
...
:posting/raw.title
:posting/raw.description
(I don't even know if the dot syntax is a thing, I'm just guessing) Or perhaps something like making :posting/raw a ref type with isComponent? but then I'm not sure how I'm supposed to define its fields

jcf18:05:55

If you know the keys and value types you'll need in the nested map, a component entity would make sense. You'd only have to install each attribute and could then transact the nested maps.

zakkor18:05:56

@U06FTAZV3 I think I've got it, something like this?

[(create-attr :posting/url :string)
 (create-attr :posting/raw :ref {:db/isComponent true})
 (create-attr :raw/description :string)
 (create-attr :raw/phone :string)]
(excuse my constructor function)

zakkor18:05:26

(d/transact conn [{:posting/url "http" :posting/raw {:raw/description "desc" :raw/phone "0723"}}])
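The `create-attr` constructor is not shown in the thread. One guess at what such a helper might expand to, assuming cardinality-one attributes and a type keyword that maps directly onto `:db.type/...`:

```clojure
;; Hypothetical reconstruction of the helper; the real one was not posted.
(defn create-attr
  "Builds a Datomic attribute map. type is a short keyword such as
  :string or :ref; opts are merged in last so they can override defaults."
  ([ident type] (create-attr ident type {}))
  ([ident type opts]
   (merge {:db/ident       ident
           :db/valueType   (keyword "db.type" (name type))
           :db/cardinality :db.cardinality/one}
          opts)))

(def posting-schema
  [(create-attr :posting/url :string)
   (create-attr :posting/raw :ref {:db/isComponent true})
   (create-attr :raw/description :string)
   (create-attr :raw/phone :string)])
```

Once attribute maps like these are transacted, the nested map under `:posting/raw` is stored as its own entity, and the component flag ties its lifetime to the parent posting.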

jcf18:05:15

That looks about right to my naked eye. 🙂

jcf18:05:10

I've used a convention for nested things that worked quite well in the past. The raw bits (assuming they only show up inside postings) would be called :posting.raw/description etc.

jcf18:05:26

That can make destructuring things quite tidy, and can line up with your namespaces.

jcf18:05:04

Not a requirement at all. Your names are your business. 🙊

zakkor18:05:23

Yeah, I was going to say that as a newbie, it feels a bit weird to make :raw/something a "global" datom, when it only makes sense in the context of a :posting

💯 1
zakkor18:05:17

calling it :posting.raw is just a naming change, or will it actually let me operate on the "raw" as a map when transacting/querying?

jcf18:05:41

Sounds like you're off to a flying start with the way you're conceptualising this stuff!

jcf18:05:24

If you prefix things with posting.raw/… Clojure will help with destructuring but Datomic itself will just see these as names. I can't think of a place where any sort of formal hierarchy shows up on the database side of things. They really are just names.

jcf18:05:50

In Clojure it's nice when you can do things like this:

(let [{:posting.raw/keys [description]} posting]
  (str "Description is " description))

jcf18:05:32

That's not a great example because you'd probably just access the map directly but with realistic code… 🙂
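A slightly fuller sketch of the same idea, assuming a posting map shaped like a pull result with the `:posting.raw/...` naming convention. The data and the `describe-posting` function are made up for illustration.

```clojure
;; Made-up data shaped like a pulled posting with a component entity.
(def posting
  {:posting/url "http"
   :posting/raw {:posting.raw/description "desc"
                 :posting.raw/phone       "0723"}})

(defn describe-posting
  "Destructures the outer and nested maps by namespace in the argument vector."
  [{:posting/keys [url]
    {:posting.raw/keys [description phone]} :posting/raw}]
  (str url ": " description " (" phone ")"))

(describe-posting posting) ; => "http: desc (0723)"
```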

zakkor18:05:12

I see what you mean 😄

zakkor18:05:56

@U06FTAZV3 thanks a lot for your help!

👍 1