#datomic
2018-04-27
stijn08:04:39

if I get the following exception when using the client lib, is this due to the networking not being OK or the S3 permissions?

Caused by: clojure.lang.ExceptionInfo: Unable to connect to system: {:cognitect.anomalies/category :cognitect.anomalies/unavailable, :cognitect.anomalies/message "Connection refused"} {:config {:server-type :cloud, :region "eu-central-1", :system "datomic", :query-group "datomic", :endpoint "", :proxy-port 8182, :endpoint-map {:headers {"host" ""}, :scheme "http", :server-name "", :server-port 8182}}}

stijn08:04:56

I'm able to curl, so I guess the networking is OK, but the error doesn't mention anything about specific database access...

stijn08:04:08

(that's on elastic beanstalk)

stijn09:04:08

ok, the proxy-port was still in the config

eraserhd19:04:12

So today, I refactored a dozen small queries, executed sequentially, into one query that's an or-join. I was surprised to find that it runs an order of magnitude slower. Is there any information on why this could be, or any information on how planning works?

eraserhd19:04:43

In this case, the normal case is for all queries to return no results.

favila19:04:55

datomic does almost no query optimization

favila19:04:12

the most important consideration is clause order: most selective clauses first

favila19:04:19

datomic query does not reorder clauses
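A minimal sketch of why clause order matters (attribute names like :person/email are hypothetical, and :person/email is assumed unique):

```clojure
;; Selective clause first: ?e is bound to at most one entity,
;; so the status lookup touches only that entity.
'[:find ?status
  :in $ ?email
  :where
  [?e :person/email ?email]    ; highly selective, run first
  [?e :person/status ?status]]

;; Same clauses reversed: the engine first walks every
;; :person/status datom, then filters each one by email.
'[:find ?status
  :in $ ?email
  :where
  [?e :person/status ?status]
  [?e :person/email ?email]]
```

Both queries return the same result; only the amount of intermediate work differs, because Datomic executes the :where clauses in the order written.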

favila19:04:14

is it possible for you to show us the queries?

eraserhd19:04:01

I can. It will take a bit, though.

favila19:04:25

your or-join has nested ands, whose content is one of the smaller queries?

favila19:04:33

or did you refactor more than that?

favila19:04:53

I'm curious if using a named rule instead would give better performance

eraserhd19:04:26

or-join with nested ands, and the ands contain the same clauses as before.
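A sketch of the shape being described, with hypothetical attribute names; each `and` branch carries the :where clauses of one of the original small queries:

```clojure
;; Combine several small queries into one via or-join.
;; Every branch must bind the or-join variable ?e.
'[:find ?e
  :where
  (or-join [?e]
    (and [?e :order/status :open]
         [?e :order/flagged? true])
    (and [?e :order/status :pending]
         [?e :order/overdue? true]))]
```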

eraserhd19:04:49

The and clauses are sorted so that they should consistently be in the same order. This reduces the query from 1.2s to 0.8s, presumably because of compile caching? Although, I tried inlining the new-result rule and it went back to 1.2s. I don't know why. The individual queries take a total of ~57ms.

favila19:04:56

all of these look like they are unavoidably full scans of an index?

favila19:04:20

perhaps you are io bound; running them in parallel increases cache stress

eraserhd19:04:10

The database is small, and I'm testing it on a local dev database anyway.

eraserhd19:04:21

Small enough to be in memory, I mean.

eraserhd19:04:47

I reloaded the repl, and it got faster. Now it's just 5x slower.

favila19:04:19

what is the peer's object cache size, and can it really fit the whole db in memory?

eraserhd19:04:33

hmm.... where do I find that?

favila19:04:15

default is 50% of vm ram

favila19:04:23

(java heap)

eraserhd19:04:33

That should be 2g, which should be waaay more than enough. The dev database is seeded from a file with at most 300 tuples in it.

favila19:04:00

yeah, no idea

eraserhd19:04:25

Let me set it explicitly, though. Just to see.

favila19:04:43

the only thing I can think to try is make one named rule and put each "and" as a separate implementation
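The suggested alternative might look like this (attribute and rule names are hypothetical): a rule with multiple implementations behaves like an or, with the engine trying each body in turn.

```clojure
;; One named rule, each "and" branch as a separate implementation.
(def rules
  '[[(problem-order? ?e)
     [?e :order/status :open]
     [?e :order/flagged? true]]
    [(problem-order? ?e)
     [?e :order/status :pending]
     [?e :order/overdue? true]]])

;; Pass the rules in via the % input.
'[:find ?e
  :in $ %
  :where (problem-order? ?e)]
```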

eraserhd19:04:13

That's not hard.... but why would this help?

favila19:04:21

my understanding is that should be equivalent, but maybe or-join isn't quite the same

favila19:04:59

bottom line running these in parallel is somehow making it slower

favila19:04:29

the only thing I can think of that could possibly do that is if memory pressure causes evictions for the various parts running in parallel

favila19:04:29

alternatively, there's something wrong with the query compiler; but since these are all big scans anyway I don't know what it could do that would be worse!

favila19:04:29

this is a minor point, but != should be faster than not= in cases where you don't need Clojure type coercion

favila19:04:12

I think that's every case here; you use them to avoid the same item in a self-join
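A sketch of the self-join case in question (hypothetical :person/team attribute):

```clojure
;; Find pairs of distinct people on the same team.
;; != is Datomic's native comparison; [(not= ?a ?b)] would also
;; work but goes through Clojure's polymorphic equality.
'[:find ?a ?b
  :where
  [?a :person/team ?t]
  [?b :person/team ?t]
  [(!= ?a ?b)]]
```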

markbastian19:04:56

When you create a database via datomic.api/create-database (e.g. datomic:sql://test-db?jdbc:...), where does the created database (e.g. test-db) get created? Does it create an actual new table or db in the backing SQL instance, or is it all in the datomic.datomic_kvs table?

favila19:04:30

it's in the datomic_kvs table

favila19:04:45

a datomic database uses the underlying storage like a key-value blob store

favila19:04:07

all datomic indexes are "inside" it--they're not visible to the storage tech

favila19:04:36

so "create-database" with an sql storage is an "INSERT" statement, not "CREATE TABLE" or "CREATE SCHEMA"

favila19:04:50

(it's probably an UPDATE actually)

favila19:04:41

FYI transactor only needs SELECT INSERT UPDATE DELETE permissions for its own datomic_kvs table, and peers only need SELECT

favila19:04:57

if you want to add an extra layer of protection, you can use two different sql users

markbastian19:04:25

didn't know that

markbastian19:04:41

Are there any good rules of thumb regarding how many actual datomic_kvs tables/transactors to have? I realize you only use one transactor per datomic_kvs table, but at what point would you want multiple transactors? For example, if an organization had a few projects, would it make sense to have a single transactor with each project getting its own virtual database, or would each project want its own transactor?

favila19:04:13

I think it's mostly dictated by write volume

favila19:04:46

perhaps other operational concerns

favila19:04:39

in general fewer transactors is easier; you split only if your write volume exceeds what a transactor can handle (remember, single writer for all dbs) or the io exceeds what the underlying storage can handle; or if you want fewer points of failure; or if you really just want to be super-sure different orgs have their data completely siloed

favila19:04:05

"fewer points of failure" isn't right

favila19:04:23

what I mean is a failure affects fewer customers

markbastian19:04:01

Thanks. That makes sense.

eraserhd20:04:12

I think I just figured out that queries are cached by object id, not hash?

favila20:04:45

You mean the compiled query form is cached?

favila20:04:00

the results aren't cached AFAIK

favila20:04:18

i would expect it to be hash not object identity

eraserhd20:04:22

The compiled query form.

eraserhd20:04:29

Yeah, just confirmed. If I extract the query and run it (without rebuilding it), the first time is about 300ms, and every subsequent time is about 20ms. If I wrap the captured query with (read-string (pr-str ...)), it goes back to 300ms every time.

eraserhd20:04:08

Although, if I try minor perturbations of the query, it still seems to work. So there's something in the query form with a broken hash or equality?

eraserhd20:04:40

uh, yeah.... whoa

dev=> (hash foo/test-q)
-1500982317
dev=> (hash (read-string (pr-str foo/test-q)))
1049440012
dev=> (hash (read-string (pr-str (read-string (pr-str foo/test-q)))))
-604086260

eraserhd20:04:56

And the culprit is regular expressions!

eraserhd20:04:33

dev=> (hash #"foo")
738015301
dev=> (hash #"foo")
793194086

eraserhd20:04:10

OK, so that's what happened. I had a regular expression literal in one of the small queries. As a result, 11 of 12 of the small queries were compile-cached, but the last wasn't. When I converted to a big query, it couldn't be cached because of the literal.
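One way to sketch a workaround (attribute names hypothetical): keep regex literals out of the query form so the form hashes consistently and stays compile-cached, and pass the pattern in as an input instead.

```clojure
;; The query form is plain data with a stable hash; the regex
;; (whose hash differs per Pattern object) comes in via :in.
(def q
  '[:find ?e
    :in $ ?pat
    :where
    [?e :doc/title ?title]
    [(re-find ?pat ?title)]])

;; (d/q q db #"foo")  ; pattern is an input, not part of the form
```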

eraserhd20:04:52

Now the large query runs faster - 20ms vs 50ms for all small queries.

favila22:04:43

ah, nice debugging