#datomic
2016-03-21
casperc12:03:38

I might be coming up against a bug in Datomic here. I am doing a pull call against an as-of db value, and Datomic seems to be doing a full db scan.

casperc13:03:00

(time (d/pull (d/as-of db 13194152884511) '[*] 17592199395616))
"Elapsed time: 127993.736043 msecs"

casperc13:03:49

(time (d/touch (d/entity (d/as-of db 13194152884511) 17592199395616)))
"Elapsed time: 0.844873 msecs"

casperc13:03:23

Does the same thing basically, but the pull version takes forever. When doing the pull against the current db there is no problem.

casperc13:03:24

@bkamphaus: Mind taking a look at this once you get on?

Ben Kamphaus13:03:20

@casperc one big difference between pull and entity is that entity is lazy. So the entity call isn’t really measuring any retrieval. Are there a lot of component refs? If there are, depending on the size of the graph implied by component refs from the starting entity, a wildcard pull could result in a lot of work.

casperc13:03:08

Yeah, but I am doing a touch, doesn’t that fetch all the datums that are lazy?

Ben Kamphaus13:03:48

gah, reading. Yes.

casperc13:03:02

And it doesn’t actually have any component refs in the mix 🙂

Ben Kamphaus13:03:12

Have you restarted and tested in isolation?

Ben Kamphaus13:03:28

I.e. is entity taking advantage of pull’s having cached segments?

Ben Kamphaus13:03:39

Or are both hot cache examples.

casperc13:03:12

The database does have a lot of data in it though, 5M of that type and about 20M entities in total. Dunno if that makes a difference.

casperc13:03:57

Both are hot in the sense that pull takes a long time regardless of doing it right after entity or a previous (slow) pull.

casperc13:03:07

And it takes up a lot of CPU on the peer with the pull, so I think it must be missing an index somehow.

casperc13:03:03

Just retested from a cold start again, and it is the same. And the peer is fetching a lot of data on top of the CPU usage, even though it should be cached (via the previous entity/touch call).

Ben Kamphaus13:03:23

which version of Datomic are you using?

casperc13:03:59

[com.datomic/datomic-pro "0.9.5344"]

casperc13:03:15

My process is at 1.08 GB received from just having done those two calls 🙂

Ben Kamphaus14:03:48

I can’t repro a disparity in performance like that with any of the large dbs I have available locally. I suspect there’s something specific to your data or schema that’s hitting a corner case. Quick sanity check - are you essentially getting the same results out of both? E.g. checking count and contents of each returned map? You can use (into {:db/id ent-id} (d/entity asofdb ent-id)) to put the data into a map similar to what pull returns (except differing re: some retrieval behavior and ident resolution).

casperc14:03:09

Just checking. Count is the same

(= (keys ent-res-map) (keys pull-res))
true

casperc14:03:12

But equality is false for some reason, let me just check why

(= ent-res-map pull-res)
false

Ben Kamphaus14:03:37

{:db/id ...} vs ident probably.

casperc14:03:06

It is due to a ref in the entity map being a datomic.query.EntityMap, not a clojure.lang.PersistentArrayMap like the one from the pull

casperc14:03:28

Otherwise they are equal
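
A minimal sketch of how two result maps like the ones compared above could be checked key by key to find which value breaks equality. It uses plain Clojure only: `FakeEntity`, `differing-keys`, and the sample attribute names are hypothetical stand-ins, not part of Datomic, and a record merely imitates a value that holds the same contents as a map but is of a different type.

```clojure
;; Hypothetical stand-in for an entity-like value: a record is not = to a
;; plain map even when the contents match, much like the ref described above.
(defrecord FakeEntity [id])

(defn differing-keys
  "Returns {key [class-in-a class-in-b]} for every key whose values are not =."
  [a b]
  (into {}
        (for [k (distinct (concat (keys a) (keys b)))
              :let [va (get a k)
                    vb (get b k)]
              :when (not= va vb)]
          [k [(class va) (class vb)]])))

;; Made-up sample data; the attribute names are assumptions.
(def ent-res-map {:db/id 1 :item/name "widget" :item/owner (->FakeEntity 2)})
(def pull-res    {:db/id 1 :item/name "widget" :item/owner {:id 2}})

;; Reports only :item/owner, together with the two classes involved.
(differing-keys ent-res-map pull-res)
```

Reporting the classes alongside the keys makes a type-only mismatch visible at a glance, which a plain `=` check hides.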

casperc14:03:12

(count ent-res-map)
47

casperc14:03:09

I should mention that there is only the one tx on the entity I am pulling, so pulling from the current db actually gives the same result as pulling from the as-of db. Dunno if that could be the edge case being hit, given that for most uses you would just use the current db value.

Ben Kamphaus14:03:54

I wouldn’t expect that to matter and that was true for the first local repro I tried.

casperc14:03:10

So any way to debug this? I’d be happy to take it in a private convo or file a bug to avoid spamming the channel.

Ben Kamphaus14:03:53

Following up in private message.

sonnyto16:03:19

I built a spreadsheet-like clojurescript app using OT https://en.wikipedia.org/wiki/Operational_transformation and using datomic for persistence. Every keystroke is sent to datomic. It works well but I want to get feedback on whether this is a good use case for datomic. I'm afraid the transactor cannot handle the load. I like using datomic for this use case for its time model. I'd like a user to be able to go back in time and see all changes to the data

sonnyto16:03:27

perhaps git is a better use case for this? but i'd like the user to be able to query the data as well

sonnyto17:03:28

this looks interesting and probably would work better for OT use case than datomic https://github.com/Jannis/gitalin

kingoftheknoll17:03:42

Just throwing it out here. Yesterday I was setting up my first project using Datomic Pro and I noticed that when I do lein run the process will hang, but if I start the repl with lein repl or cider-jack-in, I get the gpg password request and then I can start the server from the repl. It seems like the lein run somehow blocks the gpg auth popup or just doesn’t know how to do that.

sonnyto17:03:04

@kingoftheknoll: strange. i've never had that experience. I'm using boot but that shouldn't make a difference. what is your DB URL?

kingoftheknoll17:03:34

Not actually requiring anything in yet.

kingoftheknoll17:03:38

Just loading the deps

kingoftheknoll17:03:40

I mean I can get around it and I’ve tested that I can use datomic in the repl with an in memory db but just can’t start my ring server with lein run

sonnyto17:03:25

kingoftheknoll no error messages?

kingoftheknoll17:03:55

nope just hangs

Ben Kamphaus17:03:47

@kingoftheknoll: I always get gpg prompting rather than indefinite hanging when I use it (though it hangs a little sometimes). That said, a workaround that bypasses the entire process is to download the distribution and run bin/maven-install - it will put the dep in your local maven and then you don’t need repo+creds in lein.

kingoftheknoll17:03:00

does that mean I don’t need to include it as a dep in project.clj?

kingoftheknoll17:03:31

wait, I think I’ve already done that

Ben Kamphaus17:03:17

You’ll still have: [com.datomic/datomic-pro "0.9.5350"] in the :dependencies list, it will just find it in your local maven.
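
A rough sketch of what the project.clj might look like once bin/maven-install has put the jar in the local maven repository. The project name, version, and Clojure version here are assumptions made for illustration; only the datomic-pro coordinate comes from the message above.

```clojure
;; project.clj (sketch): no :repositories entry or credentials for the
;; Datomic repo, on the assumption that the jar is already in ~/.m2.
(defproject my-app "0.1.0-SNAPSHOT"          ; hypothetical name and version
  :dependencies [[org.clojure/clojure "1.8.0"]          ; assumed Clojure version
                 [com.datomic/datomic-pro "0.9.5350"]]) ; resolved from local maven
```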

kingoftheknoll17:03:02

what about making a jar file, would it bundle the dep from maven for me?

kingoftheknoll17:03:17

^ sheer ignorance here sorry

dm318:03:24

@sonnyto - did you do a write up about your app anywhere? sounds interesting

sonnyto18:03:30

@dm3 no but I would like to opensource it later

Ben Kamphaus19:03:30

@kingoftheknoll: ls ~/.m2/repository/com/datomic/datomic-pro/0.9.5350

jannis19:03:44

@sonnyto: Please be aware that gitalin is highly unfinished. 😉

sonnyto20:03:36

@jannis: have you played with it?

jannis20:03:24

@sonnyto: I only tried it last week.

jannis20:03:30

@sonnyto: Just kidding, I'm the author. 😉

sonnyto20:03:16

it looks cool and would fit my use case nicely

jannis20:03:23

If you can call me that, since it's so unfinished...