Alpha5 is now done. Same as Alpha4, but queries are significantly faster on large datasets

Craig Brozefsky13:03:47

putting Alpha5 thru some paces

😬 4
👍 4

Well, it’s alpha so we can find the big problems and address them before it’s called a “release” 🙂

Craig Brozefsky13:03:04

gosh I'm rusty at clojure 8^)

Craig Brozefsky13:03:48

also, do not buy Brach's 24 Flavours tiny Jelly Beans

Craig Brozefsky13:03:25

their attempt at Jelly Belly knockoffs... It's like they didn't realize that the flavors must harmonize when you shovel a handful into your maw

😆 4
Craig Brozefsky13:03:37

ok, think I broked it


Yup? What’s happened?

Craig Brozefsky13:03:58

Locked up importing a few thousand objects

Craig Brozefsky13:03:12

I'll break up the txn

Craig Brozefsky13:03:31

and we'll see what's happening. I must first eliminate my own stupidity...


Actually, breaking up the transaction is a bad thing to do. How big is the file that you’re importing?


(bad, because you end up expanding the indexes significantly)


Also, if you’ve made a mistake, I’d like to know about that too. I should document gotchas, and mitigate some of the more obvious ones

Craig Brozefsky14:03:43

yah, so it's not locked up, but just slow. Mind you I'm throwing a lot of large complex objects at it

Craig Brozefsky14:03:54

I'll get data to you shortly


“large complex” is going to be an issue. Zuko (the module that breaks it up into triples) is now faster than it used to be, but there’s still a lot of work for it to do

Craig Brozefsky14:03:08

Yah, I'm thinking it's a chance to instrument the whole thing with metrics data

Craig Brozefsky14:03:31

Importing o1 into asami
Importing to asami
"Elapsed time: 53402.866849 msecs"
Imported 4503
Importing o2 into asami
Importing to asami
"Elapsed time: 644524.77233 msecs"
Imported 41484

Craig Brozefsky14:03:22

FAIL in (load-test) (core_test.clj:21)
Test loading and parsing
expected: (= (count o1) (count (:tempids tx1)))
  actual: (not (= 271 4503))
lein test :only netgrok.core-test/load-test
FAIL in (load-test) (core_test.clj:22)
Test loading and parsing
expected: (= (count o2) (count (:tempids tx2)))
  actual: (not (= 2570 41484))

Craig Brozefsky14:03:51

So the failures are me expecting the entity count to be the input object count. The difference tells you just how complex some of the objects are, with many nested entities...


It creates lots of temporary IDs, but unless you ask, I would generally think they’d match the provided objects. 🤔


I’m assuming that some or all of this data can be shared?

Craig Brozefsky14:03:11

not sure. it's packet dumps from my home network

Craig Brozefsky14:03:10

I will find some representative data

Craig Brozefsky14:03:38

The :tempids in the tx would include the nested objects right?


I didn’t think so (unless you asked it to). So unless I’ve forgotten something it’s a problem


It creates lots of IDs, but that map is supposed to just be for the top level entities, and things you’ve provided your own temporary IDs to

Craig Brozefsky14:03:44

yah, so that seems wrong then

Craig Brozefsky14:03:58

I provided no temp IDs for anything

Craig Brozefsky14:03:14

So, I ran the same thing with the in-memory db

Craig Brozefsky14:03:16

lein test netgrok.core-test
Preparing o1
"Elapsed time: 0.005525 msecs"
Preparing o2
"Elapsed time: 8.82E-4 msecs"
Importing o1 into asami
Importing to asami
"Elapsed time: 262.289293 msecs"
Imported 4503
Importing o2 into asami
Importing to asami
"Elapsed time: 2622.369329 msecs"
Imported 41484

Craig Brozefsky14:03:32

Is zuko involved in that too?


It’s a library that pulls entities apart into triples

Craig Brozefsky14:03:14

ok, so it's not in zuko then eh


BTW, the large number of entities in tempids was expected, but I’m revisiting it, and I think they should not be included.


So I’m going to update Zuko to remove them

Craig Brozefsky13:03:50

New times:
lein test netgrok.core-test
Importing o1 into asami
Importing to asami
"Elapsed time: 19520.553168 msecs"
Imported 0
Importing o2 into asami
Importing to asami
"Elapsed time: 208748.131329 msecs"
Imported 0

Craig Brozefsky13:03:04

So yah, I can confirm your estimate on perf win with Alpha6


I’m guessing that you’re saying “imported 0” to mean a count of tempids?

Craig Brozefsky13:03:03

yah, just ignored that this time since I haven't updated my tests yet -- still drinking first cup of coffee


Have a look at the count on tx-data. That’s the number of statements inserted


If you want the number of entities inserted… do a count on your input 😊


The tempids map is there so you can provide a negative number for :db/id on an entity; it will generate an ID for you and tell you what your negative number got mapped to (like Datomic)
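The three counts discussed above can be compared side by side. This is only a minimal sketch: it assumes Asami is on the classpath and required as d, and the database name, the input record and its attribute names are all made up for illustration. What ends up in :tempids may also differ between Asami versions, so no output is shown.

```clojure
(require '[asami.core :as d])

;; Hypothetical in-memory database name, for illustration only.
(def db-uri "asami:mem://tempids-demo")
(d/create-database db-uri)
(def conn (d/connect db-uri))

;; Made-up input: one top-level entity carrying a negative :db/id,
;; with one nested map that is broken up into further triples.
(def input [{:db/id -1
             :device/name "router"
             :device/address {:ip "192.168.0.1"
                              :mac "00:00:00:00:00:00"}}])

;; transact returns a deref-able result; @ waits for it to complete.
(def tx @(d/transact conn {:tx-data input}))

(count input)           ; entities supplied (count of the input)
(count (:tx-data tx))   ; statements (triples) actually inserted
(get (:tempids tx) -1)  ; node that the temporary ID -1 was mapped to
```

Counting the input gives the entity count, while counting :tx-data gives the statement count, which is usually much larger once nested maps are involved.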

Craig Brozefsky13:03:17

I have some utility functions for exploring the shape of the data and the schema

Craig Brozefsky13:03:51

next step is to do some query clause generators

Craig Brozefsky13:03:22

for functional composition of where clauses...
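One possible shape for such clause generators, sketched in plain Clojure. The helper names has-attr and build-query are hypothetical and not part of Asami; the sketch only relies on queries being ordinary vectors that can be assembled from data.

```clojure
;; Hypothetical helper: a single [entity attribute value] pattern.
(defn has-attr [e a v]
  [e a v])

;; Hypothetical helper: assemble a query vector from :find variables
;; and any number of where-clause vectors.
(defn build-query [find-vars & clauses]
  (vec (concat [:find] find-vars [:where] clauses)))

(build-query '[?e ?flags]
             (has-attr '?e :tg/entity true)
             (has-attr '?e "ip.flags" '?flags))
;; => [:find ?e ?flags :where [?e :tg/entity true] [?e "ip.flags" ?flags]]
```

Because each clause is just a vector, generators like these compose with ordinary sequence functions before the finished query is handed to the query function.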

Craig Brozefsky13:03:48

just doing export-data a bunch helped me grok what is happening


export-data gives you a view of everything, but if you insert individual things (or just small numbers of entities) then have a look at the contents of tx-data in the results of the transaction. That shows you the triples that were generated and inserted.


I’m curious how many triples you got from your data that took 3m28s to load. (I have to work to improve this)

Craig Brozefsky14:03:20

data coming up...

Craig Brozefsky14:03:50

lein test netgrok.core-test
Importing o1 into asami
Importing to asami
"Elapsed time: 19489.224782 msecs"
Imported 31881 statements
Importing o2 into asami
Importing to asami
"Elapsed time: 209601.057757 msecs"
Imported 296271 statements


Thanks for that

Craig Brozefsky14:03:02

heading out for brunch


For anyone wondering, Craig is allowed to push me around in here. He’s no longer at Cisco, but it was his bright idea that I write my own graph database.

Craig Brozefsky15:03:35

So I am coercing string keys in JSON to keywords... Out of... habit?

Craig Brozefsky15:03:51

Would I be violating any assumptions of Asami if I did not do that?


or… I hope not 🙂

Craig Brozefsky17:03:21

I need to check my understanding here:

Craig Brozefsky17:03:38

[:tg/node-929806 "ip.flags" "0x00000040"]
 [:tg/node-623767 "layers" :tg/node-623768]
 [:tg/node-623767 :db/ident :tg/node-623767]
 [:tg/node-623767 :tg/entity true]

Craig Brozefsky17:03:58

netgrok.core> (d/entity (d/db (conn)) :tg/node-623767)

Craig Brozefsky17:03:05

I would not expect that to be an empty entity

Craig Brozefsky17:03:06

the triples are from: (d/export-data (d/db (conn)))

Craig Brozefsky17:03:15

this is asami alpha5 running in memory

Craig Brozefsky17:03:08

(conn) is just (d/db-connect URI) ...

Craig Brozefsky17:03:26

so I'm making a new connection using the DB uri, and a new DB...

Craig Brozefsky18:03:39

Ah, ok, if I don't coerce keys to keywords in the structs, entity loading fails

Craig Brozefsky18:03:43

So I think that's a boog?

Craig Brozefsky18:03:38

this is in memory DB
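For reference, the key coercion being described can be done with clojure.walk. This is a minimal sketch on a made-up record shaped like parsed JSON, and not necessarily how the data in this conversation was prepared.

```clojure
(require '[clojure.walk :as walk])

;; Made-up record with string keys, as a JSON parser would produce.
(def parsed {"layers" {"ip" {"ip.flags" "0x00000040"}}})

;; keywordize-keys walks nested maps and turns string keys into keywords;
;; values are left untouched.
(def cleaned (walk/keywordize-keys parsed))

cleaned
;; => {:layers {:ip {:ip.flags "0x00000040"}}}
```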


I’m working with the cleaned up data right now, so the attributes are all keywords. I’ll try with the strings shortly


Oh, that’s interesting too

Craig Brozefsky18:03:25

yah, since the entity func is basically per storage...

Craig Brozefsky18:03:40

export-data is my new pal

Craig Brozefsky18:03:08

yah, so the file I sent you, I believe has string keys


it does, yes

Craig Brozefsky18:03:47

🥰 this is a pleasant way to explore data

Craig Brozefsky21:03:15

ok, gotten more familiar with the query language. Able to identify all the devices on my network, and start digging into their behavior

Craig Brozefsky22:03:53

The query language is impressive, Paula

💖 3
Craig Brozefsky22:03:13

calling it a day tho, so I stop obsessing over it