This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-03-07
Channels
- # announcements (12)
- # autochrome-github (4)
- # babashka (27)
- # babashka-sci-dev (2)
- # beginners (80)
- # calva (34)
- # cider (8)
- # clj-kondo (77)
- # clojars (34)
- # clojure (50)
- # clojure-europe (35)
- # clojure-nl (4)
- # clojure-uk (5)
- # clojured (2)
- # clojurescript (26)
- # core-async (4)
- # cursive (4)
- # datahike (4)
- # datomic (40)
- # emacs (7)
- # etaoin (1)
- # fulcro (12)
- # google-cloud (143)
- # hyperfiddle (1)
- # integrant (2)
- # jobs (1)
- # malli (15)
- # membrane (17)
- # off-topic (38)
- # pathom (1)
- # reagent (4)
- # releases (2)
- # remote-jobs (2)
- # sci (1)
- # shadow-cljs (10)
- # spacemacs (7)
- # tools-deps (5)
- # vim (6)
- # xtdb (37)
Hey all! I want to model role based access using 3 different types of documents in XTDB. This is how my structure looks at the moment which works fine:
;; organisation
{:type :organisation
:xtdb/id "org1-id"
:organisation/name "org1"}
;; user
{:type :user
:xtdb/id "member1-id"
:user/email ""}
;; role
{:type :role
:role/organisation "org1-id"
:role/user "member1-id"
:role/level :owner}
The role
structure looks very similar to the "intermediate" many-to-many table we use in relational db's.
I'm wondering if there is a better approach to model this? Perhaps a set of roles in user
like :user/roles #{role1, role2}
or something else? :thinking_face:Yes, I agree. Give roles to a user rather than users to a role, and allow users to have many roles.
Thanks!!! So if I understand this correctly, the suggestion here is to model it like this?
;; roles 1 and 2
{:type :role
:xtdb/id "role1-id"
:role/organisation "org1-id"
:role/level :owner}
{:type :role
:xtdb/id "role2-id"
:role/organisation "org2-id"
:role/level :guest}
;; user
{:type :user
:xtdb/id "member1-id"
:user/email ""
:user/roles #{"role1-id" "role2-id"}}
I am thinking this might make some queries more difficult like "List all users with their levels for org1"
Changing role for a user also now is a little bit confusing to me. Do I just update the role documents (update role1 :role/level :admin)
, or do I only update the references (remove the old role and add a new role reference)?
i.e. does each role be unique to a user instead of being reused across multiple users?
{:type :role
:xtdb/id "role1-user1-id"
:role/organisation "org1"
:role/level :member
:role/override-perms #{"share"}}
I guess I'm still thinking too strictly in how I've modeled data using relational tables, so trying to get some sense of intuition here! πI went with reifying the role concept into something more than a role (a member which contains a role). Realised I wanted to store other things as well to it and make it truly unique to a (user, org) pair. In this case it felt most intuitive to stick to my original implementation as it made most queries easier and I didn't find any gains by doing it the other way! Thanks for the help! π
Thank you for your write-up @U013MQC5YKD - it's really great for everyone to share these kinds of experiences, and there are always tonnes of designs and trade-offs to consider with Datalog-based authz
@U013MQC5YKD If you have further data modeling questions, it would be great to extend the conversation to http://discuss.xtdb.com β these come up pretty often (and there are a lot of small variations on similar ideas) and the forum might be a nice place to track ideas as a bit of light documentation that folks could read in the future (Clojurians does have a log, but it's not very google-friendly or easy to update with new content).
What about conditionally adding clauses? Sometimes I donβt want to filter by :entity/type, can I exclude this triplet on a logical condition?
(defn entity:list-help-offers
[node params]
(->> (xt/q
(xt/db node)
'[:find (pull e [*])
:in e-type e-aid
:where
[e :entity/type e-type]
[e :entity/kind :help-direction/offer]
[e :entity/area-id e-aid]]
(:entity/type params)
(:entity/area-id params))
(mapv first)))
upd: yes managed to
(defn entity:list-help-offers-2
[node params]
(let [e-type (:entity/type params)
where-block
(filterv identity
[(if e-type
'[e :entity/type e-type])
'[e :entity/kind :help-direction/offer]
'[e :entity/area-id e-aid]])]
(->> (xt/q
(xt/db node)
{:find '[(pull e [*])]
:in '[e-type e-aid]
:where where-block}
(:entity/type params)
(:entity/area-id params))
(mapv first))))
this is generally the best pattern. You can often try to use Datalog or
and rules to achieve a similar effect, but the performance implications aren't always obvious there
I have three models with a to-many relation: foo -> bar -> baz, and I'd like to count the number of baz related to a specific foo. Some foos may be missing the foo -> bar relation, the bar -> baz relation, or both, in which case I'd like to count that as zero. How could I write such a query?
I'm probably missing something here, but I'd guess like this:
{:find [(count ?baz)]
:in [?foo]
:where [[?foo :to-bar ?bar]
[?bar :to-baz ?baz]]}
I slightly misstated the problem. I'd like to get totals for each foo, not a specific one that gets parameterized. Results should be something like this:
{:xt/id 1
:type :foo}
{:xt/id 2
:type :foo}
{:xt/id 3
:type :bar
:foo 1}
{:xt/id 4
:type :bar
:foo 1}
{:xt/id 5
:type :baz
:bar 3}
:xt/id 1 :baz-count 1
:xt/id 2 :baz-count 0
My actually query is just a bit more complicated, aggregating items by year:
{:find '[?period ?foo-name (count ?baz) #_ ?baz-count]
:where ['[?baz ::baz/bar ?bar]
'[?baz ::baz/datetime ?date]
[(list (period->date-fn-var :year) '?date) '?period]
'[?bar ::bar/foo ?foo]
'[?foo ::foo/name ?foo-name]]
; '[(count ?baz) ?baz-count]]})
:order-by '[[?period :asc]]})
This almost gets me all the way. The results are something like:
["2012" "foo-1" 12]
["2013" "foo-1" 98]
["2013" "foo-2" 40]
["2014" "foo-2" 3]
But I haven't yet figured out how to zeros for years without relations.
Another thing I don't quite understand is why uncommenting the last :where
clause and using its result in the :find
instead of doing the aggregation in the :find
directly doesn't produce the same result. It seems like it gets stuck and isn't able to complete when I use it that way.Hey @UPWHQK562 I think this works:
(let [n (xt/start-node {})]
(->> [{:xt/id 1
:type :foo}
{:xt/id 2
:type :foo}
{:xt/id 3
:type :bar
:foo 1}
{:xt/id 4
:type :bar
:foo 1}
{:xt/id 5
:type :baz
:bar 3}
{:xt/id 6
:type :baz
:bar 3}]
(map (fn [doc] [::xt/put doc]))
(xt/submit-tx n)
(xt/await-tx n))
(xt/q (xt/db n) '{:find [e (sum c)]
:where [[e :type :foo]
(or-join [e c e3]
(and [e2 :type :bar]
[e2 :foo e]
[e3 :type :baz]
[e3 :bar e2]
[(identity 1) c])
(and [(identity nil) e3]
[(identity 0) c]))]}))
;;=> #{[2 0] [1 2]}
> Another thing I don't quite understand is why uncommenting the last :where
clause [...]
It looks like you are hoping to be able to do an aggregate in the middle of the query, which won't work as you might expect (instead I think ?baz-count
will always return 1
). If you really want to do such a thing though, you can use subqueries for that
@U899JBRPF thanks very much! I'll try that shortly. I worked on it a bit longer yesterday and also tried to use the same trick with (identity 1)
and zero, but couldn't quite make it do the thing. I used or
instead of or-join
and had to put in a bunch of [(any? x)]
noops to make it do anything at all.
> and had to put in a bunch of [(any? x)]
noops
ah, I should mention I'm using the 1.21 beta release which is a lot smarter here, and ~eliminates the need for this
Ah, okay, glad I brought it up π Looking forward to that release then. I'll try applying your solution later and will give ya a :thumbsup: reaction if it works.
Not sure, but maybe something like this should either in docs or the query-test
ns? Then again, I searched in here and didn't find anyone asking a similar question. Maybe it's not so common of a requirement.
Thanks again @U899JBRPF.
True, this is definitely a concise problem and example! Fwiw, I don't recall ever solving exactly this before :thinking_face:
Hi @U899JBRPF. The query produces the desired result, but the only problem is that on my data set/local machine, adding in the zero values manually AFTER running the query is about 3x faster than doing it all in the query. Is that something you would expect?
Doing non-IO work inside the query engine is often slower than just using raw Clojure due to the interpretive execution and hashing overheads involved in how the algorithm works
Which part of that is considered non-IO? It's just aggregations by the looks of it. Granted, I am doing some date truncation in my actual query, but that's not the part that I'm replacing with raw CLJ.
On reflection, I think it's possible to rewrite the Datalog to execute much more efficiently using get-attr
:thinking_face: I'll have a play with it in a bit
> Which part of that is considered non-IO? for instance, this leg creates a tonne of unnecessary subqueries and isn't doing anything with the data in the database:
(and [(identity nil) e3]
[(identity 0) c]))
(but it's necessary in the current structure for the join to work as a kind of left-join)I was thinking about get-attr with a default value but couldn't summon the right incantations just yet π
On yet further reflection, get-attr isn't much use since it operates in the wrong direction, but you can do this:
(let [n (xt/start-node {})]
(defn get-bar-baz-count [db e]
(or (ffirst (xt/q db
'{:find [(count e3)]
:in [e]
:where [[e2 :type :bar]
[e2 :foo e]
[e3 :type :baz]
[e3 :bar e2]]}
e))
0))
(->> [{:xt/id 1
:type :foo}
{:xt/id 2
:type :foo}
{:xt/id 3
:type :bar
:foo 1}
{:xt/id 4
:type :bar
:foo 1}
{:xt/id 5
:type :baz
:bar 3}
{:xt/id 6
:type :baz
:bar 3}]
(map (fn [doc] [::xt/put doc]))
(xt/submit-tx n)
(xt/await-tx n))
(xt/q (xt/db n) '{:find [e (sum c)]
:where [[e :type :foo]
[(dev/get-bar-baz-count $ e) c]]}))
that may still be possible to optimize further by converting [e2 :type :bar]
to [(get-attr e2 :type)[e2type]][(= e2type :bar)]
and similarly for :baz
(depending on what the join order stats look at like at scale, with the various relative cardinalities & selectivities)
Thanks @U899JBRPF. I'll give that a try soon and report back on how it impacts performance with my data set. Appreciate your help!
Hi @U899JBRPF. I got around to trying your latest recommendation (without the added optimizations). My version is a bit different, because each baz
has a datetime field and I want to group things by year count. The query I came up with works to get the correct result:
(defn get-foo-year-count
[db foo given-year]
(doto
(or
(ffirst
(xt/q db
'{:find [(count found-years)]
:in [foo given-year]
:where [[bar :foo foo]
[baz :bar bar]
[baz :datetime x]
[(subs x 0 4) found-years]
[(== given-year found-years)]]}
foo given-year))
0)
tap>))
(xt/q (xt/db n)
'{:find [foo y c]
:where [[foo :type :foo]
;; get all the unique years
[(q {:find [(distinct year)]
:where [[baz :datetime year+]
[(subs year+ 0 4) year]]})
[[all-years]]]
[(q {:find [foo year c]
:in [foo [year ...]]
:where [[(dev/get-foo-year-count $ foo year) c]]}
foo all-years)
[[f y c]]]]
:order-by [[foo :asc]
[y :asc]]})
Result with my data:
[[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2009" 84]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2010" 271]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2011" 971]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2012" 1277]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2013" 986]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2014" 1988]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2015" 1699]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2016" 2272]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2017" 2452]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2018" 2209]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2019" 2776]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2020" 3460]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2021" 3316]
[#uuid "ffffffff-ffff-ffff-ffff-000000000001" "2022" 362]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2009" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2010" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2011" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2012" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2013" 41]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2014" 3930]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2015" 7887]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2016" 2680]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2017" 3355]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2018" 4291]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2019" 3221]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2020" 3444]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2021" 3144]
[#uuid "ffffffff-ffff-ffff-ffff-000000000002" "2022" 402]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2009" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2010" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2011" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2012" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2013" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2014" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2015" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2016" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2017" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2018" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2019" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2020" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2021" 1530]
[#uuid "ffffffff-ffff-ffff-ffff-000000000003" "2022" 437]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2009" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2010" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2011" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2012" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2013" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2014" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2015" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2016" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2017" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2018" 59]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2019" 466]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2020" 284]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2021" 563]
[#uuid "ffffffff-ffff-ffff-ffff-000000000004" "2022" 119]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2009" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2010" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2011" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2012" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2013" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2014" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2015" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2016" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2017" 0]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2018" 3800]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2019" 5282]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2020" 5593]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2021" 13803]
[#uuid "ffffffff-ffff-ffff-ffff-000000000005" "2022" 7008]]
But it's quite slow. The above takes ~35s on my machine. Do you happen to see any obvious optimizations that would make a big difference? I could, of course, go a completely different route like pre-computing these queries on a schedule (they don't have to be very recent), or adding in year fields that are previously extracted, or some other option... but I am wondering if I can change the query to boost performance significantly. In production, the data will be much larger, so that's a consideration.I do have some ideas to ensure that nothing strictly unnecessary is happening...but first we need to take a peak at the join order - can you share the result of running (:vars-in-join-order (xtdb.query/query-plan-for (xt/db n) <THE-BIG-QUERY>))
?
In general though, it looks like caching or otherwise pre-computing the counting should speed things up dramatically. XT provides hooks to build secondary indexes (e.g. see the implementation of xtdb-lucene) and also transaction functions for helping with this kind of thing.
Scheduling the query sounds like the easiest thing to make fast progress here though, unless all your queries are highly analytical/OLAP like this also? If that's the case then it might be best to have a short call to discuss the options in depth (feel free to email me / send an invite <mailto:[email protected]|[email protected]>)
@U899JBRPF sorry for the delayed response!
Here's the join order: [all-years foo f total y _133063]
. I think part of the reason for the ~35s query time is what you explained here https://github.com/xtdb/xtdb/issues/1674.
I do have quite a few queries like this and more planned, but I suppose for now I'll go with the quickest and dirtiest solution while I look into using better approaches for the long term. Do you have some sense over whether transaction functions or secondary indexes would be a better place to start?
Not a problem, async responses work great for me π
That join order looks sane enough to me - I mainly wanted to sanity check that all-years
and foo
were the first two in the vector
> Do you have some sense over whether transaction functions or secondary indexes would be a better place to start?
I'd say transaction functions will be faster to get working, although they may turn out to be a bottleneck for writes, but maybe not - it depends on whether you need to do fresh counting work each time, or whether you can inc
/`dec` based only on the small amount of data already at hand in the transaction context
@U899JBRPF I can get away with incremental updates. Okay, thank you so much for all the feedback here. I have enough to work with and have gained some insights for sure. Till next time!