Fork me on GitHub
#xtdb
<
2021-08-19
>
jarohen11:08:09

Afternoon all - 1.18.1's out! Check out the https://github.com/juxt/crux/releases/tag/1.18.1 for more details. After the major Lucene changes in 1.18.0, we've followed it up with a few performance and bug fixes, but also the addition of :refresh-frequency (similar to ES's refresh_interval setting) which'll allow you to control how often Lucene's search indices refresh (useful for bulk ingests, for example). The community response to 1.18.0's secondary indices functionality gave us all a pleasant surprise! We've added a :with-tx-ops? flag here to save some of the common translation between indices, similar to the flag of the same name on open-tx-log. As always, a big thanks to everyone in the Crux community helping us by raising issues, submitting PRs and helping with repros! 🙏 James

🚀 12
🎉 6
crux 6
dorab18:08:34

Crux/datalog newbie. Can someone suggest improvements to the query below? In particular, looking to see if a subquery would work better or there was a way to do distinct on a tuple of logic vars. The entities are zip (postal) codes. All the attributes for each code are strings. There may be multiple codes per city-state_id pair. I want to sum all the population numbers by distinct city-state_id pairs. Thanks.

(defn city-states
  "Extract the city-state-pop information from the `crux-node`."
  [crux-node]
  (crux/q (crux/db crux-node)
          '{:find [c s state_name (sum popul)]
            :in [[[c s]]]
            :where [[e :zip/city c]
                    [e :zip/state_id s]
                    [e :zip/state_name state_name]
                    [e :zip/population population]
                    [(user/parse-long population) popul]
                    ]}
          (into [] (distinct)
                (crux/q (crux/db crux-node) '{:find [ city state_id ]
                                              :where [[e :zip/city city]
                                                      [e :zip/state_id state_id]]
                                              })) ))

refset21:08:28

Hi @U0AT6MBUL the result of crux/q should already be a set, so I don't think the (into [] (distinct) ... is serving a real purpose here I suspect you could fold that query in using a subquery, roughly like this:

(defn city-states
  "Extract the city-state-pop information from the `crux-node`."
  [crux-node]
  (crux/q (crux/db crux-node)
          '{:find [c s state_name (sum popul)]
            :where [[(q {:find [ city state_id ]
                         :where [[e :zip/city city]
                                 [e :zip/state_id state_id]]}) [[c s]]]
                    [e :zip/city c]
                    [e :zip/state_id s]
                    [e :zip/state_name state_name]
                    [e :zip/population population]
                    [(user/parse-long population) popul]
                    ]}))

refset21:08:20

although really I think you should be able to do it all at the same query level, since the join algorithm naturally reduces all the intermediate logic variable results to sets:

(defn city-states
  "Extract the city-state-pop information from the `crux-node`."
  [crux-node]
  (crux/q (crux/db crux-node)
          '{:find [c s state_name (sum popul)]
            :where [[e0 :zip/city c]
                    [e0 :zip/state_id s]
                    [e :zip/city c]
                    [e :zip/state_id s]
                    [e :zip/state_name state_name]
                    [e :zip/population population]
                    [(user/parse-long population) popul]
                    ]}))

dorab23:08:44

Thank you very much @U899JBRPF. Duh. I totally forgot that the result was a SET and thus no need for distinct. One more question... how do I sort the result by decreasing (sum popul)?

🙏 3
dorab23:08:01

I tried :order-by [[(sum popul) :desc]] and, surprisingly to me, that worked.

🙂 2
refset07:08:30

Ha, I don't think I'd have guessed that that order-by + aggregate worked, but I'm glad it does!