This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-04-14
Channels
- # aleph (2)
- # announcements (11)
- # aws (4)
- # babashka (42)
- # babashka-sci-dev (81)
- # beginners (90)
- # biff (2)
- # calva (40)
- # cider (16)
- # clj-kondo (26)
- # clj-on-windows (1)
- # cljdoc (4)
- # cljfx (1)
- # cljsrn (2)
- # clojure (92)
- # clojure-austin (2)
- # clojure-europe (23)
- # clojure-nl (5)
- # clojure-uk (3)
- # clojured (3)
- # clojurescript (19)
- # community-development (3)
- # conjure (1)
- # cursive (4)
- # datalevin (3)
- # datomic (5)
- # emacs (13)
- # events (1)
- # fulcro (26)
- # graphql (1)
- # hugsql (15)
- # introduce-yourself (5)
- # leiningen (1)
- # lsp (29)
- # minecraft (19)
- # music (1)
- # off-topic (36)
- # pathom (12)
- # podcasts (2)
- # portal (8)
- # re-frame (12)
- # reagent (11)
- # rewrite-clj (4)
- # shadow-cljs (56)
- # spacemacs (2)
- # vim (12)
- # windows (3)
- # xtdb (43)
out of curiosity, what would be the best way to store a latitude/longitude point (two numbers, basically) to get a fast "things within radius of input point" type of query? Any examples of GIS data with XTDB?
I'm guessing just having the two numbers as separate attributes would be good enough unless you have lots of entities
Yeah, store them separately, get all points within a bounding box, and then calculate the actual distance only for the points in the bbox
For a lot of points it would start making sense to split the query into smaller bounding boxes, so that everything definitely inside the radius gets included without calculating the distance, and you only check the bboxes on the boundary more carefully
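The bbox-prefilter idea above can be sketched in plain Clojure. This is an in-memory illustration only (the `stops` data and function names are made up for the example); in practice the bbox step would be the XTDB query and just the exact distance check runs outside it:

```clojure
(defn haversine-km
  "Great-circle distance in km between two lat/lon points."
  [lat1 lon1 lat2 lon2]
  (let [r 6371.0 ; mean Earth radius, km
        dlat (Math/toRadians (- lat2 lat1))
        dlon (Math/toRadians (- lon2 lon1))
        a (+ (Math/pow (Math/sin (/ dlat 2)) 2)
             (* (Math/cos (Math/toRadians lat1))
                (Math/cos (Math/toRadians lat2))
                (Math/pow (Math/sin (/ dlon 2)) 2)))]
    (* 2 r (Math/asin (Math/sqrt a)))))

(defn within-radius
  "Coarse bounding-box filter first, exact haversine check second."
  [stops lat lon radius-km]
  (let [dlat (/ radius-km 111.0)  ; ~111 km per degree of latitude
        dlon (/ radius-km (* 111.0 (Math/cos (Math/toRadians lat))))
        in-bbox? (fn [{stop-lat :lat stop-lon :lon}]
                   (and (<= (- lat dlat) stop-lat (+ lat dlat))
                        (<= (- lon dlon) stop-lon (+ lon dlon))))]
    (->> stops
         (filter in-bbox?) ; the cheap prefilter (the DB query in practice)
         (filter #(<= (haversine-km lat lon (:lat %) (:lon %)) radius-km)))))
```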
My use case was to find train/bus stops within a 1-15km radius. I ended up storing 3 fields: `lat`, `lon` and `geohash` (which I pre-calculated using factual/geo). When doing the search, I first compute `(geohashes-around lat lon max-meters)`, then use them to query all the bus/train stops which have one of those geohashes. Then finally I filter the found results (in plain Clojure, outside the XTDB query) and keep only those whose `spatial/distance` is within the radius I'm searching in. All this could definitely be improved, but it worked well enough for me on a dataset of approx 2789 train stops & 34973 bus stops. With the mentioned approach, my api endpoint executes in around 60ms on a DigitalOcean potato server.
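For reference, the geohash pre-computation step mentioned above can be sketched in plain Clojure. factual/geo does this for you; this hand-rolled encoder just illustrates the standard algorithm (interleave longitude/latitude bits, 5 bits per base32 character):

```clojure
(def ^:private base32 "0123456789bcdefghjkmnpqrstuvwxyz")

(defn geohash
  "Encode a lat/lon point into a geohash string of the given precision,
   by repeatedly bisecting the lon/lat ranges (longitude bit first)."
  [lat lon precision]
  (loop [chars [] lat-rng [-90.0 90.0] lon-rng [-180.0 180.0]
         lon-bit? true bit 0 ch 0]
    (if (= (count chars) precision)
      (apply str chars)
      (let [[lo hi]  (if lon-bit? lon-rng lat-rng)
            v        (if lon-bit? lon lat)
            mid      (/ (+ lo hi) 2.0)
            one?     (>= v mid)
            ch*      (+ (* ch 2) (if one? 1 0))     ; append next bit
            rng*     (if one? [mid hi] [lo mid])    ; keep matching half
            lat-rng* (if lon-bit? lat-rng rng*)
            lon-rng* (if lon-bit? rng* lon-rng)]
        (if (= bit 4) ; 5 bits accumulated -> emit a base32 character
          (recur (conj chars (nth base32 ch*)) lat-rng* lon-rng*
                 (not lon-bit?) 0 0)
          (recur chars lat-rng* lon-rng*
                 (not lon-bit?) (inc bit) ch*))))))
```

Nearby points share geohash prefixes, which is why an equality lookup on pre-computed geohashes (plus the neighbouring cells) works as a cheap spatial prefilter.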

No idea, I was stupid enough not to try the much simpler variant with bounding boxes you mention. I'm not really sure why I didn't come up with that 😄
I just quickly tried querying via those lat/lon fields and the query is surprisingly slow. Even if I define a tiny bounding box (a single point matching just a single train stop, i.e. `min-lat = max-lat` and `min-lon = max-lon`), the query executes in no less than 800ms. In comparison, lookup of the same train stop via its geohash value takes 1.2ms.
I may have some issue in the query.. may look into it later in the evening.
Basically I just added comparisons like these. Perhaps there's a more efficient way.
```clojure
[e :lat lat]
[e :lon lon]
[(<= min-lat lat)]
[(>= max-lat lat)]
[(<= min-lon lon)]
[(>= max-lon lon)]
```
If it doesn't have an implementation in https://github.com/xtdb/xtdb/blob/master/core/src/xtdb/codec.clj#L211, then I think it won't get indexed
> If it doesn't have an implementation in https://github.com/xtdb/xtdb/blob/master/core/src/xtdb/codec.clj#L211, then I think it won't get indexed
small `byte-array` values should actually get round-tripped (via Nippy) in a way that coherently sorts within the indexes for the range constraints to work correctly https://github.com/xtdb/xtdb/blob/2460cc8844e7b020c9c34f3d7a684fa4ea713d7e/core/src/xtdb/codec.clj#L362-L375
e.g. see https://gist.github.com/refset/f737151c2956093bbe01d7b81a743328
@U8ZQ1J1RR you nailed it. Passing those `min-lat` (etc.) values directly instead of via `:in` brought the times down to 1-2ms as well.
the `lat` / `lon` are just regular numbers (i.e. Double)
The slowdown just because of passing those values via `:in` is quite unexpected to me. Is there a good reason for it?
@U01LFP3LA6P were you passing the values `:in` as scalars? i.e. `:in [min-lat max-lat ...]`, or are they wrapped in tuples/relations? Are you using `-beta2` or `-beta3`? Or something older? We added a change not too long ago that should calculate better join orders when scalars are passed in: https://github.com/xtdb/xtdb/commit/6245d76541847af969d86d6b63f0892e40642eab (also see the linked issue for discussion on the background / fundamental engineering problem)
scalars, 1.20.0
ah, excellent, thanks for confirming that ☺️ we will get 1.21 released properly soon, but in the meantime I would recommend using the beta during any new development anyway
@U01LFP3LA6P @U8ZQ1J1RR As always, this is a very interesting discussion that I'm willing to bet someone (a future "someone") would appreciate reading on http://discuss.xtdb.com if either of you are in the mood to summarize some time. 🙂
@U01LFP3LA6P what do you mean by passing in lat/long values “directly” instead of via `:in`?
> what do you mean by passing in lat/long values “directly” instead of via `:in`?
@petrus i.e. as literals in the compiled query, rather than using logic vars, like `[?e :att "val1"]` vs `[?e :att ?val]` + `:in [?val]` (there is some small overhead in this approach, as it essentially bypasses the query compilation caching)
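Building such a query with the values inlined as literals is just plain Clojure data construction. A minimal sketch (the `:lat`/`:lon` attribute names and `bbox-query` helper are illustrative, not from the thread):

```clojure
(defn bbox-query
  "Build an XTDB-style query map with the bounding-box values inlined
   as literals, rather than bound via :in."
  [min-lat max-lat min-lon max-lon]
  {:find '[e]
   :where [['e :lat 'lat]
           ['e :lon 'lon]
           ;; each predicate clause carries the literal bound, not a logic var
           [(list '<= min-lat 'lat)]
           [(list '>= max-lat 'lat)]
           [(list '<= min-lon 'lon)]
           [(list '>= max-lon 'lon)]]})
```

Because the query map differs for every set of bounds, each call compiles a fresh query, which is the query-compilation-caching overhead mentioned above.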
you would want a separate geospatial index, have you seen https://github.com/teknql/crux-geo
Hey guys, I am preparing a technical talk on XTDB at my company (a conservative fintech) next week. I’m pitching as “version control for data.” Any previous experience on “selling XTDB” into a traditional bank-like organization and running it in production? The first pushback I got from cloud infra was about where to store the RocksDB indices if K8s pods have ephemeral storage.
Try saying the words "postgres", "mysql" or "kafka". Those could work.
@petrus We'd be quite happy to help you out with this, I think. Care to drop us a line at <mailto:[email protected]|[email protected]>? Maybe we can have a call with you to talk through what you're trying to build and where/how xtdb is appropriate, providing some context and materials we have available. It's Good Friday in the UK, Sweden, and Canada tomorrow... so I'm afraid we're probably all OoO for that. But assuming your presentation is after Tuesday, we should be around to set something up. === Others: In general, if you want help constructing an argument to the powers-that-be at your company/org that they should allow you to give xtdb a spin, we're generally happy to (try to) help out with that. 🙂 More xtdb nodes in production is always a nice thing.
@steven427 📨 mailed. Talk has to be ready for dry run by Wednesday, so not much time. Actual talk on Thursday.

The question of how to run XTDB indexes in Kubernetes is of interest to us too, though at this point (given how much data we have, and how much complexity persistent storage would bring) we still do it in-mem.
How do you deal with rebuilding in-memory indices when nodes are restarted? My understanding is that indices are built eagerly, which means if an index node is restarted, this would be a serious cold cache problem and put load on the tx log.
We just do it cold for now, since we don't scale dynamically yet. When our instances become shorter lived, we would use https://docs.xtdb.com/administration/1.20.0/checkpointing/
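For reference, enabling checkpointing on the index store looks roughly like this in a 1.x node config. This is a sketch following the shape of the linked docs, not a verified config; the paths, frequency, and exact module keys should be checked against the documentation for your version:

```clojure
;; Hypothetical node config: RocksDB index store with filesystem checkpoints.
;; On restart, a node restores the latest checkpoint instead of replaying
;; the whole tx log from scratch.
{:xtdb/index-store
 {:kv-store
  {:xtdb/module 'xtdb.rocksdb/->kv-store
   :db-dir "/var/lib/xtdb/indexes"          ; ephemeral pod storage is fine
   :checkpointer
   {:xtdb/module 'xtdb.checkpoint/->checkpointer
    :store {:xtdb/module 'xtdb.checkpoint/->filesystem-checkpoint-store
            :path "/mnt/checkpoints"}       ; shared/durable volume
    :approx-frequency "PT6H"}}}}            ; ISO-8601 duration
```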
> how to run XTDB indexes [..] we still do it in-mem @U8ZQ1J1RR I would recommend switching to RocksDB or LMDB sooner rather than later - definitely ahead of going into production - because the performance will be more realistic (often faster...which is counter-intuitive 🙂) and you will be more sure to avoid OOM errors
Sorry, I meant we run RocksDB on the in-memory, non-persistent filesystem in Kubernetes