this behavior is concerning: map keys are renamed and converted to keywords?
(env/tx env [[:put-docs :resources {:xt/id 1 :foo {"MY_KEY" 1}}]])
(env/q env '(from :resources [{:xt/id 1} *])) ;; => [{:xt/id 1, :foo {:my-key 1}}]
I see the keys are case insensitive, so that
{:MY_KEY 1
:MY_key 1}
is an illegal value.. this seems far too opinionated? I see I'm a year late to the discussion (https://github.com/xtdb/xtdb/issues/2481) but I don't see how recursive normalization is justified, and the workaround of explicitly specifying case in the query doesn't apply to nested values either, right?
I can live with columns being mangled (though have you looked at how ORMs do this? the datalog layer seems like a clojure ORM) but I strongly disagree on ever mangling values. I don't see how that helps SQL compatibility either
thanks for the thoughts and for sharing the examples - we will review and respond properly soon 🙏
> strongly disagree on ever mangling values
are you counting the columns within nested structures as 'values' also here?
> though have you looked at how ORMs do this?
any you would recommend comparing with in particular?
I can see there's a dichotomy between considering xtdb, or any db, to be storing data in its own schema or model, versus storing Clojure values. If you expect that XTDB has columns with its own way of defining what a column is, but the data inside is a Clojure type, then that value is a Clojure map, and changing its keys is definitely changing the data. But if you think the DB has a data model with some kind of mapping to and from Clojure data structures, then it's more expected that the nested layers behave exactly like the top level.
I think it's understandable that when you can put data in as Clojure maps and get it back as Clojure maps, you expect it to come back exactly the same, even though there's actually some incomplete de/serialization happening
I considered comparing to JSON, but then again there's transit+json without the same limitations…
> are you counting the columns within nested structures as 'values' also here?
definitely. it's like storing jsonb in postgres right? that's case sensitive. I assume xt is planning on doing nested indexes eventually, if this is even a question? I think you'd really want a schema/user control over indexing in that case
> it's like storing jsonb in postgres right?
It's not really, tbh, it's more fundamental than that - it's closer to adding maps and vectors as nested columns to SQL/RDBMSs themselves. SQL itself is, broadly speaking, case insensitive - if you create columns in one case and query them using another, you'll still get them back. Postgres, for example, will down-case any column names in your query. The SQL spec talks of 'case normal form' (which is also lossy) - we've done something similar but chosen a different normal form to better suit our use cases.
(the exception to the rule here is 'delimited identifiers' - if you enclose table/column names in double quotes it'll preserve the case. openly, I'm not sure how often these are used - relatively little if at all from what I've personally seen, but I'm happy to take data points if others make more use of them)
We're subject to a number of constraints here, as you've linked above. Of particular influence here are the constraints to have data inserted through Clojure being idiomatically accessible from outside (and vice versa), and also our usage of Apache Arrow under the surface. These significantly shaped the solution - we're acutely aware that a different take on the constraints would have yielded a very different solution. I'd be interested to hear others' opinions on this - if you think we've missed constraints, or over-/under-valued them, please do let us know 🙂
FWIW you can also specify a custom :key-fn to the query, which will override the default key normalisation in your results. For people using consistent casing within any given query (even if it's not what was put in), this should be sufficient - but again, let us know if you'd benefit from this being more powerful 🙂
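For reference, a minimal sketch of what that looks like, assuming the 3-arg (xt/q node query opts) arity; node is a started XTDB node assumed to be in scope, and :snake-case-string is a placeholder value - check your version's docs for the exact set of accepted :key-fn values:

(require '[xtdb.api :as xt])

;; sketch only: the opts map overrides the default key normalisation for this
;; query's results; whether presets and/or a plain fn are accepted, and the
;; preset names themselves, depend on the XTDB version
(xt/q node
      '(from :resources [{:xt/id 1} *])
      {:key-fn :snake-case-string})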
how would I do jsonb in xtdb? If I have keys in different cases, like http headers in kebab-case and other stuff in snake_case, how do I migrate to xtdb?
or xt1, for that matter. I guess we'd need to explicitly nippy/freeze stuff and extend the deserializer to understand nippy?
personally, elsewhere I've found it simpler to just double-quote all identifiers, since that matches the semantics of every programming language, though that does make raw queries a bit inconvenient (until you build tooling around it). I ran into one issue with external tooling, but that was treated as a bug and fixed
:key-fn is not powerful enough for me - it would have to be more targeted, like a specter navigator - but I think the right solution is probably a custom map-like data type that xt won't mangle
which, while tractable, is a decently sized ask of someone who just wants to store a map
I am also still ignorant about the benefit of being this fancy vs just freezing the whole value like in xt1. it's not like we can add indexes to these nested columns (yet?)
if nested indexes are in the picture I'd probably prefer an API like `(with-meta {:xt/id 1 :value {:a 1}} {:index [:value :a]})` rather than aggressively normalizing/mangling/indexing everything recursively. or even put the metadata on the value map directly: `{:xt/id 1 :value (with-meta {:a 1} {:index :a})}`
it's mainly for the cross-play - the ability for Clojure (incl. XT1) users to work with idiomatic kebab-case keywords, and SQL/Java/generally non-Clojure users to work with whatever's idiomatic in their own language, respecting the SQL case-insensitivity defaults, on the same data
we wanted it to Just Work™ for the majority of idiomatic identifiers, with clear rules for the remainder
I see, so even if you can't join on those identifiers you'd prefer to not have to quote anything in the SQL statement
sorry, even if you can't query* on those identifiers, i.e. they're not indexed columns.
ah, nw 🙂 yep, they are indexed columns. at the moment we store basic (page-level) metadata on them, eventually we'd like to offer user-specified secondary indexes on nested columns too
the best solution that's come to mind is a user-provided schema/serde layer, using malli or something, that works before/after xtdb does its own serialization
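To make that concrete, a rough sketch of the shape such a layer could take - plain EDN strings here rather than malli or nippy, and opaque-cols / encode-doc / decode-row are made-up names for illustration, not part of any XTDB API:

(require '[clojure.edn :as edn])

;; columns listed here are stored as opaque strings, so XT never sees
;; (or normalises) their nested keys
(def opaque-cols #{:foo})

(defn encode-doc [doc]
  ;; stringify the opaque columns before the doc is transacted
  (reduce (fn [d k] (if (contains? d k) (update d k pr-str) d))
          doc
          opaque-cols))

(defn decode-row [row]
  ;; restore the original nested values on the way out of a query
  (reduce (fn [r k] (if (contains? r k) (update r k edn/read-string) r))
          row
          opaque-cols))

;; usage, assuming the same put-docs / query shape as earlier in the thread:
;; (xt/submit-tx node [[:put-docs :resources (encode-doc {:xt/id 1 :foo {"MY_KEY" 1}})]])
;; (map decode-row (xt/q node '(from :resources [{:xt/id 1} *])))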
mm :thinking_face: delimited identifiers do this job in SQL - the "just do exactly what I say" option - we don't have a parallel to this atm on the Clojure side (either for transacting docs or queries)
I tried using symbols as keys to denote that, but turns out those aren't supported at all 🙂
in the meantime I have to look at
(defn q [env & args]
  ;; run the query against the node stored under ::xtdb ...
  (->> (apply xt/q (::xtdb env) args)
       ;; ... then, for each result row, look up a per-key value fn in ::key->vf
       ;; (defaulting to identity) and apply it to that column's value
       (eduction (map #(reduce-kv (fn [m k v] (assoc m k (((::key->vf env) k identity) v))) {} %)))))
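For context, env in that wrapper is just a map along these lines - the ::xtdb / ::key->vf keys are the snippet's own convention rather than anything XTDB-specific, and edn/read-string stands in for whatever per-column deserializer is actually needed:

(require '[clojure.edn :as edn])

(def env
  {::xtdb    node                         ;; a started XTDB node, assumed in scope
   ::key->vf {:foo edn/read-string}})     ;; per-column fns applied to query results

(q env '(from :resources [{:xt/id 1} *]))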
hey @U797MAJ8M - following this thread earlier in the week we've started to consider escape hatches to ensure that we honour the casing/punctuation if users really need it. SQL already has the concept of delimited identifiers for this purpose, so https://github.com/xtdb/xtdb/issues/3350 seems a good first step in that direction - we'll need some more consideration on how to reflect this in Clojure
my serialization approach also solves https://github.com/xtdb/xtdb/issues/3341 for me so I'll stick with it