#xtdb
2021-03-30
Jernej13:03:34

Hi guys, I am very new to Crux (and Clojure, to be totally honest) and a little bit of help would come in handy 🙏 I am storing data into Crux with the latest Java API, and for one of the fields I am trying to store a Java array of strings (or a collection, whatever is more optimal for Crux). We also need to be able to query for data stored in this array, and I am not able to determine whether this is best practice for Crux or not. Any help or opinion would be appreciated 🙂

refset14:03:46

Hi @U01S3SHTMKN, welcome to the party! You should have no problem with storing a collection of strings under a single attribute from the Java API. If you need to access the individual string values and join against them within a query, then the supported indexed collection types are sets and vectors (i.e. clojure.lang.PersistentHashSet and clojure.lang.LazilyPersistentVector). However, note that Datalog doesn't trivially allow you to access the stored ordering of elements in the vector (though it is possible using a custom predicate that calls the entity API for this information). Are you using Java directly? Or Kotlin / Scala?

Jernej14:03:27

Hey @U899JBRPF, thanks for the blazingly fast answer 🙂 I am using Java directly, yes. Basically what I am trying to achieve is: an example entity would have :list ["item1", "item"], and for the query I would like to fetch all entities containing the "item1" value in that list.
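A minimal sketch of what this looks like through the Clojure API (the Java API is equivalent; the node setup and entity id here are purely illustrative). Because every element of an indexed vector gets its own index entry, a single query clause can match one value inside :list:

```clojure
(require '[crux.api :as crux])

;; in-memory node, for illustration only
(def node (crux/start-node {}))

;; store an entity whose :list attribute is a vector of strings
(crux/submit-tx node [[:crux.tx/put {:crux.db/id :example/entity-1
                                     :list ["item1" "item"]}]])
(crux/sync node) ; block until the transaction is indexed

;; each vector element is indexed individually, so this clause matches
;; any entity whose :list contains "item1", regardless of position
(crux/q (crux/db node)
        '{:find [e]
          :where [[e :list "item1"]]})
;; => #{[:example/entity-1]}
```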

☺️ 3
Jernej12:03:24

@U899JBRPF Thanks for your help, it works as described 🙂

refset12:03:40

Excellent, glad to hear it, I appreciate the update!

orestis17:03:10

Hey there, I got wooed by the fancy writing in the latest Crux blog post about Records and I'm thinking of giving Crux a spin to see if it can satisfy some of our requirements. We're currently using Mongo as our "source of truth", but we already have a copy of most of our collections in Postgres to answer some analytical queries where Mongo's performance was unacceptable (and ElasticSearch for FTS, but that's out of scope for now). As expected, the dev team really wants to drop Mongo and go 100% Postgres. However, there are some issues like "audit logs", "history tables", etc. that we haven't figured out just yet. So I thought: wait, if Crux already supports JDBC, we could potentially spin up Crux side-by-side with our existing JDBC infrastructure, ideally writing data to both places in the same transaction. In this case we'd use "our" Postgres schema as the source of truth, but we could use Crux for capturing the audit logs, and potentially for other use cases such as previous versions of documents. Is this something that's even possible right now? I.e. can I control the JDBC transactions from my own code so I can commit data changes to both stores at the same time? There's a ton of issues to be thought through, but I'd like to know the fundamentals first.

refset17:03:17

Hey! 🙂 So one thing to bear in mind is that crux-sql only works for queries, so you still need to perform all writes to Crux through the submit-tx API. Another aspect is that Crux's SQL is based on Apache Calcite and may not be Postgres-compatible enough to avoid duplicating all your queries. I don't have a list of differences to hand, but I'm sure one of the many other Calcite-based platforms has it documented: https://calcite.apache.org/docs/powered_by.html Ultimately, though, I think what you're proposing sounds worth exploring. I'd be keen to see some examples of the more complex SQL queries you need to run.

orestis19:03:21

Wait, I think you mean the SQL layer of Crux; I was talking more about using the JDBC driver as the log/document storage.

orestis19:03:23

It's a clunky approach, as I'd have to maintain two different data stores, but it might have interesting properties...

refset19:03:14

Ohhh, crux-jdbc, right, sorry... yep, that makes sense also 🙂 You might struggle to fully align the writes between the two systems without somehow intercepting or forking the transaction behaviours/commands that submit-tx generates. I can imagine that writing to both systems in a single transaction would be really neat, though. What sort of time frame are you looking at? We could potentially spend some time thinking about feasibility at our end in the near future.

orestis21:03:46

This is all 100% speculative on my end! No immediate plans to look into this for 2-3 months at least.

Steven Deobald16:03:06

I've built a system that does double writes like this in the past. It was SQL Server + Oracle, but the idea is the same. If you're spiking it out, it might be easier/cheaper to embed the call to submit-tx inside the Postgres transaction, since your Postgres code will (probably) be synchronous and roll back completely from wherever you throw an exception.
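A minimal sketch of that shape, assuming next.jdbc on the Postgres side (the table, document shape, and function name are hypothetical). If submit-tx throws, e.g. because the Crux node is unreachable, the exception propagates out of with-transaction and the Postgres insert rolls back:

```clojure
(require '[next.jdbc :as jdbc]
         '[crux.api :as crux])

;; `ds` is a Postgres datasource, `node` a Crux node.
(defn insert-audited-record! [ds node {:keys [id body] :as record}]
  (jdbc/with-transaction [tx ds]
    ;; primary write: Postgres stays the source of truth
    (jdbc/execute! tx ["INSERT INTO records (id, body) VALUES (?, ?)" id body])
    ;; secondary write: a throw here rolls back the Postgres transaction
    (crux/submit-tx node [[:crux.tx/put (assoc record :crux.db/id id)]])))
```

Note the asymmetry: submit-tx only confirms that the write reached Crux's transaction log, so a Postgres commit failure after a successful submit would still leave the two stores diverged.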

Steven Deobald16:03:18

It's also likely this makes more sense in terms of semantics. If raw Postgres is still your primary store and Crux is — initially, anyway — just for audits, then writes to Crux are secondary and should be within the scope of the primary store's transactions rather than the other way around.

seancorfield17:03:33

Back when we were investigating MongoDB as an addition or alternative to MySQL, we routed all our writes through a wrapper we wrote, so our code mostly stayed the same but everything was duplicated between MongoDB and MySQL. You have to be disciplined about it, and there were certain operations that weren't mapped 100%, so sometimes MongoDB leaked up a layer. Overall, that approach worked well for us.
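As a sketch of that wrapper idea (all names here are invented for illustration), application code talks to one protocol, and a dual-writing implementation fans each write out to both stores, keeping the double-write discipline in a single place:

```clojure
(defprotocol DocumentStore
  (save! [store doc]))

;; `primary` and `secondary` are any two DocumentStore implementations,
;; e.g. a MySQL-backed store and a MongoDB-backed one.
(defrecord DualStore [primary secondary]
  DocumentStore
  (save! [_ doc]
    (save! primary doc)      ; source of truth first
    (try
      (save! secondary doc)  ; best-effort duplicate
      (catch Exception e
        ;; pick a policy: log and reconcile later, or rethrow
        (println "secondary write failed:" (.getMessage e))))))
```

Swapping the implementation behind the protocol is what lets the calling code stay the same while the stores underneath change.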

seancorfield17:03:16

(I should mention that we ultimately gave up on MongoDB and ended up migrating some new collections that had not been double-written back to MySQL! 🙂)

orestis17:03:13

I like the wrapper-for-writes approach @U04V70XH6. We are currently working with two systems and two databases, and we've accepted that a few seconds of latency between stores is OK. So we sync changes every minute, and in some cases we sync immediately after a write.