#xtdb
2023-02-26
tatut07:02:34

having some trouble with big(ish) data, we have some reports that have a large number of rows and we have modeled it currently as report and each row being separate documents and report referring to rows like {:rows [r1 r2 …]} , trouble is that there will be relatively few reports (in the tens of thousands range) but quite a few rows (100+ million in production) and this model seems troublesome

tatut07:02:58

the thing is that while the reports are something we want to query in datalog, the individual rows not so much. I’m thinking I should just batch the rows into a serialized map

tatut07:02:49

one huge report might have 100k or more rows even, so that is a big thing to even transact in a single tx

refset17:02:12

you could consider chunking the rows into modelled batches, e.g.

{:xt/id :my-report
 :row-data [:my-report-rows-batch-1
            :my-report-rows-batch-2
            ...]}

{:xt/id :my-report-rows-batch-1
 :rows-outer {:rows [:foo :bar :baz ...]}}
something like this should limit the bloat in the indexes

refset17:02:21

you could add a batch-count also to regain transactional integrity (to know whether all batches have been written as intended)
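
A minimal sketch of the batching and batch-count idea above, in plain Clojure with no XTDB dependency. The function names `report->docs`, `docs->txs` and `complete?`, the batch id scheme and the `:batch-count` key are illustrative assumptions, not anything prescribed by XTDB. The put operation is written as the full keyword `:xtdb.api/put` (what `::xt/put` expands to) so the snippet evaluates without requiring `xtdb.api`.

```clojure
(defn report->docs
  "Splits rows into batch documents plus one report document that refers to them."
  [report-id rows batch-size]
  (let [batches   (vec (partition-all batch-size rows))
        ;; hypothetical id scheme: :my-report-rows-batch-1, -2, ...
        batch-ids (mapv (fn [i] (keyword (str (name report-id) "-rows-batch-" (inc i))))
                        (range (count batches)))]
    {:report  {:xt/id       report-id
               :row-data    batch-ids
               ;; lets a reader check that every batch has been written
               :batch-count (count batches)}
     ;; rows are nested under a map, as in the suggestion above
     :batches (mapv (fn [id batch] {:xt/id id :rows-outer {:rows (vec batch)}})
                    batch-ids batches)}))

(defn docs->txs
  "One transaction (a vector of ops) per batch, with the report document last."
  [{:keys [report batches]}]
  (conj (mapv (fn [doc] [[:xtdb.api/put doc]]) batches)
        [[:xtdb.api/put report]]))

(defn complete?
  "True when the number of batch documents found matches the report's batch-count."
  [report batch-docs]
  (= (:batch-count report) (count batch-docs)))
```

Each element returned by `docs->txs` could be passed to `xt/submit-tx` on its own, which keeps any single transaction small. A reader would treat the report as fully written only once `complete?` holds for the batch documents it can load.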

tatut17:02:07

so using {:rows [...]} as value would plain serialize them... that is probably ok for my case

tatut17:02:10

thanks for the suggestion

🙏 1
Dustin Getz16:02:21

is it true that xt/listen is only available when "embedding XTDB within the JVM application"?

Dustin Getz16:02:55

Is it possible to run XTDB in a production configuration and connect to it from the clojure API with xt/listen?

refset17:02:48

> is it true that xt/listen is only available when "embedding XTDB within the JVM application"

yes, but you can also poll the tx-log endpoint over HTTP https://docs.xtdb.com/clients/http/#tx-log

👍 1
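
A rough sketch of what polling the tx-log could look like from a remote client. The `/_xtdb/tx-log` path, the `after-tx-id` query parameter and the `:xtdb.api/tx-id` key on each returned transaction are assumptions based on the XTDB 1.x HTTP docs linked above, so check them against your version. `fetch-txs` is a hypothetical function you supply that performs the HTTP GET and returns the parsed transactions; no HTTP client is included here.

```clojure
(defn tx-log-url
  "Builds the tx-log URL, resuming after the given tx id when there is one."
  [base-url after-tx-id]
  (str base-url "/_xtdb/tx-log"
       (when (some? after-tx-id) (str "?after-tx-id=" after-tx-id))))

(defn poll-once
  "Fetches transactions newer than cursor, hands each to handle-tx!,
   and returns the new cursor (unchanged when nothing new arrived)."
  [fetch-txs handle-tx! base-url cursor]
  (let [txs (fetch-txs (tx-log-url base-url cursor))]
    (run! handle-tx! txs)
    (if (seq txs)
      (apply max (map :xtdb.api/tx-id txs))
      cursor)))
```

Calling `poll-once` in a loop with a sleep between iterations, feeding each returned cursor into the next call, approximates what `xt/listen` provides in-process, at the cost of polling latency.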
refset17:02:48

> Is it possible to run XTDB in a production configuration and connect to it from the clojure API with xt/listen?

yes, simply embedding an XT node (but not the tx-log) is considered a production configuration

👍 2
Petrus Theron08:04:34

@U899JBRPF starting from https://github.com/hyperfiddle/electric-xtdb-starter, practically how should my XT configuration change if I want to deploy and scale my application on http://Fly.io? I assume I need a “master” XTDB node to which my scaled-up edge nodes will connect over HTTP and poll /tx-log for changes? Should each edge node then be a carbon copy of master, or something else? Can I run an in-process XT node on each edge deployment and let XTDB worry about getting and staying up to speed on master? Thanks! (may I suggest considering adding a section to XT docs for “Deployment”, “Production” and/or “Replication”)

refset12:04:38

Hey @U051SPP9Z XT nodes are essentially all ~identical and deterministic replicas (with eventually consistent processing of the tx-log). To scale up you will need to use a remote tx-log and doc-store for >1 XTDB node to connect to (e.g. Kafka or a JDBC backend) - currently it is configured to use an embedded KV store which can't be accessed by more than 1 node https://github.com/hyperfiddle/electric-xtdb-starter/blob/ffe3ed23cc51e7dd7a001263b52d52a6fd00738a/src/user.clj#L16-L17 I haven't looked at the specifics of http://Fly.io scaling or deeply thought about whether/how (distributed) Electric intersects with clustering of XTDB nodes, but hopefully that information gives you some intuition.

🙏 2
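
For illustration, a node configuration along the lines described above might look like the following, with a shared JDBC (Postgres) tx-log and document store and a node-local RocksDB index store. The module names follow the XTDB 1.x JDBC and RocksDB modules as far as I know them; the host, database name, credentials and index directory are placeholders, and actually starting a node with this map would also need the corresponding XTDB modules and a Postgres driver on the classpath.

```clojure
(require '[clojure.java.io :as io])

(def node-config
  {;; shared by every node: one connection pool to the remote database
   :xtdb.jdbc/connection-pool {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
                               :db-spec {:host     "db.example.internal" ; placeholder
                                         :dbname   "xtdb"                ; placeholder
                                         :user     "xtdb"                ; placeholder
                                         :password "change-me"}}         ; placeholder
   ;; remote tx-log and doc-store, so more than one node can connect
   :xtdb/tx-log         {:xtdb/module     'xtdb.jdbc/->tx-log
                         :connection-pool :xtdb.jdbc/connection-pool}
   :xtdb/document-store {:xtdb/module     'xtdb.jdbc/->document-store
                         :connection-pool :xtdb.jdbc/connection-pool}
   ;; local to each node: indexes are rebuilt by replaying the shared tx-log
   :xtdb/index-store    {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
                                    :db-dir      (io/file "/var/lib/xtdb/indexes")}}})
```

Each application instance would pass the same map to `xt/start-node`, so every instance embeds its own node while sharing the tx-log and documents.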
wei23:08:55

resurrecting this thread, I'm using the electric-xtdb-starter with embedded KV store and managed to compile and deploy it to http://fly.io. I'm seeing the following runtime error:

[info] ERROR hyperfiddle.electric: #error {
[info] :cause xtdb.api.PXtdb
[info] :via
[info] [{:type java.lang.NoClassDefFoundError
[info] :message xtdb/api/PXtdb
[info] :at [app.xtdb_contrib$latest_db_GT_ invokeStatic xtdb_contrib.clj 12]}
[info] {:type java.lang.ClassNotFoundException
[info] :message xtdb.api.PXtdb
[info] :at [jdk.internal.loader.BuiltinClassLoader loadClass BuiltinClassLoader.java 581]}]
any hints as to what PXtdb does and how I might resolve this error? here's the line causing the exception: https://github.com/hyperfiddle/electric-xtdb-starter/blob/master/src/app/xtdb_contrib.clj#L12C13-L12C13

refset14:09:23

Hi @U066TMAKS - is the !xtdb var definitely bound? have you observed that XT's start-node API ever gets called following https://github.com/hyperfiddle/electric-xtdb-starter/blob/1bfc0255997ab6ace19b45536c70946500d48567/src/user.clj#L31 ?

wei00:09:53

trying to compile and run the uberjar locally, I'm actually getting the same java.lang.NoClassDefFoundError but for a different class, which makes me think it's an uberjar build issue. I found some old threads pointing to AOT as a possible culprit, but I don't think we're using AOT compilation here.

WARN  org.eclipse.jetty.websocket.common.WebSocketSession: Exception while notifying onClose
java.lang.NoClassDefFoundError: clojure/tools/logging/impl/LoggerFactory
	at hyperfiddle.electric_jetty_adapter$electric_ws_adapter$on_close__14649.invoke(electric_jetty_adapter.clj:68)
	at ring.adapter.jetty9.websocket$proxy_ws_adapter$fn__14515.invoke(websocket.clj:159)
	at ring.adapter.jetty9.websocket.proxy$org.eclipse.jetty.websocket.api.WebSocketAdapter$WebSocketPingPongListener$12d400b6.onWebSocketClose(Unknown Source)
	at org.eclipse.jetty.websocket.common.events.JettyListenerEventDriver.onClose(JettyListenerEventDriver.java:149)
	at org.eclipse.jetty.websocket.common.WebSocketSession.callApplicationOnClose(WebSocketSession.java:394)
	at org.eclipse.jetty.websocket.common.io.AbstractWebSocketConnection.close(AbstractWebSocketConnection.java:225)
	at org.eclipse.jetty.websocket.common.WebSocketSession.close(WebSocketSession.java:130)
	at org.eclipse.jetty.websocket.common.events.AbstractEventDriver.openSession(AbstractEventDriver.java:221)
	at org.eclipse.jetty.websocket.common.WebSocketSession.open(WebSocketSession.java:493)
	at org.eclipse.jetty.websocket.common.WebSocketSession.onOpened(WebSocketSession.java:459)
	at org.eclipse.jetty.io.AbstractConnection.onOpened(AbstractConnection.java:213)
	at org.eclipse.jetty.io.AbstractConnection.onOpen(AbstractConnection.java:205)
	at org.eclipse.jetty.io.AbstractEndPoint.upgrade(AbstractEndPoint.java:444)
	at org.eclipse.jetty.server.HttpConnection.onCompleted(HttpConnection.java:401)
	at org.eclipse.jetty.server.HttpChannel.onCompleted(HttpChannel.java:820)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:368)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:279)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)
	at org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:336)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:313)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:171)
	at org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.produce(EatWhatYouKill.java:135)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:882)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1036)
	at java.base/java.lang.Thread.run(Thread.java:833)
Caused by: java.lang.ClassNotFoundException: clojure.tools.logging.impl.LoggerFactory
	... 27 common frames omitted

Dustin Getz00:09:26

i replied in the #C7Q9GSHFV thread

👍 2
Dustin Getz00:09:08

it’s a subtle clojure issue that the electric xtdb starter doesn’t go out of its way to defend against

Dustin Getz00:09:13

as to whether it is technically an XT issue, it might be debatable; tbh i don’t understand the issue

Nikolas Pafitis23:03:05

@U899JBRPF How does this change (if it does) with XTDB v2?

refset23:03:38

Hi 🙂 the v2 integration should probably look quite different - I would start afresh, looking at the current state of the art with electric<->Postgres integration and try to treat v2 similarly

dgb2318:02:08

Is this a typo in the docs? https://docs.xtdb.com/language-reference/datalog-transactions/#transaction-time The tx time is provided as the third arg of submit-tx

dgb2318:02:44

Sorry I just checked the changelog and it's a new feature in 1.21 apparently. I'm still using 1.20 in this session. The docs don't seem to reflect 100% that this feature is part of the signatures (see individual transactions such as put etc.) so I got confused.

refset18:02:47

I'll have a look at how to make this clearer. I guess having some sort of "New feature since 1.x" label in the docs somewhere would be a useful indicator too 🤔

👍 1
refset18:02:17

Were you hoping to use it?

dgb2321:02:44

> Were you hoping to use it?

I'll definitely upgrade to the newest version. The project I'm working on is basically in an experimentation stage. The reason I'm looking at XTDB (and Clojure) is because I wrote (and maintain) an application 2y ago with SQL and a conventional web framework that has a reporting system and a bitemporal data model. It works well and my clients are happy. I've been personally very happy with having a queryable audit trail of all the transactions. But implementing it was painful. And the web framework didn't help me at all outside of very trivial baseline stuff.

dgb2322:02:02

I'm currently trying to figure out how to have a "transient" database in front of XTDB (for drafts/work-in-progress documents that update very quickly and frequently) so I was trying out stuff with valid time/tx time.

dgb2322:02:40

Very happy so far!

refset10:02:52

Ah that's all great to hear ☺️ Out of interest, do you make use of future / proactive valid-time operations (either currently or in your previous app)?

refset10:02:36

The 'transient' database aspect sounds quite intriguing. I guess in addition to performance concerns you're also hoping to avoid muddying the main database that contains agreed documents with quickly-irrelevant drafts. If you think there are ways XT could make this setup easier (cross-db joins, perhaps?) we would be keen to hear your thoughts :)

dgb2311:02:27

> proactive valid-time operations (either currently or in your previous app)?

dgb2311:02:59

in theory yes in the previous app. My client doesn't use it though 🙂 in the current prototype I likely actually need it though. I'm thinking of leveraging this in order to create a publishing workflow. But I'm not quite sure yet whether that's a good idea or not!

dgb2311:02:17

> The 'transient' database aspect sounds quite intriguing. I guess in addition to performance concerns you're also hoping to avoid muddying the main database that contains agreed documents with quickly-irrelevant drafts.

dgb2311:02:04

> If you think there are ways XT could make this setup easier (cross-db joins, perhaps?) we would be keen to hear your thoughts 🙂

I'm thinking of applying the as-of-now state to the transient database per document that has a draft, on request. I'm thinking of using with-tx for this, where only a single draft is applied for a given query. I'm not sure whether I really need actual joins over all the drafts (yet) and how I would achieve that properly. That's still in hammock stage. As for how XT could make this easier: if cross-db joins were possible it would be cool. But that has all kinds of implications, for example how are identities resolved or what role will tx/valid time play? For my particular use case it would be at least somewhat clear but I'm not sure whether that's a general enough case. The most convenient (for me) feature would be to have transient documents that are / can be applied over currently valid ones in terms of both querying and transacting within the same db. But that's very specific to my little project and likely not a concern for XT as a whole 🙂

refset11:02:52

So...something like with-tx but higher-level, durable, and without inflating the main data 🤔 I guess the key bits all exist to make it happen in userland, at least

👍 2
dgb2311:02:39

Yes! Functionality-wise everything is there. The only thing that I'm slightly worried about is filling up the tx log with stuff I don't really care about - for performance/resource reasons. I would have to have a bang (!) version of submit-tx etc. But that's a purely theoretical worry. It's more important for them to have the same API. I have to experiment more and see what happens to say/ask more useful things!

blob_thumbs_up 2