#datomic
2024-04-07
jjttjj18:04:47

I'm struggling to get import-cloud working with anything besides a top-level :since filter set to an extremely recent time. Outside of that I just get "Importing" printed and then nothing happens until, sometimes, an eventual timeout error. I have two attributes I need: ::attr-a has a few hundred new datoms transacted every minute, and I only need the most recent data; ::attr-b had a bunch of data inserted about a week ago. Duplicate data gets periodically re-transacted, but I think the datoms are still marked with the :t at which they were originally asserted, so I need the filter to include that earliest time. Overall my entire database isn't huge. I'm not sure whether this note from the docstring, "import-cloud limits the total number of datoms imported in a transaction to 16 million and limits strings to 1 million characters", refers to the total pr-str of the data, but overall what I'm trying to import should fall well below those limits.

(let [ten-mins-ago (-> (java.time.Instant/now)
                       (.minus (java.time.Duration/parse "PT10M"))
                       java.util.Date/from)
      earlier      #inst "2024-03-29"]
  (dl/import-cloud
    {:source source-conf
     :dest   dest-conf
     :filter

     ;; Just the top-level filter works if I make it recent enough, but I need older data
     #_{:since ten-mins-ago}

     ;; Even just specifying a specific attribute filter for the same
     ;; restriction hangs forever
     #_{:include-attrs
        {::attr-a {:since ten-mins-ago}}}

     ;; This is what I really want, but it also hangs forever
     {:include-attrs
      {::attr-a {:since ten-mins-ago}
       ::attr-b {:before earlier}}}}))
Anything obvious I'm doing wrong? Has anyone else had issues like this? How can I even begin to debug this? I'm on Windows, and it's possible that's relevant since it doesn't seem well supported, but the fact that I can get it working with one of the filters makes me hopeful it's possible to get this working.
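
For what it's worth, here is a minimal sketch of one way to check which transaction instants the ::attr-b datoms actually carry, and therefore where the :before boundary needs to sit. It assumes a standard Datomic client connection conn to the source system; none of the names here are from the thread:

(require '[datomic.client.api :as d])

;; min/max :db/txInstant across all ::attr-b datoms in the source db;
;; if the re-transacted duplicates really are no-ops, these should still
;; fall in the original assertion window
(d/q '[:find (min ?inst) (max ?inst)
       :where
       [_ ::attr-b _ ?tx]
       [?tx :db/txInstant ?inst]]
     (d/db conn))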

Joe Lane15:04:05

I’ll have to double-check, but I don’t know for sure that mixing :before and :since for different attrs is supported.

Joe Lane15:04:58

You may also be hitting an API Gateway timeout.

jjttjj16:04:48

To try to narrow it down a bit: I can't even get just a single specific attribute to import. Even this hangs forever for me after printing "Importing", when :my/attr has a valueType of keyword and ~3000 datoms. This should only import those :my/attr datoms, right?

(dl/import-cloud
    {:source ...
     :dest   ...
     :filter
     {:my/attr {}}})

jjttjj16:04:54

Shot in the dark, and it's more likely something I'm doing wrong than a bug, but it feels sort of like it's always doing :my/* and importing everything in the namespace, even when I give it a specific attribute (which would explain the timeouts, since there's a lot of stuff in the namespaces I have). Are you able to confirm that importing a specific attribute with no filters imports just those datoms?
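
One way to test that suspicion, when an import attempt does finish, is to list which attributes actually landed in the destination. A rough sketch assuming a dev-local destination; the client config and db name are placeholders:

(require '[datomic.client.api :as d])

(let [client (d/client {:server-type :dev-local :system "dev"})
      conn   (d/connect client {:db-name "imported-db"})]
  ;; which attributes are present in the imported db, and on how many entities
  (d/q '[:find ?ident (count ?e)
         :where
         [?e ?a]
         [?a :db/ident ?ident]]
       (d/db conn)))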

Joe Lane16:04:20

Is this against a prod system? How large of an instance is the source system?

jjttjj16:04:52

Yeah, it's against prod. It's a t3.small I believe [edit: confirmed].

Joe Lane16:04:31

I wonder if that instance can handle the import request. Are there any alerts in the logs?

jjttjj17:04:46

Hmm none that seem related, just some like ConsumedWriteCapacityUnits < 150 for 15 datapoints within 15 minutes

Joe Lane17:04:52

Are you being DDB read throttled? (Check the DDB table's "Monitoring" dashboard tab, not CloudWatch.)

jjttjj17:04:01

Oh yeah, it looks like it was causing read throttling when I was attempting that
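
The same throttle data can also be pulled programmatically from the table's CloudWatch metrics. A rough sketch using Cognitect's aws-api; the table name and time window are placeholders:

(require '[cognitect.aws.client.api :as aws])

(let [cw  (aws/client {:api :monitoring})  ;; CloudWatch
      now (java.util.Date.)
      ago (java.util.Date. (- (.getTime now) (* 60 60 1000)))]
  ;; sum of read-throttle events on the storage table over the last hour
  (aws/invoke cw {:op :GetMetricStatistics
                  :request {:Namespace  "AWS/DynamoDB"
                            :MetricName "ReadThrottleEvents"
                            :Dimensions [{:Name "TableName" :Value "datomic-my-system"}]
                            :StartTime  ago
                            :EndTime    now
                            :Period     300
                            :Statistics ["Sum"]}}))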

Joe Lane17:04:44

That explains why you could get very recent answers (that part of the log was still in memory)

jjttjj17:04:01

So the imports can cause throttling even when queries for similar amounts of data don't seem to come close?

Joe Lane17:04:02

Your queries are hitting the object cache/EFS/S3; the import is reading the log directly from DDB by scanning DDB items.

Joe Lane17:04:11

it isn't the "amount" of data, it's how you're accessing it.

Joe Lane17:04:45

To save cost, t3.small explicitly sets the DDB read provisioning quite low.
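
For reference, a quick way to see what the table currently has provisioned, again via aws-api; the table name is a placeholder:

(require '[cognitect.aws.client.api :as aws])

;; current provisioned throughput on the Datomic storage table
(-> (aws/invoke (aws/client {:api :dynamodb})
                {:op :DescribeTable
                 :request {:TableName "datomic-my-system"}})
    (get-in [:Table :ProvisionedThroughput]))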

jjttjj17:04:24

I see, makes sense now

Joe Lane17:04:15

(Sorry it took so long to respond to this, I saw it the day you posted but had to think about how to diagnose it in my background mind)

jjttjj17:04:49

So the solution is probably to just raise read capacity on the table before doing imports, and change it back afterwards, right?

jjttjj17:04:05

No worries, it wasn't time sensitive, thanks for getting to it!

Joe Lane17:04:19

That could work, or you might also consider using DDB on-demand provisioning. It's (marginally) more expensive since it's pay-per-request, but as long as you're aware of that you can decide what works best for you.
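
A rough sketch of the raise-then-restore approach with aws-api, assuming the table is in provisioned (not on-demand) mode; the table name and capacity numbers are made up:

(require '[cognitect.aws.client.api :as aws])

(def ddb (aws/client {:api :dynamodb}))

(defn set-capacity!
  "Set provisioned read/write capacity on the Datomic storage table."
  [table rcu wcu]
  (aws/invoke ddb {:op :UpdateTable
                   :request {:TableName table
                             :ProvisionedThroughput {:ReadCapacityUnits  rcu
                                                     :WriteCapacityUnits wcu}}}))

(set-capacity! "datomic-my-system" 500 100)  ;; raise reads before the import
;; ... run dl/import-cloud ...
(set-capacity! "datomic-my-system" 100 100)  ;; restore afterwards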

Joe Lane17:04:23

In the long run, as your system grows, picking a bigger instance size is probably the best choice.

jjttjj17:04:03

Sounds good thanks, will consider these options