
I'm struggling to get import-cloud working with anything besides a top-level :since filter set to an extremely recent time. Outside of that I just get "Importing" printed and then nothing happens until, sometimes, an eventual timeout error.

I have two attributes I need:
- ::attr-a has a few hundred new datoms transacted every minute, and I only need the most recent data.
- ::attr-b had a bunch of data inserted about a week ago. Duplicate data gets periodically re-transacted, but I think the datoms are still marked with the :t at which they were originally asserted, so I need the filter to include that earliest time.

Overall my entire database isn't huge. I'm not sure if this note from the docstring:
> import-cloud limits the total number of datoms imported in a transaction to 16 million and limits strings to 1 million characters.
refers to the total pr-str of the data, but what I'm trying to get should fall well below those limits.

(let [two-mins-ago (-> (java.time.Instant/now)
                       (.minus (java.time.Duration/parse "pt10m")))
      earlier      #inst "2024-03-29"]
  (import-cloud
   {:source source-conf
    :dest   dest-conf
    :filter
    ;; Just the top level filter works if I make it recent enough, but I need older data
    #_{:since two-mins-ago}

    ;; Even just specifying a specific attribute filter for the same
    ;; restriction hangs forever
    #_{:include-attrs
       {::attr-a {:since two-mins-ago}}}

    ;; This is what I really want. But also hangs forever
    {:include-attrs
     {::attr-a {:since two-mins-ago}
      ::attr-b {:before earlier}}}}))
Anything obvious I'm doing wrong? Anyone else have any issues like this? How can I even begin to debug this? I'm on Windows, and it's possible that's relevant since it doesn't seem well supported, but the fact that I can get it working with one of the filters makes me hopeful it's possible to get this working.

Joe Lane 15:04:05

I’ll have to double check but I don’t know for sure that mixing before and since for different attrs is supported.

Joe Lane 15:04:58

You may also be hitting an api-gateway timeout


To try to narrow it a bit, I can't even get just a single specific attribute to import. Even this hangs forever for me after printing "Importing", when :my/attr has a valueType of keyword and ~3000 datoms. This should only import those :my/attr datoms, right?

    (import-cloud
     {:source ...
      :dest   ...
      :filter {:include-attrs {:my/attr {}}}})


Shot in the dark, and it's more likely something I'm doing wrong than a bug, but it feels sort of like it's always doing :my/* to import everything in the namespace, even when I give it a specific attribute (which would explain the timeout, since there's a lot of stuff in the namespaces I have). Are you able to confirm that importing a specific attribute with no filters imports just those datoms?

Joe Lane 16:04:20

Is this against a prod system? How large of an instance is the source system?


yeah it's against prod, it's a t3.small i believe [edit: confirmed]

Joe Lane 16:04:31

I wonder if that instance can handle the import request. Are there any alerts in the logs?


Hmm none that seem related, just some like ConsumedWriteCapacityUnits < 150 for 15 datapoints within 15 minutes

Joe Lane 17:04:52

Are you being DDB Read throttled? (Check the ddb table's "Monitoring" dashboard tab, not cloudwatch)


Oh yeah, it looks like it was causing read throttling when I was attempting that

Joe Lane 17:04:44

That explains why you could get very recent answers (that part of the log was still in memory)


So the imports can cause throttling even when queries for similar amounts of data don't seem to come close?

Joe Lane 17:04:02

Your queries are hitting the object-cache/efs/s3; the import is reading the log directly from DDB by scanning ddb items.

Joe Lane 17:04:11

it isn't the "amount" of data, it's how you're accessing it.

Joe Lane 17:04:45

To save cost, t3.small explicitly sets the ddb read provisioning quite low


I see, makes sense now

Joe Lane 17:04:15

(Sorry it took so long to respond to this, I saw it the day you posted but had to think about how to diagnose it in my background mind)


So the solution is probably to just raise read capacity on the table before doing imports, and change it back later, right?


No worries, it wasn't time sensitive, thanks for getting to it!

Joe Lane 17:04:19

That could work, or you might also consider using ddb on-demand provisioning. It's (marginally) more expensive since it's pay-per-request, but as long as you're aware of that, you can decide what works best for you.

Joe Lane 17:04:23

In the long run, as your system grows, picking a bigger instance size is probably the best choice.


Sounds good thanks, will consider these options
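
For anyone landing here later, the two DynamoDB options discussed in this thread (temporarily raising read capacity, or switching to on-demand) can be sketched as plain request maps for DynamoDB's UpdateTable operation. This is a rough sketch rather than anything from the thread: the table name and capacity numbers are hypothetical placeholders, and if the table's capacity is managed by auto scaling or a CloudFormation stack, a manual change may not stick.

```clojure
;; Sketch only: builds request maps for DynamoDB's UpdateTable operation.
;; The table name and capacity numbers below are hypothetical placeholders.

(defn provisioned-request
  "UpdateTable request that sets provisioned read/write capacity units."
  [table-name read-units write-units]
  {:op      :UpdateTable
   :request {:TableName             table-name
             :ProvisionedThroughput {:ReadCapacityUnits  read-units
                                     :WriteCapacityUnits write-units}}})

(defn on-demand-request
  "UpdateTable request that switches the table to on-demand (pay-per-request) billing."
  [table-name]
  {:op      :UpdateTable
   :request {:TableName   table-name
             :BillingMode "PAY_PER_REQUEST"}})

;; Raise reads before the import, then restore the old values afterwards.
(def before-import (provisioned-request "my-datomic-table" 100 5))
(def after-import  (provisioned-request "my-datomic-table" 5 5))

;; With Cognitect's aws-api on the classpath, each map could be sent like this:
;;   (require '[cognitect.aws.client.api :as aws])
;;   (def ddb (aws/client {:api :dynamodb}))
;;   (aws/invoke ddb before-import)
```

The functions only build data, so the requests can be inspected at the REPL before anything is sent to AWS. Check the table's current settings first so that after-import restores the real original values.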