#datomic
2024-02-22
cch104:02:54

Well, my Datomic Cloud database has gotten pretty sick in the last six hours. The problem started with an inability to run any new revisions on my primary compute group. Early in startup my app attempts a simple transaction; every attempt failed, and my ions were unusable. The failure is reported as a "Connection refused" anomaly inside the Datomic client. Stack trace from the CloudWatch logs in the thread.

👀 1
cch104:02:40

This is an example of the error message

cch104:02:30

After rebooting my EC2 instances in the primary compute group, now I can't even get the original version to deploy. I'm DOWN.

cch105:02:06

Managed to get the system back up and had a successful deploy, but now I'm back down. The Datomic log is full of these too:

{
    "Msg": "ClientSPIAnomaly",
    "DatomicClientSpiErrorResponse": {
        "Status": 200,
        "Body": {
            "CognitectAnomaliesCategory": "CognitectAnomaliesUnavailable",
            "CognitectAnomaliesMessage": "Loading database"
        }
    },
    "Type": "Event",
    "Tid": 145,
    "Timestamp": 1708580753875
}

jaret12:02:58

@U0698L2BU "Loading database" means the node is attempting to load the database and it is not yet loaded. This happens on node startup. Are your EC2 instances actually up and healthy, and do you have a healthy code deploy on the system?

jaret12:02:12

What version are you running? And do you only see these exceptions around startup (expected) or constantly?

cch113:02:03

The instances are up and healthy and I do have a healthy code deploy on the system. Could the "Connection refused" errors be caused because datomic is still loading databases? If so, then perhaps patience is the solution here.

jaret13:02:15

On startup I have seen it take a minute to load a database on a system with many databases. How long have you been waiting?

jaret13:02:43

Anything longer would be problematic in my view…

cch113:02:16

Not more than 20 seconds, so no problem there.

cch113:02:41

If the "Connection refused" is simply a transient symptom of the db loading delay, I can work around that pretty easily.

jaret14:02:07

@U0698L2BU We could use a feature of "lifecycle hooks" to tell the user when the system is "ready" to receive requests. As an aside, the log has a few log messages which might be useful in understanding if a DB is loaded. { $.Msg = "SpindownDb" || $.Msg = "SpinupDb"} ... I think we also report the loaded Databases in CatalogCache.

jaret14:02:54

I also have seen more issues loading databases when separate stacks are at play and specifically you have a smaller sized group trying to load DBs. (i.e. your query group is t3 small and your primary compute is an i3 large)

jaret14:02:38

But having different sized compute machines across groups has other implications for performance that usually rear their head first.

jaret14:02:30

Please let me know what kind of retry you implement and if the issue persists past the initial transient failures.

cch114:02:23

Other than polling with a simple query until no anomalies occur, is there a better way (maybe `d/db-stats`) to know when the db is loaded?
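For reference, the polling/retry workaround discussed here could look something like the following. This is only a minimal sketch, assuming the client surfaces the failure as an `ex-info` whose `ex-data` carries a `:cognitect.anomalies/category` (the log above shows the "unavailable" category for "Loading database"). The names `with-retry`, `transient-anomaly?`, and `transient-categories`, and the default attempt count and delays, are hypothetical and not part of any Datomic API.

```clojure
;; Hypothetical helper: retry a thunk while it fails with a transient anomaly.

(def transient-categories
  "Anomaly categories treated as retryable in this sketch (an assumption)."
  #{:cognitect.anomalies/unavailable
    :cognitect.anomalies/busy})

(defn transient-anomaly?
  "True when exception e carries a retryable anomaly category in its ex-data."
  [e]
  (contains? transient-categories
             (:cognitect.anomalies/category (ex-data e))))

(defn with-retry
  "Calls (f) until it returns without a transient anomaly, sleeping between
  attempts with a doubling delay. Rethrows non-transient exceptions at once,
  and rethrows the last transient one after max-attempts."
  [f {:keys [max-attempts initial-delay-ms]
      :or   {max-attempts 5 initial-delay-ms 250}}]
  (loop [attempt  1
         delay-ms initial-delay-ms]
    ;; recur is not allowed inside catch, so capture the outcome first
    (let [result (try
                   {:ok (f)}
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (transient-anomaly? e)
                              (< attempt max-attempts))
                       {:retry e}
                       (throw e))))]
      (if (contains? result :ok)
        (:ok result)
        (do (Thread/sleep (long delay-ms))
            (recur (inc attempt) (* 2 delay-ms)))))))

;; Usage idea (conn and tx-data are placeholders):
;; (with-retry #(d/transact conn {:tx-data tx-data}) {:max-attempts 8})
```

A wrapper like this only papers over the startup window; it would not help if the anomalies persist past the initial load, as turned out to be the case later in this thread.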

cch114:02:10

Also, I think the problem might be a bit deeper. In addition to the connection refused errors, I now see that there is a corresponding "Uncaught Exception: Loading database" (see snippet above).

jaret14:02:14

Do I still have or could I get read-only access to Cloudwatch and the system name? Perhaps DM me the account ID (or e-mail or other) so I can look up if I still have access

danieroux15:02:48

We are dealing with similar issues and are still putting together a coherent support request and thoughts. Some of the same symptoms: • Started seeing connection refused a week or so ago • "Loading database" takes more than a minute • We are running latest. Our first HTTP requests after an ion deploy sometimes take more than a minute (and of course time out after 30s); thereafter everything settles.

cch115:02:36

Exactly what I am seeing.

cch115:02:01

(except my timeframe for the problems started last night with the first deploy with the new client)

jaret15:02:06

You see it settle eventually @U0698L2BU?

cch115:02:57

If by settle you mean that eventually the http-direct requests return correct responses.... maybe once. Otherwise, I was hitting the panic button and reverting.

cch115:02:43

Perhaps relevant: last night I managed to get a successful deploy (=> successful AWS deploy + my app started) with the new client and now things seem to be OK. I'm calling that deploy the miracle deploy because prior to it I failed to successfully deploy (cuz app start failures, not strictly a deploy failure of course) probably 12 times and was really freaking out. Nothing much changed other than I had rebooted the instances. The deploy is running now. But I'm scared to try again.

danieroux15:02:04

We've been using

com.datomic/client {:mvn/version "1.0.134"}
Since early January. Started using
com.datomic/client-cloud {:mvn/version "1.0.125"}
Two days ago.

danieroux16:02:53

;; Datomic libs
com.datomic/ion                 {:mvn/version "1.0.62"}
com.datomic/client              {:mvn/version "1.0.134"}
com.datomic/client-cloud        {:mvn/version "1.0.124"}
com.datomic/client-api          {:mvn/version "1.0.68"}
com.datomic/client-impl-shared  {:mvn/version "1.0.100"}
com.datomic/ion-resolver        {:mvn/version "0.9.18"}
com.datomic/java-io             {:mvn/version "0.1.29"}
We just tried this set, and see the same issues. Before this we had this set:
;; Datomic libs
com.datomic/ion                 {:mvn/version "1.0.62"}
com.datomic/client              {:mvn/version "1.0.134"}
com.datomic/client-cloud        {:mvn/version "1.0.125"}
com.datomic/client-api          {:mvn/version "1.0.68"}
com.datomic/client-impl-shared  {:mvn/version "1.0.100"}
com.datomic/ion-resolver        {:mvn/version "0.9.18"}
com.datomic/java-io             {:mvn/version "0.1.29"}

👀 1
danieroux16:02:17

This is when the instances have new code on them:

jaret16:02:23

@U9E8C7QRJ we just resolved Chris's issue, and it was related specifically to running out of scaling events in DDB. Could you check whether you are in the same situation by reviewing whether your DDB table is failing to autoscale? You can see this in your DDB table under the tab?

jaret16:02:01

If you have a chance, can you log me a ticket with this, @U9E8C7QRJ? That looks like something I want to investigate.

danieroux16:02:46

I'll log that ticket @U1QJACBUM, thank you

Aviv08:02:58

Hey everyone, I’m invoking an Ion lambda (which is a proxy to Datomic transact) and I need to pass some transaction data. One of the attributes is db.type/instant. How can I pass such a value? It’s not from Java, so I don’t have java.util.Date. (I’m invoking the ion lambda from a JS service if it matters) Thanks

César Olea15:02:01

You’ll have to parse the input and convert it to a java.util.Date instance in your code before and then build the transaction payload with it. Assuming you’re using JSON it could be a formatted date string. Another possibility and the one we use ourselves is to use https://github.com/cognitect/transit-format as the interchange format, and extend it with a read handler, so for example you could extend it to take an array of 3 integers and interpret that as year, month and day.
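The JSON variant of this could look roughly like the sketch below, which runs in the ion code before the transaction payload is built. It assumes the JS caller sends either an ISO-8601 UTC string or epoch milliseconds; `->date`, `coerce-instants`, and the attribute `:order/placed-at` are hypothetical names used only for illustration.

```clojure
(import '(java.time Instant)
        '(java.util Date))

(defn ->date
  "Coerces an ISO-8601 UTC string (e.g. \"2024-02-22T08:02:58Z\") or epoch
  milliseconds to java.util.Date. Dates pass through unchanged."
  [v]
  (cond
    (instance? Date v) v
    (string? v)        (Date/from (Instant/parse v))
    (integer? v)       (Date. (long v))
    :else              (throw (ex-info "Unsupported instant value" {:value v}))))

(defn coerce-instants
  "Returns entity map m with the values under instant-attrs converted to
  java.util.Date. Attributes missing from m are left alone."
  [m instant-attrs]
  (reduce (fn [acc k]
            (if (contains? acc k)
              (update acc k ->date)
              acc))
          m
          instant-attrs))

;; Usage idea: coerce the parsed payload, then transact the result.
;; (coerce-instants {:order/id 1 :order/placed-at "2024-02-22T08:02:58Z"}
;;                  [:order/placed-at])
```

Listing the instant attributes explicitly keeps the conversion in one place, so the rest of the handler can pass the payload through unchanged.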

Aviv16:02:21

I tried to avoid doing manipulations on the server. It feels weird; why should I interfere with the data? I just want the ion lambda to be a proxy, and it feels way too coupled to Java for no reason. I was hoping for a solution where I wouldn't need to mess with the data..

César Olea17:02:59

Not sure if I understand your “ion lambda to be a proxy” statement. The lambda itself is still a proxy and its role is just to serve as an entry point to your datomic system. If you were to implement parsing as I suggested, that code would not run in the lambda itself, it would run in the EC2 instance that’s part of your datomic system. Since the attribute is an instant, I don’t see how you could provide one without doing the parsing yourself in your ion code.

Aviv18:02:44

Sorry, my bad, I meant the EC2 instance, not the lambda directly. Regarding the parsing, I thought maybe there was some format that would work (timestamp, datetime, etc.), but from your answer I understand that there isn't anything like that.

César Olea19:02:43

No worries I just wanted to be sure it was clear that it wasn’t the lambda doing the computations, it’s a very common misconception with ions. And no, unfortunately there’s no format that I know of.

🙏 1
Aviv22:02:17

Oh ok, thanks!

Aviv07:02:04

@U02DNF3TW3E do you know if it's possible to define a function that would run on insertion and convert a timestamp to java.util.Date? (and would be seamless when transacting)

César Olea20:02:32

I think you can achieve that with transaction functions https://docs.datomic.com/cloud/transactions/transaction-functions.html but I’m not 100% sure as I have never coded my own transaction function yet.
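As a rough illustration of that idea, and untested against a real system: my understanding of the linked docs is that a Cloud transaction function is an ordinary pure Clojure function deployed with your ion code, takes the db value as its first argument, returns transaction data, and has to be allowed in `ion-config.edn`. The function name `add-instant` and the attribute `:order/placed-at` below are hypothetical.

```clojure
(defn add-instant
  "Transaction function sketch: expands into a single :db/add assertion
  whose value is a java.util.Date built from epoch milliseconds.
  db is unused here but is passed as the first argument."
  [db eid attr epoch-ms]
  [[:db/add eid attr (java.util.Date. (long epoch-ms))]])

;; Called directly, it just returns transaction data:
;; (add-instant nil "tmp-order" :order/placed-at 1708580753875)
```

The caller would still have to reference the function by name in the transaction data, so this moves the conversion rather than making it fully seamless.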

Aviv09:02:24

Thanks, I already read it and couldn't find such a function.
