
I am using the cognitect labs aws api and am occasionally seeing this error when calling invoke on an aws client for the SNS SQS api

{:cognitect.anomalies/category :cognitect.anomalies/fault,
 :cognitect.anomalies/message  "Abruptly closed by peer",
 :cognitect.http-client/throwable
 #error {:cause "Abruptly closed by peer"
         :via [{:message "Abruptly closed by peer"
                :at [$DecryptedEndPoint fill "" 769]}]
         :trace
         [[$DecryptedEndPoint fill "" 769]
          [org.eclipse.jetty.client.http.HttpReceiverOverHTTP process "" 164]
          [org.eclipse.jetty.client.http.HttpReceiverOverHTTP receive "" 79]
          [org.eclipse.jetty.client.http.HttpChannelOverHTTP receive "" 131]
          [org.eclipse.jetty.client.http.HttpConnectionOverHTTP onFillable "" 172]
          [$ReadCallback succeeded "" 311]
          [ fillable "" 105]
          [$DecryptedEndPoint onFillable "" 555]
          [ onFillable "" 410]
          [$2 succeeded "" 164]
          [ fillable "" 105]
          [$1 run "" 104]
          [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask "" 338]
          [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce "" 315]
          [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce "" 173]
          [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run "" 131]
          [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run "" 409]
          [org.eclipse.jetty.util.thread.QueuedThreadPool runJob "" 883]
          [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run "" 1034]
          [java.lang.Thread run nil -1]]}}
It does not happen every time, just sporadically. Has anyone seen this before, or does anyone know how I can prevent it? Is it okay to retry?

Edit: our version numbers:

{:mvn/version "0.8.539"}
{:mvn/version ""}
{:mvn/version "820.2.1083.0"}
{:mvn/version "814.2.1053.0"}


Are you calling from inside a Lambda or elsewhere?


No sir, this is running in an ECS container


Though it's not marked as such, I think it's retriable, @dannyfreeman.


I wonder if the process is hitting an idleness timeout


I'm not sure we have any way to tell; there isn't much other context to the error. We do have other core.async work happening. While I don't believe it's hogging the thread pool used by core.async (and I don't really have a way to prove that right now), could that cause timeout issues, or would it be a problem with the AWS service we are hitting? I said above it was an SNS API, but I meant SQS. This issue only happens when we call the SQS DeleteMessage endpoint.
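As an aside on the thread-pool question: blocking calls made inside go blocks can starve core.async's small fixed dispatch pool (8 threads by default), while clojure.core.async/thread runs work on a separate expanding pool. A minimal sketch, where blocking-op is a hypothetical stand-in for a blocking call such as aws/invoke:

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical stand-in for a blocking call such as aws/invoke.
(defn blocking-op []
  (Thread/sleep 50)
  :done)

;; a/thread runs its body on core.async's expanding I/O pool and
;; returns a channel with the result, so the small fixed go-block
;; dispatch pool is never tied up by blocking work.
(def result-ch (a/thread (blocking-op)))

(a/<!! result-ch) ;; => :done
```

This doesn't answer whether the pool is actually being hogged here, but keeping blocking I/O out of go blocks rules that factor out.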


We're going to try retrying by providing an override for :retriable? when calling (aws/invoke client ...) that checks for both the default retry conditions and this specific error.
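A minimal sketch of such an override, under these assumptions: the category set mirrors aws-api's built-in default (which retries :busy and :unavailable), and sqs-client, queue-url, and receipt-handle are hypothetical placeholder names:

```clojure
;; aws-api's default retry predicate treats these anomaly categories
;; as retriable; we extend it with the specific fault seen above.
(def default-retriable-categories
  #{:cognitect.anomalies/busy :cognitect.anomalies/unavailable})

(defn retriable?
  "True for the default retriable categories, or for the
   \"Abruptly closed by peer\" fault."
  [response]
  (or (contains? default-retriable-categories
                 (:cognitect.anomalies/category response))
      (= "Abruptly closed by peer"
         (:cognitect.anomalies/message response))))

(comment
  ;; Pass the predicate per invocation; names here are placeholders.
  (aws/invoke sqs-client
              {:op :DeleteMessage
               :request {:QueueUrl queue-url
                         :ReceiptHandle receipt-handle}
               :retriable? retriable?}))
```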


We've been cooking with a retry for a little while now and it seems to have taken care of this issue. Is this something that could be added to the defaults @ghadi? If so I can open a ticket or PR in that repository.


an issue would be welcome -- http/connection faults should be classified as one of the retryable anomalies, not the fallthrough "fault"


Right on. I'll work on writing up an issue to describe what we're experiencing and how we solved it. Thanks for your help

Daniel Jomphe 17:04:12

Seeing this today is interesting. We finished the most basic solution for working around intermittent DNS issues with a condition like this:

(and (#{::anomalies/not-found} category)
     (str/includes? (or (-> e ex-data ::anomalies/message) "") ""))
Said issues only happen on a local dev machine with a wired connection to its router. We're wondering whether other conditions could arise that would make this workaround a real nuisance. After all, not-found is not part of the set of retryable anomalies provided by Cognitect.
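For completeness, one way to apply a condition like that is a small retry wrapper around invoke. This is a sketch only: invoke-with-retry and its linear backoff are assumptions, invoke-fn is any zero-argument function returning an aws-api response map, and the stub below stands in for a real call:

```clojure
(defn invoke-with-retry
  "Calls invoke-fn, retrying while (retriable? response) is true,
   up to max-attempts total attempts. Sketch only: linear backoff."
  [invoke-fn retriable? max-attempts]
  (loop [attempt 1]
    (let [response (invoke-fn)]
      (if (and (retriable? response)
               (< attempt max-attempts))
        (do (Thread/sleep (* 100 attempt))
            (recur (inc attempt)))
        response))))

;; Usage with a stubbed invoke-fn that fails twice, then succeeds:
(def calls (atom 0))

(defn stub-invoke []
  (if (< (swap! calls inc) 3)
    {:cognitect.anomalies/category :cognitect.anomalies/not-found}
    {:ok true}))

(invoke-with-retry stub-invoke
                   #(= :cognitect.anomalies/not-found
                       (:cognitect.anomalies/category %))
                   5)
;; => {:ok true}
```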