#aws
2022-04-27
lispers-anonymous15:04:34

I am using the Cognitect Labs aws-api and am occasionally seeing this error when calling invoke on an AWS client for the SNS SQS API:

{:cognitect.anomalies/category :cognitect.anomalies/fault,
 :cognitect.anomalies/message  "Abruptly closed by peer",
 :cognitect.http-client/throwable #error {:cause "Abruptly closed by peer"
 :via
 [{:type javax.net.ssl.SSLHandshakeException
   :message "Abruptly closed by peer"
   :at [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint fill "SslConnection.java" 769]}]
   :trace
   [[org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint fill "SslConnection.java" 769]
    [org.eclipse.jetty.client.http.HttpReceiverOverHTTP process "HttpReceiverOverHTTP.java" 164]
    [org.eclipse.jetty.client.http.HttpReceiverOverHTTP receive "HttpReceiverOverHTTP.java" 79]
    [org.eclipse.jetty.client.http.HttpChannelOverHTTP receive "HttpChannelOverHTTP.java" 131]
    [org.eclipse.jetty.client.http.HttpConnectionOverHTTP onFillable "HttpConnectionOverHTTP.java" 172]
    [org.eclipse.jetty.io.AbstractConnection$ReadCallback succeeded "AbstractConnection.java" 311]
    [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
    [org.eclipse.jetty.io.ssl.SslConnection$DecryptedEndPoint onFillable "SslConnection.java" 555]
    [org.eclipse.jetty.io.ssl.SslConnection onFillable "SslConnection.java" 410]
    [org.eclipse.jetty.io.ssl.SslConnection$2 succeeded "SslConnection.java" 164]
    [org.eclipse.jetty.io.FillInterest fillable "FillInterest.java" 105]
    [org.eclipse.jetty.io.ChannelEndPoint$1 run "ChannelEndPoint.java" 104]
    [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill runTask "EatWhatYouKill.java" 338]
    [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill doProduce "EatWhatYouKill.java" 315]
    [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill tryProduce "EatWhatYouKill.java" 173]
    [org.eclipse.jetty.util.thread.strategy.EatWhatYouKill run "EatWhatYouKill.java" 131]
    [org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread run "ReservedThreadExecutor.java" 409]
    [org.eclipse.jetty.util.thread.QueuedThreadPool runJob "QueuedThreadPool.java" 883]
    [org.eclipse.jetty.util.thread.QueuedThreadPool$Runner run "QueuedThreadPool.java" 1034]
    [java.lang.Thread run nil -1]]}}
It does not happen every time, just sporadically. Has anyone seen this before, or know how I can prevent it? Is it okay to retry? Edit: our version numbers:
com.cognitect.aws/api           {:mvn/version "0.8.539"}
com.cognitect.aws/endpoints     {:mvn/version "1.1.12.181"}
com.cognitect.aws/sns           {:mvn/version "820.2.1083.0"}
com.cognitect.aws/sqs           {:mvn/version "814.2.1053.0"}

ghadi15:04:57

calling from inside a Lambda or elsewhere?

lispers-anonymous15:04:11

No sir, this is running in an ECS container

ghadi15:04:59

though it's not marked so, I think it's retriable @dannyfreeman.

ghadi16:04:25

I wonder if the process is hitting an idleness timeout

lispers-anonymous16:04:33

I'm not sure we have any way to tell. There is not much other context in the error. We do have other core.async work happening; I don't believe it's hogging the thread pool used by core.async (and I don't really have a way to prove that right now), but could that cause timeout issues, or would it be a problem with the AWS service we are hitting? I said above it was an SNS API, but I meant SQS. This issue only happens when we call the SQS DeleteMessage API endpoint.

lispers-anonymous16:04:13

We're going to try to retry by providing an override for :retriable? when calling (aws/invoke client ...) that checks for the default retry conditions and this specific error.
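Roughly something like the following sketch, assuming :retriable? in the op-map overrides the client's default for that call, that cognitect.aws.retry/default-retriable? is the default predicate linked further down, and that matching on the anomaly message is enough to identify this specific error (queue URL and receipt handle are placeholders):

(require '[cognitect.aws.client.api :as aws]
         '[cognitect.aws.retry :as retry])

;; Retry the default retriable anomalies, plus this specific fault.
;; Matching on the message string is an assumption based on the error above.
(defn retriable-or-closed-by-peer? [response]
  (or (retry/default-retriable? response)
      (and (= :cognitect.anomalies/fault
              (:cognitect.anomalies/category response))
           (= "Abruptly closed by peer"
              (:cognitect.anomalies/message response)))))

(def sqs (aws/client {:api :sqs}))

;; Placeholder request values, for illustration only.
(aws/invoke sqs {:op         :DeleteMessage
                 :request    {:QueueUrl      "https://sqs.us-east-1.amazonaws.com/123456789012/example-queue"
                              :ReceiptHandle "example-receipt-handle"}
                 :retriable? retriable-or-closed-by-peer?})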

lispers-anonymous17:04:17

We've been cooking with a retry for a little while now and it seems to have taken care of this issue. Is this something that could be added to the default https://github.com/cognitect-labs/aws-api/blob/a1c15961b35c1a40a76fe9ab4dddfeafc4474eb1/src/cognitect/aws/retry.clj#L47-L56 @ghadi? If so I can open up a ticket or PR in that repository.

ghadi17:04:22

an issue would be welcome -- http/connection faults should be classified as one of the retryable anomalies, not the fallthrough "fault"

👍 1
lispers-anonymous17:04:22

Right on. I'll work on writing up an issue to describe what we're experiencing and how we solved it. Thanks for your help

Daniel Jomphe17:04:12

Seeing this today is interesting. We finished the most basic solution for working around intermittent DNS issues with a condition like this:

(and (#{::anomalies/not-found} category)
     (str/includes? (or (-> e ex-data ::anomalies/message) "") ""))
Said issues only happen on a local dev machine with a wired connection to the router. We're wondering if other conditions could arise that would make this workaround a real nuisance. After all, not-found is not part of the set of retriable anomalies provided by Cognitect.
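For reference, a rough sketch of the same condition expressed against the response map that aws-api's :retriable? predicate receives (rather than an exception's ex-data), again assuming cognitect.aws.retry/default-retriable? is the default predicate mentioned above; the empty substring simply mirrors the condition quoted here:

(require '[clojure.string :as str]
         '[cognitect.anomalies :as anomalies]
         '[cognitect.aws.retry :as retry])

;; Treat the default retriable anomalies, plus a not-found anomaly whose
;; message contains the substring above, as retriable.
(defn retriable-including-dns-glitch? [response]
  (or (retry/default-retriable? response)
      (and (= ::anomalies/not-found (::anomalies/category response))
           (str/includes? (or (::anomalies/message response) "") ""))))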