#aws
2022-03-28
Maxime Devos 12:03:29

Hi, is this an appropriate place to ask questions about com.cognitect.aws.s3?

kenny 14:03:40

Hi all. I'm curious if anyone using the Java S3 SDK has encountered "java.net.SocketException: Connection reset" exceptions during a GetObject call (full stacktrace in thread). There are a variety of folks talking about this exception online (GH, AWS forums, etc), and it's not totally obvious how or if folks fix this issue. One suggestion I've seen is to increase the socket timeout (default is 50s), but I'm not completely clear whether that will help. There are also suggestions to retry from the beginning after receiving this exception. That doesn't seem like it would work because we're streaming data in, and downstream consumers would have already started to process the data. A retry would need to be smarter and start exactly where it left off. I'm quite unsure if any of those would fix the issue for us, mostly because I'm not sure what the problem is. Of course this issue only occurs in production and is not readily reproducible on a local machine. So, has anyone encountered this exception with S3? If so, what did you determine your problem was, and what was your solution?

kenny 14:03:52

Context: We're downloading hundreds of large(ish) files (~1 GB gzipped CSV) in parallel. Perhaps the problem is that a download takes more than 50s (a bit surprising for 1 GB from S3 running in AWS), the connection times out, and thus the socket timeout increase will fix it. I'm really not sure, though. The API call is a regular .getObject call, upon which we call .getObjectContent and use the InputStream to process the data downstream.
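For reference, with the v1 Java SDK (com.amazonaws) the socket timeout mentioned above lives on ClientConfiguration. An untested fragment; the values here are arbitrary, only the defaults in the comments come from the SDK:

```java
// Untested sketch against the v1 Java SDK; values are arbitrary.
ClientConfiguration cfg = new ClientConfiguration()
        .withSocketTimeout(5 * 60 * 1000) // default is 50,000 ms (the 50s mentioned above)
        .withMaxConnections(64);          // default is 50
AmazonS3 s3 = AmazonS3ClientBuilder.standard()
        .withClientConfiguration(cfg)
        .build();
```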

kenny 14:03:17

Full stacktrace

lukasz 14:03:11

We do process much smaller files (200MB-400MB range) and never hit a timeout issue :thinking_face: One thing we've been doing, though, is to avoid buffering contents in memory and write to tmp files instead - all of the file processing we do via various Java libs knows how to consume File instances rather than work on bytes in memory
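The spool-to-a-tmp-file approach above can be sketched with just the standard library; the ByteArrayInputStream below is a stand-in for the stream returned by `s3Object.getObjectContent()`:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class SpoolToTmp {
    // Spool an InputStream to a temp file so downstream code can work on a
    // File/Path instead of holding the whole object in memory.
    static Path spool(InputStream in) throws IOException {
        Path tmp = Files.createTempFile("s3-object-", ".part");
        tmp.toFile().deleteOnExit();
        Files.copy(in, tmp, StandardCopyOption.REPLACE_EXISTING);
        return tmp;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for s3Object.getObjectContent()
        InputStream body = new ByteArrayInputStream("col1,col2\n1,2\n".getBytes());
        Path tmp = spool(body);
        System.out.println(Files.size(tmp)); // bytes written to disk
    }
}
```

Files.copy streams in chunks internally, so peak memory stays constant regardless of object size.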

lukasz 14:03:47

so they get streamed in chunks and we don't get OOMs anymore

kenny 14:03:58

Hmm. I don't think memory contention is a problem here.

lukasz 14:03:35

Perhaps it is related to how many concurrent connections you have - does S3 client use some sort of keep-alive + connection pool approach?

kenny 14:03:22

Yes. I believe it defaults to 50. We're making, at most, 8 concurrent getObject requests.

lukasz 14:03:51

Very odd, sorry I can't help more 😢

kenny 15:03:21

Thanks for the reply! I'm pondering the concurrency idea. I am blindly trying the increased socket timeout, but the debugging process will be slow since these conditions only occur once per day…

lukasz 15:03:52

Do you use any of the VPC endpoints stuff? Or just public URLs for S3?

lukasz 15:03:56

So not a DNS or NAT issue then - no idea where to go from there

kenny 15:03:37

What do you think of the socket timeout theory?

lukasz 15:03:20

So you're using claypoole for concurrent downloads; are you sure you're not hitting the 50-connection limit?

lukasz 15:03:01

(While I like claypoole we've been slowly moving to directly using j.u.c.Executors stuff so that we can have more control over timeouts and pool sizes, but that's just as an aside)
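As a sketch of that aside (the pool size and task bodies are illustrative, not the poster's actual code): a plain j.u.c fixed pool caps concurrency at 8 and gives per-task timeouts via Future.get, which an unbounded pmap does not:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class FixedPoolDownloads {
    public static void main(String[] args) throws Exception {
        // A fixed pool of 8 keeps concurrent downloads well under the
        // HTTP client's 50-connection limit.
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            List<Future<String>> futures = IntStream.range(0, 20)
                    .mapToObj(i -> pool.submit(() -> "object-" + i)) // stand-in for a getObject call
                    .collect(Collectors.toList());
            int done = 0;
            for (Future<String> f : futures) {
                f.get(30, TimeUnit.SECONDS); // per-task timeout, throws TimeoutException if exceeded
                done++;
            }
            System.out.println(done); // number of completed downloads
        } finally {
            pool.shutdown();
        }
    }
}
```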

kenny 15:03:49

1. I'm fairly sure it's not the concurrent downloads. We're passing in a fixed-size thread pool with a size of 8.
2. If it was the concurrent downloads, why would it manifest in a socket timeout?
3. I'm certainly not tied to claypoole, but it was already used in many places in the code, so it was easy to reach for. FWIW, I think you can directly pass an ExecutorService to claypoole.

lukasz 15:03:28

Yeah, it's very confusing - my only hunch was that issuing concurrent requests was somehow exhausting the 50-connection limit - but yes, very unlikely given your setup

lukasz 15:03:00

In my experience, when I run into something like this, it's either broken or it's not; sporadic issues like this are rare

kenny 15:03:42

Interesting. If I was exhausting the pool, I'd expect a rejection exception immediately. There's a lot of stuff going on in the SDK, so that could be a poor assumption on my part 😅

Maxime Devos 16:03:47

A contributor wants to include com.cognitect.aws.s3 (see https://issues.guix.gnu.org/53765) in a distribution. However, there is a slight problem: it is apparently EPL licensed, but the EPL license file itself is missing and according to 3.2(b) of the EPL, a copy of the EPL must be included with the source code. As such, it seems that com.cognitect.aws.s3 cannot legally be redistributed as-is. Any chance the EPL can be included in the jar?

Reily Siegel 10:03:32

So, a few updates here, @U037R2N2FRC. aws-s3 is licensed under Apache 2.0, not EPL. However, the Apache license also has a clause that the license must be redistributed with the source. It appears that the sources at https://repo1.maven.org/maven2/com/cognitect/aws/s3/820.2.1083.0/s3-820.2.1083.0-sources.jar do not contain any indication of the license, and I can't find a GitHub repository that contains the sources for the s3 specs. I understand that there is a POM file stating that the sources are licensed under the ASL2.0, but Maxime would prefer an indication distributed with the sources. I will open a GitHub issue with this info.

Alex Miller (Clojure team) 11:03:23

The service artifacts are all generated from data, so there is no repo for those, but the artifacts should bundle the license