2020-01-15
Channels
- # announcements (7)
- # aws (30)
- # beginners (141)
- # boot-dev (3)
- # cider (48)
- # clara (35)
- # clojure (94)
- # clojure-europe (6)
- # clojure-italy (20)
- # clojure-nl (19)
- # clojure-norway (1)
- # clojure-portugal (6)
- # clojure-spec (7)
- # clojure-survey (3)
- # clojure-uk (93)
- # clojuredesign-podcast (22)
- # clojurescript (20)
- # core-async (54)
- # cursive (29)
- # datascript (1)
- # datomic (4)
- # emacs (2)
- # fulcro (10)
- # jobs (17)
- # juxt (3)
- # kaocha (20)
- # leiningen (20)
- # malli (22)
- # other-languages (7)
- # pedestal (4)
- # perun (2)
- # quil (2)
- # re-frame (7)
- # reagent (3)
- # reitit (31)
- # shadow-cljs (18)
- # spacemacs (11)
- # vim (32)
Re: AWS S3 big files. I've seen multi-gigabyte object uploads, so single large files can be done, but the size of the object needs to be known beforehand, which might be the reason that libs tend to keep the data in memory
I think that S3 has a limit of 5 GB for a file transfer in one request. If the file is bigger than that, you must use a multipart upload.
the Java libs have a nice TransferManager that can do parallel multipart downloads, if the object was uploaded in multipart fashion
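For reference, here is a minimal sketch of driving that TransferManager from Clojure via interop. It assumes the v1 Java SDK (`com.amazonaws/aws-java-sdk-s3`) is on the classpath; the bucket, key, and local path are made-up placeholders:
```
(import '(com.amazonaws.services.s3.transfer TransferManagerBuilder)
        '(java.io File))

(let [tm (.build (TransferManagerBuilder/standard))
      ;; the download is fetched in parallel parts when the object
      ;; was itself uploaded with multipart
      dl (.download tm "my-bucket" "big-object.bin"
                    (File. "/tmp/big-object.bin"))]
  (.waitForCompletion dl)   ; block until all parts have arrived
  (.shutdownNow tm false))  ; false = leave the underlying S3 client open
```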
If I remember correctly, the current aws-api wants to keep the whole file in memory, at least when downloading. The HTTP client used by aws-api does not support streaming. The Amazon Java SDK does not have that limitation.
Indeed, with the Java SDK it doesn't keep all the content in memory during upload
What would get around all this big-file stuff would be support for pre-signed upload requests in the aws-api client. I'm hoping to see this soon
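Until aws-api grows that, pre-signing is available from the v1 Java SDK via interop. A hedged sketch, with placeholder bucket/key names and a 15-minute expiry chosen for illustration:
```
(import '(com.amazonaws HttpMethod)
        '(com.amazonaws.services.s3 AmazonS3ClientBuilder)
        '(com.amazonaws.services.s3.model GeneratePresignedUrlRequest)
        '(java.util Date))

(let [s3  (AmazonS3ClientBuilder/defaultClient)
      req (doto (GeneratePresignedUrlRequest. "my-bucket" "big-object.bin")
            (.setMethod HttpMethod/PUT)
            ;; URL stays valid for 15 minutes
            (.setExpiration (Date. (+ (System/currentTimeMillis)
                                      (* 15 60 1000)))))]
  ;; any plain HTTP client can now PUT the file body to this URL,
  ;; so the bytes never pass through the SDK (or aws-api) at all
  (str (.generatePresignedUrl s3 req)))
```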
Ditto. I think the new SDK for Java (from AWS) will solve this, but maybe it's just wishful thinking.
@viesti @jsyrjala multipart uploads are already supported. Call the various multipart operations with your input split into chunks (you can do it concurrently). Downloading large files is problematic because of no streaming.
CreateMultipartUpload -> UploadPart (many, you can do it concurrently) -> CompleteMultipartUpload
we have some code at work to do this for staging build artifacts. it reduces over the chunks of a file, starts a future uploading a part for each one, then waits for all the futures to complete and completes the multipart upload. works great.
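A minimal sketch of that pattern with Cognitect's aws-api follows. The function name, bucket/key, and 8 MB part size are illustrative assumptions, and error handling is omitted (`aws/invoke` returns an anomaly map on failure):
```
(require '[cognitect.aws.client.api :as aws]
         '[clojure.java.io :as io])

(defn upload-big-file
  "Multipart-uploads the file at `path` to s3://bucket/key.
   Reads the file sequentially in `part-size` chunks, uploads each
   part on its own future, then completes the multipart upload.
   Note: S3 requires every part except the last to be at least 5 MB."
  [s3 bucket key path part-size]
  (let [{:keys [UploadId]}
        (aws/invoke s3 {:op :CreateMultipartUpload
                        :request {:Bucket bucket :Key key}})
        parts
        (with-open [in (io/input-stream path)]
          (loop [n 1, futs []]
            (let [chunk (.readNBytes in (int part-size))] ; JDK 11+
              (if (zero? (alength chunk))
                (mapv deref futs) ; wait for all the part uploads
                (recur (inc n)
                       (conj futs
                             (future
                               {:PartNumber n
                                :ETag (:ETag (aws/invoke s3
                                               {:op :UploadPart
                                                :request {:Bucket bucket
                                                          :Key key
                                                          :UploadId UploadId
                                                          :PartNumber n
                                                          :Body chunk}}))})))))))]
    (aws/invoke s3 {:op :CompleteMultipartUpload
                    :request {:Bucket bucket :Key key :UploadId UploadId
                              :MultipartUpload {:Parts parts}}})))

;; e.g. (upload-big-file (aws/client {:api :s3})
;;                       "my-bucket" "big.bin" "/tmp/big.bin"
;;                       (* 8 1024 1024))
```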
have used only multipart download via the Java libs in the past, since I had a Redshift cluster write data to S3 in parallel :)
we've thought about some "userspace helpers" for aws-api, like paginators, etc. but so far we're focusing on the raw operations