#aws
2020-02-27
kennytilton 22:02:50

Has anyone had any luck computing a :content-md5 satisfactory to Amazon S3, via Amazonica?

One twist is that I am gzipping my content, but I tried computing the MD5 off both the gzipped and ungzipped content, no luck. 

The google got me this far computing the md5 param:

(-> (util/json-to-gzip-bytearray "helloworld")
      digest/md5
      bs/to-byte-array
      (bt/encode :base64 {:url-safe? false})
      bs/to-string)

FWIW, I stored once without the MD5 and then got back S3's MD5, "gwD+BwffF71J92+mTz2LPA==", and that works dandy. My code above yields "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M=". Just a hair off.

I also saw somewhere that the MD5 digest must be converted from hex to integer and _that_ converted to base64. Tried that, no luck.

Any tips, links, guesses are welcome! Thx.
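
For what it's worth, decoding the two values with the plain JDK Base64 (an illustrative check, not code from the thread) shows what the mismatch is: S3's value is the base64 of the 16 raw digest bytes, while the hand-rolled value is the base64 of the 32-character hex string that the digest library returned.

(import '[java.util Base64])

;; S3's value decodes to 16 raw bytes, i.e. the MD5 digest itself:
(count (.decode (Base64/getDecoder) "gwD+BwffF71J92+mTz2LPA=="))
;; => 16

;; The hand-rolled value decodes to the hex *text* of that same digest:
(String. (.decode (Base64/getDecoder) "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M="))
;; => "8300fe0707df17bd49f76fa64f3d8b3c"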

hiredman 22:02:09

I would be very suspicious of all that code

hiredman 22:02:19

very easy to get encoding, or bytes vs. characters, wrong

hiredman 22:02:52

the Java code for getting an md5sum should just return a byte array
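
A minimal sketch of that approach, using only the JDK (the helper name content-md5 is illustrative): hash the exact bytes being uploaded and base64-encode the raw 16-byte digest, which is the form S3's Content-MD5 expects.

(import '[java.security MessageDigest]
        '[java.util Base64])

;; Illustrative helper: MD5 over raw bytes, returned as base64 of the digest bytes.
(defn content-md5 [^bytes data]
  (let [digest (.digest (MessageDigest/getInstance "MD5") data)] ; 16 raw bytes, not a hex string
    (.encodeToString (Base64/getEncoder) digest)))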

shaun-mahood 22:02:25

@hiskennyness I've been able to calculate matching MD5s using https://github.com/tebeka/clj-digest, comparing my local files with what comes back from https://github.com/cognitect-labs/aws-api/

shaun-mahood 22:02:43

Here's my comparison code:

;; assumes [clojure.string :as str] and [digest] (clj-digest) in the ns requires
(defn duplicate-file? [s3-file local-file]
  (when (some? s3-file)
    (let [s3-md5 (str/replace (:ETag s3-file) "\"" "")   ; S3 returns the ETag wrapped in quotes
          local-md5 (digest/md5 local-file)]              ; hex MD5 of the local file
      (= local-md5 s3-md5))))

kennytilton 22:02:41

Interesting. I will explore that. But S3 wants a base64 encoding passed to it if I want it to validate my upload, and I think that is where I am stuck. Thx, tho! I will learn something playing with that.

shaun-mahood 22:02:16

I think you should be able to use digest to calculate the correct MD5, at least assuming that AWS wants it in the same format as they pass back to you. I struggled with this quite a bit and it was all related to how I was reading and writing my files: when I brought it into Clojure to calculate the MD5 I was screwing something up and calculating the hash on the wrong data. Had to get it pulling out a byte array properly, and for most of my troubleshooting I thought I was performing an incorrect hash on the correct data.
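
One simple way to be sure the digest covers the file's exact bytes (an illustrative helper, not code from the thread):

(require '[clojure.java.io :as io])

;; Read the file's raw bytes so the hash is computed over exactly what gets uploaded.
(defn file-bytes [f]
  (java.nio.file.Files/readAllBytes (.toPath (io/file f))))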

ghadi 22:02:44

@U054BUGT4 do you have to strip those quotes in other libraries, or just aws-api?

shaun-mahood 22:02:42

I've only used aws-api to get that data, so I'm not sure how the ETag data is returned in other libraries

hiredman 23:02:03

I recall S3 adding extra quotes around the ETag header you get back

hiredman 23:02:35

https://github.com/hiredman/propS3t/blob/master/src/propS3t/core.clj#L189 is where I had to slice them off for multipart uploads when doing the S3 thing via the REST API (but that is super old code; I haven't looked at the S3 REST API in a long time)

ghadi 23:02:47

ok cool... I know aws-api handles all string datatypes in the same way...

ghadi 23:02:05

so it probably wasn't the lib

kennytilton 09:02:13

OMG.

(:import [com.amazonaws.util Md5Utils])
.....
(Md5Utils/md5AsBase64 data-gzipped)

kennytilton 09:02:20

After a day of googling "Clojure S3 content MD5" and not doing very well, it occurred to me to try other languages. Including Java. 🙂 https://docs.aws.amazon.com/AWSJavaSDK/latest/javadoc/com/amazonaws/util/Md5Utils.html Bingo.

kennytilton 09:02:54

Too easy. 🙂 Thanks for pitching in, all!

kennytilton 09:02:59

Oh, one more thing. I was wondering if I should compute the MD5 off the gzip I was uploading or the data I gzipped. Turned out it was the gzip.
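
Putting the thread together, a hypothetical end-to-end sketch: gzip the payload, compute Content-MD5 over the gzipped bytes that are actually uploaded, and pass it along. The names upload-gzipped!, gzip-bytes, bucket, s3-key, and body-bytes are placeholders, and it assumes Amazonica's put-object maps :metadata keys such as :content-md5 onto the matching ObjectMetadata setters.

(require '[amazonica.aws.s3 :as s3])
(import '[com.amazonaws.util Md5Utils]
        '[java.io ByteArrayInputStream ByteArrayOutputStream]
        '[java.util.zip GZIPOutputStream])

;; Gzip a byte array in memory (placeholder helper).
(defn gzip-bytes [^bytes data]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [gz (GZIPOutputStream. baos)]
      (.write gz data))
    (.toByteArray baos)))

;; Placeholder upload fn; the MD5 is computed from the gzip, not the original data.
(defn upload-gzipped! [bucket s3-key ^bytes body-bytes]
  (let [gzipped (gzip-bytes body-bytes)
        md5     (Md5Utils/md5AsBase64 gzipped)]   ; base64 of the raw digest, via the SDK util
    ;; Assumption: Amazonica passes these :metadata keys through to ObjectMetadata.
    (s3/put-object :bucket-name bucket
                   :key s3-key
                   :input-stream (ByteArrayInputStream. gzipped)
                   :metadata {:content-length   (count gzipped)
                              :content-encoding "gzip"
                              :content-md5      md5})))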