Has anyone had any luck computing a :content-md5 satisfactory to Amazon S3, via Amazonica?

One twist is that I am gzipping my content, but I tried computing the MD5 off both the gzipped and ungzipped content, no luck. 

The google got me this far computing the md5 param:

(-> (util/json-to-gzip-bytearray "helloworld")
    (bt/encode :base64 {:url-safe? false}))

FWIW, I stored once without the MD5 and then got back S3's MD5, "gwD+BwffF71J92+mTz2LPA==", and that works dandy. My code above yields "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M=". Just a hair off.

I also saw somewhere that the MD5 digest must be converted from hex to integer and _that_ converted to base64. Tried that, no luck.

Any tips, links, guesses are welcome! Thx.
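
One way to narrow this down is to decode both base64 strings quoted above and compare what is inside them. The sketch below uses only `java.util.Base64` from the JDK; the helper names `base64->bytes` and `bytes->hex` are made up for illustration and do not come from any library mentioned in this thread.

```clojure
(import '[java.util Base64])

(defn base64->bytes
  "Decode a standard (non-URL-safe) base64 string into a byte array."
  ^bytes [^String s]
  (.decode (Base64/getDecoder) s))

(defn bytes->hex
  "Render a byte array as lowercase hex, two characters per byte."
  [^bytes bs]
  (apply str (map #(format "%02x" (bit-and % 0xff)) bs)))

;; The two values quoted in the question.
(def s3-md5    "gwD+BwffF71J92+mTz2LPA==")
(def local-md5 "ODMwMGZlMDcwN2RmMTdiZDQ5Zjc2ZmE2NGYzZDhiM2M=")

;; S3's value decodes to 16 raw digest bytes.
(alength (base64->bytes s3-md5))
;; => 16
(bytes->hex (base64->bytes s3-md5))
;; => "8300fe0707df17bd49f76fa64f3d8b3c"

;; The locally computed value decodes to 32 ASCII characters:
;; the same digest, but written out as hex text before encoding.
(String. (base64->bytes local-md5) "US-ASCII")
;; => "8300fe0707df17bd49f76fa64f3d8b3c"
```

So the digest itself appears to be right in both cases. The difference is that the local value is the base64 of the hex string rather than of the raw digest bytes.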


I would be very suspicious of all that code


very easy to get encoding, or bytes vs. characters wrong


the java code for getting an md5sum should just return a byte array
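
A minimal sketch of that idea, using only JDK classes (`java.security.MessageDigest` and `java.util.Base64`): the digest stays a byte array all the way through, and those raw bytes are what get base64-encoded. The function names `md5-bytes` and `content-md5` are hypothetical, chosen for this example.

```clojure
(import '[java.security MessageDigest]
        '[java.util Base64])

(defn md5-bytes
  "Return the raw 16-byte MD5 digest of a byte array."
  ^bytes [^bytes data]
  (.digest (MessageDigest/getInstance "MD5") data))

(defn content-md5
  "Base64-encode the raw digest bytes (not their hex rendering)."
  [^bytes data]
  (.encodeToString (Base64/getEncoder) (md5-bytes data)))

;; Example call on some UTF-8 encoded text.
(content-md5 (.getBytes "helloworld" "UTF-8"))

;; Sanity check against the well-known MD5 of an empty input.
(content-md5 (byte-array 0))
;; => "1B2M2Y8AsgTpgAmY7PhCfg=="
```

Keeping everything as bytes avoids the character-vs-byte confusion: no hex string is ever created, so there is nothing to accidentally encode as text.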


@hiskennyness I've been able to calculate matching MD5s using, comparing my local files with what comes back from


Here's my comparison code

(defn duplicate-file? [s3-file local-file]
  (when (some? s3-file)
    (let [s3-md5 (str/replace (:ETag s3-file) "\"" "")
          local-md5 (digest/md5 local-file)]
      (= local-md5 s3-md5))))


Interesting. I will explore that. But S3 wants a base64 encoding passed to it if I want it to validate my upload, and I think that is where I am stuck. Thx, tho! I will learn sth playing with that.
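
If a hex MD5 is all that is available (for example from a digest library that returns strings, or from an ETag with the quotes stripped), it can be turned into the base64 form by parsing the hex back into bytes first. This is a sketch with hypothetical helper names (`hex->bytes`, `hex-md5->content-md5`). It assumes the hex string really is a plain MD5, which is not the case for ETags of multipart uploads.

```clojure
(import '[java.util Base64])

(defn hex->bytes
  "Parse a hex string (two characters per byte) into a byte array."
  ^bytes [^String hex]
  (byte-array
   (map (fn [[hi lo]]
          ;; Values 0x80..0xff do not fit a signed byte, so wrap them.
          (unchecked-byte (Integer/parseInt (str hi lo) 16)))
        (partition 2 hex))))

(defn hex-md5->content-md5
  "Convert a hex MD5 string into base64 of the raw digest bytes."
  [^String hex]
  (.encodeToString (Base64/getEncoder) (hex->bytes hex)))

;; The hex string here is what the locally computed base64 value
;; from the question decodes to.
(hex-md5->content-md5 "8300fe0707df17bd49f76fa64f3d8b3c")
;; => "gwD+BwffF71J92+mTz2LPA=="
```

The result matches the MD5 that S3 handed back in the original question.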


I think you should be able to use digest to calculate the correct md5, at least assuming that AWS wants it in the same format as they pass back to you. I struggled with this quite a bit and it was all related to how I was reading and writing my files - when I brought it into Clojure to calculate the md5 I was screwing something up and calculating the hash on the wrong data. Had to get it pulling out a byte-array properly, and for most of my troubleshooting I thought I was performing an incorrect hash on the correct data.


@U054BUGT4 do you have to strip those quotes in other libraries, or just aws-api?


I've only used aws-api to get that data, so I'm not sure how the etag data is returned in other libraries


I recall s3 adding extra quotes around the etag header you get back

hiredman23:02:35 is where I had to slice them off for multipart uploads when doing the s3 thing via the rest api (but that is super old code, I haven't looked at the s3 rest api in a long time)


ok cool... I know aws-api handles all string datatypes in the same way...


so it probably wasn't the lib



(:import [com.amazonaws.util Md5Utils])
(Md5Utils/md5AsBase64 data-gzipped)


After a day of googling "Clojure S3 content MD5" and not doing very well it occurred to me to try other languages. Including Java. 🙂 Bingo.


Too easy. 🙂 Thanks for pitching in, all!


Oh, one more thing. I was wondering if I should compute the MD5 off the gzip I was uploading or the data I gzipped. Turned out it was the gzip.
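
To illustrate that last point without the AWS SDK on the classpath, here is a JDK-only sketch that gzips the payload in memory and then hashes the gzipped bytes, since those are the bytes that actually get uploaded. `gzip-bytes` and `content-md5` are hypothetical names for this example. The result map is just a convenient shape, not something Amazonica or S3 requires.

```clojure
(import '[java.io ByteArrayOutputStream]
        '[java.security MessageDigest]
        '[java.util Base64]
        '[java.util.zip GZIPOutputStream])

(defn gzip-bytes
  "Gzip a byte array in memory and return the compressed bytes."
  ^bytes [^bytes data]
  (let [out (ByteArrayOutputStream.)]
    ;; Closing the gzip stream flushes the trailer into `out`.
    (with-open [gz (GZIPOutputStream. out)]
      (.write gz data))
    (.toByteArray out)))

(defn content-md5
  "Base64 of the raw MD5 digest bytes of `data`."
  [^bytes data]
  (.encodeToString (Base64/getEncoder)
                   (.digest (MessageDigest/getInstance "MD5") data)))

;; Hash the bytes that go over the wire (the gzip), not the original text.
(let [raw     (.getBytes "helloworld" "UTF-8")
      gzipped (gzip-bytes raw)]
  {:body        gzipped
   :content-md5 (content-md5 gzipped)})
```

The digest of the gzipped bytes differs from the digest of the raw text, which would explain why hashing the uncompressed data never validated.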