#etaoin
2022-06-06
Steve H 22:06:51

Hey all, I’ve been trying to use etaoin to download or print a PDF from a site. There’s no URL specific to the PDF that we can use, and we’re having problems downloading or printing it. The PDF opens in a PDF plugin popup, like https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf. What we have tried:
• Using etaoin.api/execute to automatically download and name a PDF that has been opened in the active driver window, based on https://github.com/clj-commons/etaoin/issues/355 from @borkdude and https://github.com/clj-commons/etaoin/issues/359 from @lee. See the example below:

(def z
  (with-chrome-headless nil d
    (let [pdf-data (etaoin.api/execute {:driver d
                                        :method :post
                                        :path   [:session (:session d) :print]})]
      (if (zero? (:status pdf-data))
        {:data (:value pdf-data)}
        {:error (:value pdf-data)}))))
• From there we get a string back, but the PDF comes out corrupt or blank once decoded. We tried decoding it with https://base64.guru/converter/decode/pdf. The same thing happens with https://www.w3.org/WAI/ER/tests/xhtml/testfiles/resources/pdf/dummy.pdf. Sample of the return value:
{:data "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9DcmVhdG9yIChDaHJvbWl1bSkKL1Byb2R1Y2VyIChTa2lhL1BERiBtMTAyKQovQ3JlYXRpb25EYXRlIChEOjIwMjIwNjA2MTk1NzAyKzAwJzAwJykKL01vZERhdGUgKEQ6MjAyMjA2MDYxOTU3MDIrMDAnMDAnKT4+CmVuZG9iagozIDAgb2JqCjw8L2NhIDEKL0JNIC9Ob3JtYWw+PgplbmRvYmoKNCAwIG9iago8PC9GaWx0ZXIgL0ZsYXRlRGVjb2RlCi9MZW5ndGggMTE2Pj4gc3RyZWFtCnicbY2xDsIwEEP3+wrPSBy5c5RLvqAzLHwAgk4gUf5fImkHOuC32INtdbZVSJ2j7mI0x+0pbzErWoYydtbpSStJMJWqjJax3OV6wKt3qOZRU5R195f+b/Ubw+AyYTPLLKeJmD8y+pEdrdrYf8i58wU0FCTnCmVuZHN0cmVhbQplbmRvYmoKMiAwIG9iago8PC9UeXBlIC9QYWdlCi9SZXNvdXJjZXMgPDwvUHJvY1NldCBbL1BERiAvVGV4dCAvSW1hZ2VCIC9JbWFnZUMgL0ltYWdlSV0KL0V4dEdTdGF0ZSA8PC9HMyAzIDAgUj4+Pj4KL01lZGlhQm94IFswIDAgNjEyIDc5Ml0KL0NvbnRlbnRzIDQgMCBSCi9TdHJ1Y3RQYXJlbnRzIDAKL1BhcmVudCA1IDAgUj4+CmVuZG9iago1IDAgb2JqCjw8L1R5cGUgL1BhZ2VzCi9Db3VudCAxCi9LaWRzIFsyIDAgUl0+PgplbmRvYmoKNiAwIG9iago8PC9UeXBlIC9DYXRhbG9nCi9QYWdlcyA1IDAgUj4+CmVuZG9iagp4cmVmCjAgNwowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMTUgMDAwMDAgbiAKMDAwMDAwMDM3OCAwMDAwMCBuIAowMDAwMDAwMTU1IDAwMDAwIG4gCjAwMDAwMDAxOTIgMDAwMDAgbiAKMDAwMDAwMDU2NiAwMDAwMCBuIAowMDAwMDAwNjIxIDAwMDAwIG4gCnRyYWlsZXIKPDwvU2l6ZSA3Ci9Sb290IDYgMCBSCi9JbmZvIDEgMCBSPj4Kc3RhcnR4cmVmCjY2OAolJUVPRg=="}
• We also tried base 64 decode from the code below:
(require '[clojure.java.io :as io])

;; Decode to raw bytes. Wrapping the bytes in a String and calling .getBytes
;; again re-encodes them and corrupts the binary parts of the PDF.
(defn decode-base64 ^bytes [^String to-decode]
  (.decode (java.util.Base64/getDecoder) to-decode))

;; Copy the decoded bytes into the target file.
(io/copy
 (java.io.ByteArrayInputStream. (decode-base64 "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9DcmVhdG9yIChDaHJvbWl1bSkKL1Byb2R1Y2VyIChTa2lhL1BERiBtMTAyKQovQ3JlYXRpb25EYXRlIChEOjIwMjIwNjAyMjAxMDU1KzAwJzAwJykKL01vZERhdGUgKEQ6MjAyMjA2MDIyMDEwNTUrMDAnMDAnKT4+CmVuZG9iagozIDAgb2JqCjw8L2NhIDEKL0JNIC9Ob3JtYWw+PgplbmRvYmoKNCAwIG9iago8PC9GaWx0ZXIgL0ZsYXRlRGVjb2RlCi9MZW5ndGggMTE2Pj4gc3RyZWFtCnicbY2xDsIwEEP3+wrPSBy5c5RLvqAzLHwAgk4gUf5fImkHOuC32INtdbZVSJ2j7mI0x+0pbzErWoYydtbpSStJMJWqjJax3OV6wKt3qOZRU5R195f+b/Ubw+AyYTPLLKeJmD8y+pEdrdrYf8i58wU0FCTnCmVuZHN0cmVhbQplbmRvYmoKMiAwIG9iago8PC9UeXBlIC9QYWdlCi9SZXNvdXJjZXMgPDwvUHJvY1NldCBbL1BERiAvVGV4dCAvSW1hZ2VCIC9JbWFnZUMgL0ltYWdlSV0KL0V4dEdTdGF0ZSA8PC9HMyAzIDAgUj4+Pj4KL01lZGlhQm94IFswIDAgNjEyIDc5Ml0KL0NvbnRlbnRzIDQgMCBSCi9TdHJ1Y3RQYXJlbnRzIDAKL1BhcmVudCA1IDAgUj4+CmVuZG9iago1IDAgb2JqCjw8L1R5cGUgL1BhZ2VzCi9Db3VudCAxCi9LaWRzIFsyIDAgUl0+PgplbmRvYmoKNiAwIG9iago8PC9UeXBlIC9DYXRhbG9nCi9QYWdlcyA1IDAgUj4+CmVuZG9iagp4cmVmCjAgNwowMDAwMDAwMDAwIDY1NTM1IGYgCjAwMDAwMDAwMTUgMDAwMDAgbiAKMDAwMDAwMDM3OCAwMDAwMCBuIAowMDAwMDAwMTU1IDAwMDAwIG4gCjAwMDAwMDAxOTIgMDAwMDAgbiAKMDAwMDAwMDU2NiAwMDAwMCBuIAowMDAwMDAwNjIxIDAwMDAwIG4gCnRyYWlsZXIKPDwvU2l6ZSA3Ci9Sb290IDYgMCBSCi9JbmZvIDEgMCBSPj4Kc3RhcnR4cmVmCjY2OAolJUVPRg=="))
 (java.io.File. "test3.pdf"))
It’s been suggested I try sending the following headers along with the POST call via /execute:
Content-Type: application/octet-stream
Content-Disposition: attachment; filename="SOME_NAME.pdf"
Content-Transfer-Encoding: binary
After consulting https://cljdoc.org/d/etaoin/etaoin/0.4.5/api/etaoin.api#execute, I’m still not sure what the syntax for this would be. Are there any examples out there that I’m not finding that show how to send these headers along with the POST call? Is there anything you guys think I’m missing that’s preventing me from downloading this PDF? I’d be grateful for any help, or for an easier way of accomplishing the goal of downloading a PDF.

borkdude 22:06:30

> We also tried base 64 decode from the code below
This is not optional, it is required.
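
One way to tell a decoding problem apart from a genuinely blank page is to decode the returned value to raw bytes and check for the PDF header before writing the file. The following is a minimal sketch under that assumption; `base64->bytes`, `pdf-bytes?` and `save-pdf!` are hypothetical helper names, not part of etaoin.

```clojure
(import '(java.util Base64))

(defn base64->bytes
  "Decode a Base64 string into raw bytes, with no String round trip."
  ^bytes [^String s]
  (.decode (Base64/getDecoder) s))

(defn pdf-bytes?
  "Hypothetical check: true when the data starts with the PDF magic header."
  [^bytes data]
  (and (>= (alength data) 5)
       (= "%PDF-" (String. data 0 5 "US-ASCII"))))

(defn save-pdf!
  "Hypothetical helper: decode `b64` and write the bytes to `path`."
  [^String b64 ^String path]
  (let [data (base64->bytes b64)]
    (when-not (pdf-bytes? data)
      (throw (ex-info "Decoded data does not look like a PDF" {:path path})))
    (with-open [out (java.io.FileOutputStream. path)]
      (.write out data))
    path))
```

When `z` from the snippet above holds a `:data` entry, `(save-pdf! (:data z) "test3.pdf")` would write it out. If the header check passes and the file still opens blank, the decoding step is probably not the cause.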

borkdude 22:06:06

Not sure why the PDF is blank, but perhaps this is because some elements haven't loaded yet?

Steve H 22:06:36

Thanks again @borkdude, 🙏 the hope is that we used your example correctly. I will try implementing a wait to rule out saving the file prematurely...
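
A rough sketch of what adding a wait could look like, assuming the page is loaded with `etaoin.api/go` and a fixed pause with `etaoin.api/wait` is enough to rule out printing too early. The helper name `print-url` and the `wait-seconds` parameter are hypothetical, and the `execute` call simply mirrors the one posted earlier in the thread, so it has not been verified against a specific etaoin or chromedriver version.

```clojure
(require '[etaoin.api :as e])

(defn print-url
  "Hypothetical helper: open `url`, pause, then ask the driver to print the page.
  Returns the raw response map from the driver."
  [url wait-seconds]
  (e/with-chrome-headless nil d
    (e/go d url)
    ;; Crude fixed pause (assumption); waiting on a specific element
    ;; would be more robust when one is known.
    (e/wait d wait-seconds)
    (e/execute {:driver d
                :method :post
                :path   [:session (:session d) :print]})))
```

If the result is the same with and without the pause, load timing can probably be ruled out as the cause.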

borkdude 22:06:07

I'm going to sleep now, perhaps @lee has some ideas
