#etaoin
2022-06-07
lread 01:06:09

Hi @steveholt04 I took the day off from open source work! I can probably take a peek sometime tomorrow.

lread 16:06:08

Ok, starting to take a peek.

lread 16:06:04

I’m going to start with a sanity test in my REPL to see if I can print an HTML web page to a PDF doc.

(require '[etaoin.api :as e]
         '[clojure.java.io :as io])

(def driver (e/chrome-headless))

(e/go driver "")

(def result (e/execute {:driver driver
                        :method :post
                        :path [:session (:session driver) :print]}))

;; let's have a peek at what is in result
(keys result)
;; => (:sessionId :status :value)

;; ok good, from that git issue we expect :value to be a map, is it?
(def value (:value result))

(type value)
;; => java.lang.String
;; ^ nope!

(count value)
;; => 411696
(subs value 0 50)
;; => "JVBERi0xLjQKJdPr6eEKMSAwIG9iago8PC9DcmVhdG9yIChDaH"
;; ^ looks like base64 string is in :value

;; let's decode and save the result to a file:
(def bytes-data (-> (java.util.Base64/getDecoder) (.decode value)))
(io/copy bytes-data (io/file "testdoc.pdf"))
When I open testdoc.pdf I see what I would expect: a PDF print of that page.
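
The decode-and-save step above can be wrapped in a small helper. This is only a minimal sketch: `save-base64-pdf!` is a hypothetical name, not part of Etaoin, and it assumes the base64 string has already been pulled out of `:value` as in the REPL session. It also checks for the `%PDF-` header, which is what the `JVBERi0` prefix seen above decodes to.

```clojure
(require '[clojure.java.io :as io])

(defn save-base64-pdf!
  "Hypothetical helper (not part of Etaoin): decodes the base64 string b64,
  checks that the result starts with the PDF magic bytes, and writes it to
  path. Returns the written java.io.File."
  [^String b64 path]
  (let [^bytes data (.decode (java.util.Base64/getDecoder) b64)
        f           (io/file path)]
    ;; Every PDF file starts with the ASCII bytes "%PDF-".
    (when-not (= (seq (.getBytes "%PDF-" "US-ASCII")) (take 5 data))
      (throw (ex-info "Decoded data does not look like a PDF" {:path path})))
    (io/copy data f)
    f))
```

With the session above, usage would look like `(save-base64-pdf! (:value result) "testdoc.pdf")`.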

lread 17:06:33

Ok… let’s carry on and retry the above with the dummy pdf you provided:

(e/quit driver)
(def driver (e/chrome-headless))
(e/go driver "")
;; hmm something is interesting right off the bat:
(e/get-url driver)
;; => "data:,"
;; ^ I would have expected this to be the pdf URL we just navigated to.
Maybe one or all of chromedriver, Chrome, and headless Chrome do not support this scenario? Maybe PDF content is not supported here. Let’s see what Chrome (not headless) gives us:
(e/quit driver)
(def driver (e/chrome))
(e/go driver "")
(e/get-url driver)
;; => ""
;; ^ ok that's more what I'm expecting

;; But unfortunately:
(def result (e/execute {:driver driver
                        :method :post
                        :path [:session (:session driver) :print]}))
;; we get an exception with:
;; PrintToPDF is only supported in headless mode

lread 17:06:42

So @steveholt04, as far as I can tell, headless Chrome maybe does not support PDF content, and it seems you can’t use headed Chrome via chromedriver to print to PDF. But all looks fine for HTML content. Is that your understanding too?

lread 17:06:15

If you share your goal here a bit more, maybe we can figure out a work-around.

lread 21:06:59

Yeah… sounds a bit tricky! If you had a usable URL, I guess you’d just download the PDF. (Word of warning about etaoin.driver: as part of taking over maintenance of Etaoin, I tried to figure out what is part of the public API, and that ns is currently deemed internal, which implies that options should only be specified at driver creation time.) Actually, this seems to work:

(def driver (e/chrome-headless {:download-dir "target/downloads"}))
(e/go driver "")
If I look under target/downloads I see my dummy.pdf with the expected content.
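
Since the download may finish some time after navigation returns, a script would probably want to wait for the file rather than check for it immediately. The following is a minimal sketch under that assumption: `wait-for-file` is a hypothetical helper, not part of Etaoin, and the 100 ms polling interval is arbitrary.

```clojure
(require '[clojure.java.io :as io])

(defn wait-for-file
  "Hypothetical helper (not part of Etaoin): polls until path exists or
  timeout-ms milliseconds have elapsed. Returns the java.io.File, or nil
  on timeout."
  [path timeout-ms]
  (let [f        (io/file path)
        deadline (+ (System/currentTimeMillis) timeout-ms)]
    (loop []
      (cond
        (.exists f)                             f
        (>= (System/currentTimeMillis) deadline) nil
        :else (do (Thread/sleep 100)
                  (recur))))))
```

For the example above, that could be `(wait-for-file "target/downloads/dummy.pdf" 5000)`.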

lread 03:06:05

Thanks for following up @steveholt04, and glad you found something that works for you!
