#etaoin
2023-03-21
simongray 14:03:01

When scraping with Firefox or Chrome, I occasionally run into cryptic errors:

Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 69
and
Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 1
I'm not sure what's going on here. I am using claypoole to create a 4-thread pool, and each thread scrapes a batch of 10 URLs at a time, wrapping each batch in its own e/with-headless-driver.
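
A minimal sketch of the batching scheme described above may help. It splits the URL list into batches of 10 and hands the batches to a pool of 4 worker threads. It is written in Python purely for illustration and uses neither Etaoin nor claypoole. `scrape_batch` is a hypothetical callable standing in for one e/with-headless-driver block: it opens a browser session, scrapes its batch, and shuts the session down. With `workers=4`, up to four browser sessions (each with its own WebDriver process) run at once. Setting `workers=1` gives a single-session run.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Split items into consecutive lists of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def scrape_all(urls, scrape_batch, workers=4, batch_size=10):
    """Scrape `urls` in batches on a pool of `workers` threads.

    `scrape_batch` is hypothetical: it is assumed to open one browser
    session, scrape every URL in the batch it receives, close the session,
    and return a list of results for that batch.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map keeps results in input order; an exception raised in a
        # worker is re-raised here when its result is consumed.
        per_batch = pool.map(scrape_batch, batches(urls, batch_size))
        return [page for batch in per_batch for page in batch]


# Usage with a fake batch scraper (no browser involved):
urls = [f"https://example.com/page/{i}" for i in range(25)]


def fake_scrape_batch(batch):
    # Stand-in for a real browser session: just tags each URL it was given.
    return [f"scraped {u}" for u in batch]


single = scrape_all(urls, fake_scrape_batch, workers=1)  # one session at a time
pooled = scrape_all(urls, fake_scrape_batch, workers=4)  # up to four concurrent sessions
```

Because the worker count is a single parameter, you can rule concurrency in or out as a cause without changing the scraping code.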

simongray 14:03:43

I can't even get past 200 scraped pages before I hit this error and the whole thing fails…

simongray 14:03:57

What's the recommended recovery strategy? Wrap e/with-headless-driver in a try/catch and retry every time it fails?
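
One common shape for that kind of recovery is retry with exponential backoff. The sketch below is again in Python and hypothetical: `with_retries` is not an Etaoin function. It assumes each call to `task` starts a brand-new browser session (in Etaoin terms, re-entering the e/with-headless-driver block), because a WebDriver process that has already exited cannot be reused. Note that retrying only works around the crash; it does not explain why the driver exits.

```python
import time


def with_retries(task, max_attempts=3, base_delay=1.0, retry_on=(Exception,)):
    """Call task(); on an exception listed in retry_on, wait and try again.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    Once max_attempts is reached, the last exception is re-raised.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except retry_on:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


# Usage with a simulated flaky batch that fails twice, then succeeds:
attempts = {"n": 0}


def flaky_batch():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("WebDriver process exited unexpectedly")
    return ["page-1", "page-2"]


result = with_retries(flaky_batch, max_attempts=3, base_delay=0.0)
```

Capping the number of attempts keeps a persistently failing batch from looping forever. The final exception still surfaces, so the underlying error is not silently swallowed.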

lread 15:03:50

I don't know, @simongray! But... have you tried with a single thread? Does that work? You might also want to check for WebDriver processes left running: https://cljdoc.org/d/etaoin/etaoin/1.0.40/doc/developer-guide#_webdriver_processes. If you clone Etaoin, you can run bb drivers and bb drivers kill from the project root. Also, I haven't really looked at your code or use case, but if you are doing simple scraping, do you need Etaoin at all? Would plain HTTP GETs work?

simongray 15:03:28

No, that's why I am using Etaoin. It's a JS-heavy site.

lread 15:03:20

Gotcha. I had a vague memory and dug up https://github.com/clj-commons/etaoin/issues/379. It's a wild shot, but maybe related.

lread 15:03:12

I'm not sure how many Etaoin users launch concurrent webdriver sessions, so my current guess is that your issue might be related to that.

lread 16:03:56

My recommendation would be to start with a single webdriver session, work out all the kinks with that, and then maybe try concurrent webdriver sessions.

simongray 16:03:47

OK, will try that tomorrow.

lread 17:03:01

Cool, please keep us posted on your progress! We'll do our best to lend a hand if you remain stuck.