etaoin

simongray 2023-03-21T14:12:01.647259Z

Scraping with Firefox or Chrome I run into occasional cryptic errors

Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 69
and
Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 1
I’m not sure what’s going on here? I am using claypoole to create a 4x threadpool and each thread scrapes a batch of 10 URLs at a time, making use of e/with-headless-driver for each of the 10 URL batches.

simongray 2023-03-21T14:32:43.427579Z

I can't even make it past 200 pages scraped before I get this error and the whole thing fails…

simongray 2023-03-21T14:38:57.020059Z

What's the recommended recovery strategy? Wrapping the e/with-headless-driver in a try-catch and retrying every time?
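
For reference, the try-catch-and-retry idea could look roughly like the sketch below. It is not an Etaoin API: `with-retries` is a hypothetical helper, and it assumes the failure surfaces as an `ExceptionInfo`, as in the errors quoted above. Because `recur` is not allowed inside `try`, the exception is first captured as data and the loop recurs outside the `try`.

```clojure
(defn with-retries
  "Hypothetical helper: calls the zero-argument function f up to
  max-attempts times. Returns the first successful result; rethrows
  the last ExceptionInfo once the attempts are used up."
  [max-attempts f]
  (loop [attempt 1]
    (let [result (try
                   {:value (f)}
                   (catch clojure.lang.ExceptionInfo ex
                     ;; Out of attempts: let the caller see the failure.
                     (if (< attempt max-attempts)
                       {:error ex}
                       (throw ex))))]
      (if (contains? result :error)
        (recur (inc attempt))
        (:value result)))))
```

Usage would be something like `(with-retries 3 #(scrape-batch urls))`, where `scrape-batch` stands in for whatever function opens the driver and scrapes one batch of URLs. Retrying only hides the crash, so it is worth logging each caught exception to see how often it happens.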

lread 2023-03-21T15:04:50.443159Z

I don't know @simongray! But... have you tried with a single thread? Does that work? Also, you might want to check for leftover WebDriver processes: https://cljdoc.org/d/etaoin/etaoin/1.0.40/doc/developer-guide#_webdriver_processes. If you clone Etaoin, from the project root you can use bb drivers and bb drivers kill. Also, I have not really looked at your code and use case, but if you are doing simple scraping, do you need Etaoin? Would simple direct HTTP GETs work?

simongray 2023-03-21T15:35:28.260699Z

No, that is why I am using Etaoin. It is a JS-heavy site.

lread 2023-03-21T15:36:20.657099Z

Gotcha, had a vague memory and dug up: https://github.com/clj-commons/etaoin/issues/379, wild shot but maybe related.

lread 2023-03-21T15:39:12.511759Z

I'm not sure how many Etaoin users launch concurrent webdriver sessions, so my current guess is that your issue might be related to that.

lread 2023-03-21T16:10:56.032699Z

My recommendation would be to start with a single webdriver session, work out all the kinks with that, then maybe try concurrent webdriver sessions.

simongray 2023-03-21T16:57:47.326879Z

Ok, will try that tomorrow

lread 2023-03-21T17:31:01.352259Z

Cool, please keep us posted on your progress! We'll do our best to lend a hand if you remain stuck.