This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-03-21
When scraping with Firefox or Chrome, I occasionally run into cryptic errors:
Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 69
and
Execution error (ExceptionInfo) at etaoin.api/running? (api.clj:2230).
WebDriver process exited unexpectedly with a value: 1
I'm not sure what's going on here. I'm using claypoole to create a pool of 4 threads; each thread scrapes a batch of 10 URLs at a time, opening one e/with-headless-driver session per 10-URL batch. The code: https://github.com/kuhumcst/regionh/blob/master/src/dk/cst/regionh.clj#L14-L29
I can't even get past 200 scraped pages before I hit this error and the whole thing fails…
What's the recommended recovery strategy? Wrapping e/with-headless-driver in a try/catch and retrying each time?
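A minimal sketch of the try/catch-and-retry idea from the question, in plain Clojure with no extra dependencies. It retries only on clojure.lang.ExceptionInfo (the exception type reported in the errors above) and backs off exponentially between attempts. The names with-retries, :max-attempts and :base-delay-ms are illustrative assumptions, not part of Etaoin's API.

```clojure
(ns scrape.retry)

(defn with-retries
  "Calls the zero-arg function f, retrying when it throws an ExceptionInfo.
  Makes at most max-attempts calls, sleeping base-delay-ms, then 2x, 4x, ...
  between attempts. Rethrows the last exception once attempts run out.
  (Illustrative helper, not an Etaoin function.)"
  [{:keys [max-attempts base-delay-ms]
    :or   {max-attempts 3 base-delay-ms 1000}}
   f]
  (loop [attempt 1]
    ;; recur is not allowed inside try/catch, so the outcome is captured
    ;; in a map and the retry decision is made outside the try.
    (let [result (try
                   {:ok (f)}
                   (catch clojure.lang.ExceptionInfo e
                     (if (< attempt max-attempts)
                       {:retry e}
                       (throw e))))]
      (if (contains? result :ok)
        (:ok result)
        (do
          ;; Exponential backoff: base, 2*base, 4*base, ...
          (Thread/sleep (long (* base-delay-ms (Math/pow 2 (dec attempt)))))
          (recur (inc attempt)))))))
```

Usage would look like (with-retries {:max-attempts 3} #(scrape-batch! urls)), where scrape-batch! is a hypothetical function containing the whole e/with-headless-driver form. Because the thunk re-runs that entire form, every retry starts a fresh driver session instead of reusing one whose process has already exited.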
I don't know, @simongray! But... have you tried it with a single thread? Does that work?
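One way to test the single-thread suggestion without restructuring the scraper is to make the degree of parallelism a parameter. The sketch below reproduces the batching described above (batches of 10 URLs) on a plain JDK fixed thread pool rather than claypoole; run-batches and scrape-batch are hypothetical names. Assuming scrape-batch opens and closes one WebDriver session per batch, :parallelism 1 means at most one session is alive at any time.

```clojure
(ns scrape.batches
  (:import (java.util.concurrent Callable Executors Future)))

(defn run-batches
  "Splits urls into batches of batch-size and calls (scrape-batch batch)
  for each batch on a fixed pool of `parallelism` threads.
  Returns the results in batch order. With parallelism 1 the batches run
  strictly one after another. (Hypothetical helper for illustration.)"
  [scrape-batch urls {:keys [batch-size parallelism]
                      :or   {batch-size 10 parallelism 1}}]
  (let [pool (Executors/newFixedThreadPool (int parallelism))]
    (try
      ;; Submit every batch up front; mapv is eager.
      (let [futures (mapv (fn [batch]
                            (let [^Callable task (fn [] (scrape-batch batch))]
                              (.submit pool task)))
                          (partition-all batch-size urls))]
        ;; Block on each result in order. An exception thrown by
        ;; scrape-batch surfaces here wrapped in an ExecutionException.
        (mapv (fn [^Future fut] (.get fut)) futures))
      (finally
        ;; Stop accepting new tasks; already-running tasks are not interrupted.
        (.shutdown pool)))))
```

If the errors disappear at :parallelism 1 and come back at 4, that points toward the concurrent-sessions hypothesis rather than something page-specific.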
Also, you might want to check for stray WebDriver processes: https://cljdoc.org/d/etaoin/etaoin/1.0.40/doc/developer-guide#_webdriver_processes. If you clone Etaoin, you can run bb drivers and bb drivers kill from the project root.
Also, I haven't really looked at your code or use case, but if you are doing simple scraping, do you need Etaoin at all? Would plain HTTP GETs work?
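If the pages render their content server-side (no JavaScript needed), a browser may be unnecessary. A minimal sketch using the JDK's built-in java.net.http client (Java 11+), so no extra dependency is required; get-request and fetch-html are illustrative names, and the 30-second timeout is an arbitrary assumption.

```clojure
(ns scrape.http
  (:import (java.net URI)
           (java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers)
           (java.time Duration)))

(defn get-request
  "Builds a plain GET request for url with a per-request timeout."
  ^HttpRequest [^String url]
  (-> (HttpRequest/newBuilder (URI/create url))
      (.timeout (Duration/ofSeconds 30))
      (.GET)
      (.build)))

(defn fetch-html
  "Fetches url with the given HttpClient and returns the body as a string.
  Throws ex-info on a non-2xx status code."
  [^HttpClient client url]
  (let [resp   (.send client (get-request url) (HttpResponse$BodyHandlers/ofString))
        status (.statusCode resp)]
    (if (<= 200 status 299)
      (.body resp)
      (throw (ex-info "Unexpected HTTP status" {:url url :status status})))))
```

A single client can be shared across threads, e.g. (def client (HttpClient/newHttpClient)) and then (fetch-html client url); the returned HTML would still need a parser to extract data.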
Gotcha. I had a vague memory and dug up https://github.com/clj-commons/etaoin/issues/379. It's a wild shot, but it may be related.
I'm not sure how many Etaoin users launch concurrent webdriver sessions, so my current guess is that your issue might be related to that.