etaoin

lucian303 2023-09-20T03:00:52.086059Z

I'm getting a timeout exception when running some JS code, but only in one environment. Later in the same process, I get the same exception when calling (refresh) to reload the page or trying to (click) something (that definitely exists and is viewable/clickable). I'm thinking these latter two exceptions are a byproduct of the first exception. Does that make sense? Otherwise why would a refresh throw a timeout exception? Why would there be a socket timeout exception running JS code? The code does start, and the JS timeout is set in the block to a huge number (2000 secs, over 30 mins) that is definitely not being reached. As far as I can tell, the end of my JS script is reached when I run it locally (Pop!_OS 22) but not when it's run on a separate server (Ubuntu 22). Locally both headless and normal work. On my server I can only use headless. I'm wondering if it's something with etaoin communicating with the webdriver, or with some other part of the chain used to control the browser (Chrome). Any ideas would be appreciated. I often notice differences between the two environments, but usually they're related to the headless / not headless browser or to websites treating the different source IPs differently. This is my own website I'm trying to crawl, however, so I know none of that is taken into account by the site. Code:

(with-script-timeout driver 2000 (js-execute driver (slurp "src/js/myjs.js")))
Exception:
:at [slingshot.support$stack_trace invoke support.clj 201]}
{:type java.net.SocketTimeoutException
:message Read timed out
:at [sun.nio.ch.NioSocketImpl timedRead NioSocketImpl.java 288]}]

lucian303 2023-09-20T07:11:16.104909Z

So I upgraded the browser and checked all versions. Now I'm getting the same result everywhere, but unfortunately, it's the timeout exception while running the JS. It happens both locally and remotely, headless or headed, so I don't think it's environment dependent.

lread 2023-09-20T11:30:36.817579Z

If you clone the Etaoin project, from its root dir, you can run bb tools-versions. This can help to find oversights in mismatched chromedriver vs Chrome versions, for example.

lread 2023-09-20T03:22:50.323139Z

WebDrivers, browsers, and OSes and their versions all play into how things behave and misbehave. For example, I still get sporadic timeout issues on GitHub Actions on Windows for Etaoin tests. But... Linux and Chrome have typically been pretty stable. It is not unusual to only be able to use headless from a server. You can set up virtual displays if you really want to run headed. You might be getting timeouts because the WebDriver is no longer responding. Maybe. Dunno. Are your WebDriver and browser versions the same when running locally vs. on your server? I've found that sometimes turning on WebDriver logging helps to diagnose. Under https://github.com/clj-commons/etaoin/blob/master/doc/01-user-guide.adoc#driver-options, see :driver-log-level and :log-stdout and :log-stderr .
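
For reference, a minimal sketch of what turning on WebDriver logging could look like with the options named above. The log file paths and the example.com URL are placeholders, and "DEBUG" is assumed to be an accepted chromedriver level; check the linked user guide for the values your driver supports.

```clojure
(require '[etaoin.api :as e])

;; Driver options for a debugging run.
;; The paths are placeholders: point them somewhere writable.
(def debug-opts
  {:headless         true
   :driver-log-level "DEBUG"                   ; assumed level name, see the user guide
   :log-stdout       "chromedriver-out.log"    ; chromedriver's stdout goes here
   :log-stderr       "chromedriver-err.log"})  ; chromedriver's stderr goes here

(comment
  ;; Start Chrome with logging on, then run the script from the question.
  (e/with-chrome debug-opts driver
    (e/go driver "https://example.com")        ; placeholder URL
    (e/with-script-timeout driver 2000
      (e/js-execute driver (slurp "src/js/myjs.js")))))
```

If the driver stops responding, the last entries in the two log files should show which command it was handling at the time.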

lread 2023-09-20T03:27:19.338099Z

Also possibly of interest is Chrome's new headless mode: https://developer.chrome.com/articles/new-headless/. I have not played with the new headless mode yet. It will eventually become the default, but I'm not sure in which future version of Chrome that will happen.
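
An untested sketch of how one might opt into the new headless mode from Etaoin. It assumes that passing Chrome's `--headless=new` flag through the `:args` driver option is enough, and that the installed Chrome is recent enough to accept that flag. The option map name and the URL are hypothetical.

```clojure
(require '[etaoin.api :as e])

;; Pass the flag straight to Chrome instead of using :headless true,
;; which requests the older headless mode.
(def new-headless-opts
  {:args ["--headless=new"]})

(comment
  (e/with-chrome new-headless-opts driver
    (e/go driver "https://example.com")   ; placeholder URL
    (e/get-title driver)))
```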

lucian303 2023-09-20T04:11:48.010599Z

the browser versions are different. i can try to upgrade the server and see if that helps (116.0.5845.187 vs 108.0.5359.40). i will try to use the webdriver logging. the headless works locally as well, but perhaps this is due to the different version numbers. I didn't know about this new headless mode. I will have to try that as well to see if it makes a difference. i appreciate the ideas, thank you

lread 2023-09-20T04:29:17.647269Z

Also: Make sure your chromedriver version matches your Chrome browser version.
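
A rough sketch of that check done from a REPL. It compares only major versions, which is a coarse approximation of "matching", and the binary names `google-chrome` and `chromedriver` are assumptions that depend on how each was installed. The helper names are made up for illustration.

```clojure
(require '[clojure.java.shell :refer [sh]])

(defn major-version
  "Pulls the major version out of text such as
  \"Google Chrome 116.0.5845.187\". Returns nil if no
  four-part version number is found."
  [s]
  (some-> (re-find #"(\d+)\.\d+\.\d+\.\d+" s)
          second
          Long/parseLong))

(defn versions-match?
  "True when both outputs contain a version and the majors are equal."
  [chrome-out driver-out]
  (let [c (major-version chrome-out)
        d (major-version driver-out)]
    (and (some? c) (= c d))))

(comment
  ;; Binary names are assumptions; adjust for your install.
  (versions-match? (:out (sh "google-chrome" "--version"))
                   (:out (sh "chromedriver" "--version"))))
```

With the two version numbers mentioned earlier in the thread (116.0.5845.187 and 108.0.5359.40), `versions-match?` returns false.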