#etaoin
2023-11-08
Adham Omran 11:11:54

Hey, I'm using etaoin to do some scraping. Is there any way I can get the text of the tooltip caused by an active element? For example, in the attached media I want "Tuesday, November 7, 2023 at 7:19 PM". I tried OCR, but the screenshot quality from etaoin is too low and I haven't found a way to zoom in the view. Any help with either approach?

lread 12:11:52

Hi @Adham Omran, do you have an example public web page? What web browser are you using?

Adham Omran 12:11:45

Yes, this post shows it: https://www.facebook.com/Cognitect/posts/pfbid0oWQuewGELmYMLiVELC6e93iyfZZcwHYT3xVSeLwRX7j24WNPi9VfGkJ4HjgpbtMUl Hover over "July 17, 2018" with the mouse, or get there by invoking

(e/fill-active driver
               (k/chord k/tab))
until you get there

lread 12:11:57

Oh, I don't have a Facebook account and have no plans to ever have one. Would you happen to have another example?

Adham Omran 12:11:35

This shouldn't require a Facebook account. I can open it with Etaoin and dismiss the login prompt. Let me supply a minimal example.

lread 12:11:06

Oh ya, ok, I see

lread 12:11:27

I'm still sipping my first coffee of the day, will take a peek sometime soon.

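Adham Omran 12:11:13

(require '[etaoin.api :as e]
         '[etaoin.keys :as k])

(def driver (e/firefox))
(e/go driver "")
;; Remove the login prompt
(e/click driver
         [{:css ".xc9qbxq > i:nth-child(1)"}])
;; Tab to the date; dotimes runs eagerly, whereas repeatedly is lazy
;; and only presses Tab when its result is realized (e.g. printed at the REPL)
(dotimes [_ 8]
  (e/fill-active driver
                 (k/chord k/tab)))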

Adham Omran 12:11:46

@lread I'm sipping my second coffee so all good, thanks!

Adham Omran 13:11:46

Slack has something similar: when I inspect the element, I find a data-ts attribute with a Unix timestamp as its value, but I can't find anything similar in the FB one. Both also show an event button in the Firefox Inspector.

lread 13:11:52

@Adham Omran, it is a bit tricky because the DOM is manipulated on hover. I fiddled with the browser inspector and saw that the parent element dynamically gets an aria-describedby attribute on link hover. This, in turn, seems to point to an element that has the full date text. Here's what I tacked onto the end of your example above:

(let [parent-el (e/query driver :active "..")
      describe-elem-id (e/get-element-attr-el driver parent-el "aria-describedby")]
  (e/get-element-text driver {:id describe-elem-id}))
;; => "Tuesday, July 17, 2018 at 10:47 AM"
Does that work for you?

lread 13:11:48

To note: this seems to be returning the full date in the local time zone.

Adham Omran 14:11:01

Yes, this works for me! I didn't think to monitor the DOM as I hover over it. I'll try to learn to do this and add it to my toolbelt. Thank you for your time and help. No worries about the local time zone, that can be fixed after parsing. The big thing to learn here is how to work with a dynamic DOM. I have one question: what does the syntax ".." mean here? Is it a sort of catch-all for classes?
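
As an aside on the "fixed after parsing" point, here is a minimal sketch of how the tooltip text could be parsed with java.time. The names `tooltip-formatter` and `parse-tooltip-date` and the zone "America/New_York" are assumptions for illustration only. The thread only establishes that the text is in the browser's local time zone, so the zone has to be supplied by the caller.

```clojure
(import '(java.time LocalDateTime ZoneId)
        '(java.time.format DateTimeFormatter)
        '(java.util Locale))

;; Pattern matching strings such as "Tuesday, July 17, 2018 at 10:47 AM".
;; Locale/ENGLISH is assumed because the tooltip text in the thread is English.
(def ^DateTimeFormatter tooltip-formatter
  (DateTimeFormatter/ofPattern "EEEE, MMMM d, yyyy 'at' h:mm a" Locale/ENGLISH))

(defn parse-tooltip-date
  "Hypothetical helper: parses the tooltip text as a local date-time,
  interprets it in zone-id (the browser's zone, supplied by the caller),
  and returns a java.time.Instant."
  [^String text ^String zone-id]
  (-> (LocalDateTime/parse text tooltip-formatter)
      (.atZone (ZoneId/of zone-id))
      (.toInstant)))

(str (parse-tooltip-date "Tuesday, July 17, 2018 at 10:47 AM" "America/New_York"))
;; => "2018-07-17T14:47:00Z"
```

The formatter also validates that the weekday matches the date, so a mismatched string throws a DateTimeParseException instead of returning a wrong instant.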

lread 14:11:54

It means parent. So (e/query driver :active "..") asks for the parent element of the current active element.

Adham Omran 14:11:46

I see. I did some googling, and it looks like this comes from XPath syntax (https://stackoverflow.com/questions/28237694/xpath-get-parent-node-from-child-node), am I correct?

lread 14:11:45

Yep, the .. is an XPath thing.
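
To see the `..` step in isolation, here is a minimal sketch that evaluates it with the JDK's built-in XPath engine instead of a browser. The markup and the ids `wrapper` and `date-link` are made up for illustration and are not taken from the Facebook page. The helper `eval-node` is likewise hypothetical.

```clojure
(import '(java.io ByteArrayInputStream)
        '(javax.xml.parsers DocumentBuilderFactory)
        '(javax.xml.xpath XPath XPathConstants XPathFactory)
        '(org.w3c.dom Element Node))

;; Hypothetical markup: a link nested inside a wrapper element.
(def sample-markup
  "<span id=\"wrapper\"><a id=\"date-link\">July 17, 2018</a></span>")

(def doc
  (-> (DocumentBuilderFactory/newInstance)
      (.newDocumentBuilder)
      (.parse (ByteArrayInputStream. (.getBytes ^String sample-markup "UTF-8")))))

(def ^XPath xpath (.newXPath (XPathFactory/newInstance)))

(defn eval-node
  "Evaluates the XPath expression expr relative to context, returning one node."
  [^String expr ^Node context]
  (.evaluate xpath expr context XPathConstants/NODE))

;; Find the link first, then step up to its parent with "..".
(def link (eval-node "//a[@id='date-link']" doc))
(def parent (eval-node ".." link))

(.getAttribute ^Element parent "id")
;; => "wrapper"
```

The second `eval-node` call mirrors the chained query in the thread: the expression is evaluated relative to the element found by the previous step.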

Adham Omran 14:11:13

Understood. I need to sit down with these details soon since I'll be working more with Etaoin. My approach at the moment is to Ctrl-F through the (amazingly well written) guide. Thank you again, this clears up all my questions.

lread 14:11:54

Glad to be of help! Drop by anytime if you get stuck again.