etaoin

James 2024-09-21T18:03:32.599669Z

I am trying to scrape comments from Instagram posts, and it is returning a load of UUID-type values as opposed to the text within them. Here is the code I am using, along with what I get when evaluating it. (defn get-instagram-comments [] (let [comments (e/query-all driver {:css "._a9zs"})] (map #(e/get-element-text driver %) comments))) (e/query-all driver {:css "._a9zs"}) ;; => ["4ba07712-5d86-4a74-8a96-824b59ba4cb6" "6577d825-5140-473d-91dd-f3b9367c8087" "39e8e84d-ea6c-4da5-a30e-ef16a0b8a317" "b2743287-240e-477a-82b5-38ae833f7114" "4f053360-0c19-4680-b110-87bea83a08ca" "87dc8e4c-04b2-4917-8848-e1295de6f8fc" etc etc ... (get-instagram-comments) ;; => (Error printing return value (ExceptionInfo) at slingshot.support/stack-trace (support.clj:201). ;; throw+: {:response {:value {:error "invalid selector", :message "Given xpath expression \"4ba07712-5d86-4a74-8a96-824b59ba4cb6\" is invalid: SyntaxError: Document.evaluate: The expression is not a legal expression", :stacktrace "RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:193:5\nInvalidSelectorError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:432:5\nfind_@chrome://remote/content/shared/DOM.sys.mjs:170:11\n"}}, :path "session/66d9bd68-6904-4ebc-939e-90c5960b6402/element", :payload {:using "xpath", :value "4ba07712-5d86-4a74-8a96-824b59ba4cb6"}, :method :post, :type :etaoin/http-error, :port 35619, :host "127.0.0.1", :status 400, :webdriver-url nil, :driver {:args ("geckodriver" "--port" 35619), :process {:proc #object[java.lang.ProcessImpl 0x4e52ae10 "Process[pid=1946141, exitValue=\"not exited\"]"], :exit nil, :in #object[java.lang.ProcessImpl$ProcessPipeOutputStream 0x6c79954c "java.lang.ProcessImpl$ProcessPipeOutputStream@6c79954c"], :out #object[java.lang.ProcessBuilder$NullInputStream 0x3f4b7278 "java.lang.ProcessBuilder$NullInputStream@3f4b7278"], :err 
#object[java.lang.ProcessBuilder$NullInputStream 0x3f4b7278 "java.lang.ProcessBuilder$NullInputStream@3f4b7278"], :prev nil, :cmd ["geckodriver" "--port" "35619"]}, :locator "xpath", :type :firefox, :port 35619, :host "127.0.0.1", :url "http://127.0.0.1:35619", :created-epoch-ms 1726886209213, :session "66d9bd68-6904-4ebc-939e-90c5960b6402"}}

lread 2024-09-21T21:57:46.917509Z

Hey James, not sure if I can help you, I don't use Instagram. But I'll review your code if you edit your message and put it in a code block. https://slack.com/help/articles/202288908-Format-your-messages

👍 1
p-himik 2024-09-21T22:27:28.966109Z

It seems that the docstring of e/query-all is incorrect. The docstring of e/query can be used as a hint: > Returns the found element's unique identifier

dgr 2024-09-22T01:15:43.098449Z

@james696, rather than calling get-element-text, you need to call get-element-text-el. The first takes a query, searches for it, and returns the text of the first match. The second takes an element ID and returns the text of the element with that ID. Since you are using query-all to collect your IDs, you'll want to feed those IDs to get-element-text-el. Note that many Etaoin functions come in both an `-el` version and a non-`-el` version. The non-`-el` versions take a query, like the one you give to query.
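
A minimal sketch of that change, keeping the selector from the original message. It assumes `etaoin.api` is aliased as `e`; the namespace name is hypothetical, and passing `driver` as an argument (rather than using a global) is a choice made for this example. The payload in the error above shows `:using "xpath"`, which suggests each element ID was being treated as an XPath query when passed to get-element-text.

```clojure
(ns example.instagram-comments          ; hypothetical namespace name
  (:require [etaoin.api :as e]))

(defn get-instagram-comments
  "Returns a vector with the text of every element matching the comment selector."
  [driver]
  ;; query-all returns element IDs, not the elements' text
  (let [comment-els (e/query-all driver {:css "._a9zs"})]
    ;; the -el variant takes an element ID instead of a query;
    ;; mapv is eager, so any WebDriver error surfaces here rather than at print time
    (mapv #(e/get-element-text-el driver %) comment-els)))
```

Call it as `(get-instagram-comments driver)` with an open session. Using `mapv` instead of `map` is optional, but it avoids the "Error printing return value" surprise that a lazy sequence can cause at the REPL.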

🙌 1
✅ 1
lread 2024-09-22T01:30:26.048459Z

Or… other folks who are willing to read hard-to-read msgs will help!

James 2024-09-22T03:36:05.489359Z

@droberts3 worked a treat, thank you very much!

👍 1
lread 2024-09-22T12:39:00.137929Z

@james696, I'm glad you got the help you were looking for, but as a courtesy to me (the current active maintainer of Etaoin), please take the time to format your messages in the future.

James 2024-09-22T14:06:23.502659Z

Absolutely, will make sure I do so in the future :-)

👍 1