I am trying to scrape comments from Instagram posts, and it is returning a load of UUID-type strings as opposed to the text within them.
Here is the code I am using, along with what I get when evaluating it.
(defn get-instagram-comments []
(let [comments (e/query-all driver {:css "._a9zs"})]
(map #(e/get-element-text driver %) comments)))
(e/query-all driver {:css "._a9zs"})
;; => ["4ba07712-5d86-4a74-8a96-824b59ba4cb6" "6577d825-5140-473d-91dd-f3b9367c8087" "39e8e84d-ea6c-4da5-a30e-ef16a0b8a317" "b2743287-240e-477a-82b5-38ae833f7114" "4f053360-0c19-4680-b110-87bea83a08ca" "87dc8e4c-04b2-4917-8848-e1295de6f8fc" etc etc ...
(get-instagram-comments)
;; => (Error printing return value (ExceptionInfo) at slingshot.support/stack-trace (support.clj:201).
;; throw+: {:response {:value {:error "invalid selector", :message "Given xpath expression \"4ba07712-5d86-4a74-8a96-824b59ba4cb6\" is invalid: SyntaxError: Document.evaluate: The expression is not a legal expression", :stacktrace "RemoteError@chrome://remote/content/shared/RemoteError.sys.mjs:8:8\nWebDriverError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:193:5\nInvalidSelectorError@chrome://remote/content/shared/webdriver/Errors.sys.mjs:432:5\nfind_@chrome://remote/content/shared/DOM.sys.mjs:170:11\n"}}, :path "session/66d9bd68-6904-4ebc-939e-90c5960b6402/element", :payload {:using "xpath", :value "4ba07712-5d86-4a74-8a96-824b59ba4cb6"}, :method :post, :type :etaoin/http-error, :port 35619, :host "127.0.0.1", :status 400, :webdriver-url nil, :driver {:args ("geckodriver" "--port" 35619), :process {:proc #object[java.lang.ProcessImpl 0x4e52ae10 "Process[pid=1946141, exitValue=\"not exited\"]"], :exit nil, :in #object[java.lang.ProcessImpl$ProcessPipeOutputStream 0x6c79954c "java.lang.ProcessImpl$ProcessPipeOutputStream@6c79954c"], :out #object[java.lang.ProcessBuilder$NullInputStream 0x3f4b7278 "java.lang.ProcessBuilder$NullInputStream@3f4b7278"], :err #object[java.lang.ProcessBuilder$NullInputStream 0x3f4b7278 "java.lang.ProcessBuilder$NullInputStream@3f4b7278"], :prev nil, :cmd ["geckodriver" "--port" "35619"]}, :locator "xpath", :type :firefox, :port 35619, :host "127.0.0.1", :url "http://127.0.0.1:35619", :created-epoch-ms 1726886209213, :session "66d9bd68-6904-4ebc-939e-90c5960b6402"}}
Hey James, not sure if I can help you, I don't use Instagram. But I'll review your code if you edit your message and put it in a code block. https://slack.com/help/articles/202288908-Format-your-messages
It seems that the docstring of e/query-all is incorrect.
The docstring of e/query can be used as a hint:
> Returns the found element's unique identifier
@james696, rather than calling `get-element-text`, you need to call `get-element-text-el`. The first wants you to supply it with a query, and it will search and return the text of the first thing it finds. The second takes an element ID and returns the text for the element with that ID. Since you are using `query-all` to grab all your IDs, you'll want to take those IDs and feed them to `get-element-text-el`. Note that many Etaoin functions have both a `-el` version and a non-`-el` version. The non-`-el` versions take a query such as is given to `query`.
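For reference, a minimal sketch of that change, assuming `e` is an alias for `etaoin.api` and that a driver has already been started as in the original message. Passing `driver` as an argument is a choice made here for illustration; the original function used a global. The `._a9zs` selector is copied from the question and may not stay valid on Instagram's side. It also accounts for the error above: the payload shows `:using "xpath"`, so `get-element-text` was treating each returned ID as an XPath query.

```clojure
(require '[etaoin.api :as e])

;; Assumes `driver` is an already-started Etaoin driver, as in the question.
(defn get-instagram-comments [driver]
  ;; query-all returns a vector of element IDs, not queries
  (let [comment-els (e/query-all driver {:css "._a9zs"})]
    ;; get-element-text-el takes an element ID directly;
    ;; mapv realizes the results eagerly while the session is still open
    (mapv #(e/get-element-text-el driver %) comment-els)))
```

Called as `(get-instagram-comments driver)`, this should return a vector of comment strings rather than element IDs.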
Or… other folks who are willing to read hard-to-read msgs will help!
@james696, I'm glad you got the help you were looking for, but as a courtesy to me (the current active maintainer of Etaoin), please take the time to format your messages in the future.
Absolutely, will make sure I do so in the future :-)