This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-01-08
@alexisvincent great to hear
New to Specter. I'm scraping http://docs.h2o.ai/h2o/latest-stable/h2o-docs/rest-api-reference.html to build a vector of maps, where each map has the keys :http-verb, :rest-path, :inputs and :outputs. Another challenge is that the HTML appears to be split into four conceptual sections: 1) a section of <a href> links to REST endpoints, 2) a section of h2 headings, each giving the HTTP verb and REST endpoint and followed by a table with Input and Output, 3) a section of <a href> links to schema nouns, and 4) a final section of h2 headings, each giving a schema noun's name and followed by a table of its keys and their descriptions. How might I keep the four sections separate before combining them? I'm also unclear whether I should use `select`, `collect`, `codewalker`, or `continue-then-stay` to collect and surface nested pieces of information. Thanks in advance.
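One way to keep sections 2 and 4 apart is to group each h2 with the sibling nodes that follow it (so its table comes along), then classify the groups by their heading text. This is a plain-Clojure sketch, not Specter, and it rests on assumptions about the page: that all the headings and tables are siblings inside one content `<div>`, that endpoint headings start with GET, POST, DELETE or HEAD, and that schema headings do not. The function names are hypothetical.

```clojure
(ns h2o-scrape.sections
  (:require [clojure.string :as str]))

;; Assumption: endpoint headings start with one of these verbs.
(def http-verbs #{"GET" "POST" "DELETE" "HEAD"})

(defn h2?
  "True for an enlive node whose :tag is :h2 (strings yield nil, so false)."
  [node]
  (= :h2 (:tag node)))

(defn heading-blocks
  "Groups sibling enlive nodes into vectors that start with an <h2> and hold
  every following sibling up to the next <h2>. Nodes before the first <h2>
  (such as the initial link list) are dropped."
  [nodes]
  (reduce (fn [blocks node]
            (cond
              (h2? node)   (conj blocks [node])
              (seq blocks) (update blocks (dec (count blocks)) conj node)
              :else        blocks))
          []
          nodes))

(defn heading-text
  "The heading's first content item when it is a string, else nil."
  [h2]
  (let [c (first (:content h2))]
    (when (string? c) c)))

(defn endpoint-heading?
  "True when the heading text starts with one of `http-verbs`."
  [h2]
  (boolean
   (when-let [text (heading-text h2)]
     (http-verbs (first (str/split text #" "))))))

(defn split-sections
  "Returns {:endpoints [...] :schemas [...]}, each a vector of heading blocks."
  [nodes]
  (let [{endpoints true schemas false}
        (group-by #(endpoint-heading? (first %)) (heading-blocks nodes))]
    {:endpoints (vec endpoints)
     :schemas   (vec schemas)}))
```

A caveat of this design: any non-heading nodes between the last endpoint table and the first schema heading (such as the section 3 link list) are appended to the preceding endpoint block, so they would need to be ignored when parsing that block's table.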
@aaelony you're going to have to be more specific
you want to use specter to extract information out of html?
can you paste a sample of the html you're scraping, and what you want as output?
Hi @nathanmarz, here is the Clojure code whose output I'd like to produce with Specter.
(ns testing
  (:require [net.cgrand.enlive-html :as html]
            [org.httpkit.client :as http]
            [clojure.string :as str]))
(->> (html/html-snippet
      ;; URL of the page being scraped (left empty here)
      (:body @(http/get "" {:insecure false})))
     ;; descend <html> -> <body> -> first <div>, keeping the first match at each level
     (filterv #(= (:tag %) :html))
     first
     :content
     (filterv #(= (:tag %) :body))
     first
     :content
     (filterv #(= (:tag %) :div))
     first
     :content
     ;; every <h2> heading directly inside that <div>
     (filterv #(= (:tag %) :h2))
     ;; split "VERB /path/{param}" into the verb, the endpoint and the {param} matches
     (mapv #(let [[verb endpoint] (-> % :content first (str/split #" "))
                  inputs (when endpoint
                           (re-seq #"\{(.*?)\}" endpoint))]
              {:verb verb :endpoint endpoint :inputs inputs}))
     ;; keep only headings that start with one of these HTTP verbs
     (filterv #(#{"GET" "POST" "DELETE" "HEAD"} (:verb %))))
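A minimal sketch of the same extraction with Specter, assuming Specter 1.x (`com.rpl.specter`) is on the classpath and that `nodes` is the seq returned by `html/html-snippet`. The helper names `tag?`, `h2-texts`, `parse-heading` and `endpoints` are hypothetical. `filterer` followed by `FIRST` stands in for each `(filterv ...) first` step; a plain `ALL` plus a predicate would instead visit every matching `<body>` or `<div>`.

```clojure
(ns h2o-scrape.specter-sketch
  (:require [com.rpl.specter :as sp]
            [clojure.string :as str]))

(defn tag?
  "Returns a predicate matching enlive nodes whose :tag is t."
  [t]
  (fn [node] (and (map? node) (= t (:tag node)))))

(defn h2-texts
  "Text of each <h2> in the first <div> of the first <body>, mirroring the
  filterv/first chain of the enlive version. Headings whose first content
  item is not a string are skipped."
  [nodes]
  (sp/select [(sp/filterer (tag? :html)) sp/FIRST :content
              (sp/filterer (tag? :body)) sp/FIRST :content
              (sp/filterer (tag? :div))  sp/FIRST :content
              sp/ALL (tag? :h2) :content sp/FIRST string?]
             nodes))

(defn parse-heading
  "Splits \"VERB /path/{param}\" into a map, as in the enlive version."
  [text]
  (let [[verb endpoint] (str/split text #" ")]
    {:verb     verb
     :endpoint endpoint
     :inputs   (when endpoint (re-seq #"\{(.*?)\}" endpoint))}))

(def http-verbs #{"GET" "POST" "DELETE" "HEAD"})

(defn endpoints
  "Parsed endpoint headings, keeping only those that start with an HTTP verb."
  [nodes]
  (->> (h2-texts nodes)
       (map parse-heading)
       (filterv (comp http-verbs :verb))))
```

Because each heading only needs its own text, plain `select` is enough here. `collect-one` becomes useful when an outer value (for example a heading) has to travel alongside values found deeper in the tree (for example rows of the table that follows it).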