This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-10-05
Morning @dominicm and indeed EVERYONE! I have a chest infection (minor, acute, just got to wait it out). I am SO HAPPY!
månmån
get well soon @maleghast
Morning :wind_blowing_face:
Bit blustery this morning
morning - I too end up mixing kebab-case and snake_case throughout my code, much to my own annoyance (I must have a word with myself)
I think ideally I’d have wrapped all my db calls in a treewalk -> camel-snake-kebab thingy
(and all the api bits that interoperate with other lesser languages)
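That treewalk idea can be sketched in a few lines with clojure.walk — a minimal sketch, assuming the db hands back nested maps with snake_case keywords (`snake->kebab` and `kebab-keys` are made-up names for illustration, not functions from the camel-snake-kebab library):

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as str])

;; Hypothetical helper: turn one snake_case keyword into kebab-case.
(defn snake->kebab [k]
  (keyword (str/replace (name k) "_" "-")))

;; Walk an arbitrarily nested structure (e.g. a db result) and
;; kebab-case every map key; non-map values pass through untouched.
(defn kebab-keys [data]
  (walk/postwalk
    (fn [x]
      (if (map? x)
        (into {} (map (fn [[k v]] [(snake->kebab k) v])) x)
        x))
    data))

(kebab-keys {:foo_bar 1 :rows [{:baz_qux 2}]})
;; => {:foo-bar 1, :rows [{:baz-qux 2}]}
```

Wrapping the db-call boundary (and the api boundary, in the other direction) with something like this keeps the conversion in one place.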
i mostly stopped using :keys and switched to long-hand destructuring, so (let [{con-foo-bar :foo_bar :as con} conversation] ...)
…argh! but :keys is sooo convenient
maybe that’s the best way
it turned out, in my codebase, that quite often i will destructure a few different objects in the same fn and that it was really convenient to put a prefix on the bound names anyway, so :keys wasn't so convenient anymore
I’ve certainly had one or two frustrating moments with typing the wrong one - especially once IntelliJ gets both variants into its indices and suggests them both
destructuring is quite an extensive mini-lang though - so I’ve tried to keep it simple, but I think I probably should be leaning more on that
i tend to prefer destructuring at the start of a block over keyword access and get-in - the intent is more explicit, the code using simpler names easier to read, and there's only one place (the destructure) you can make a typo without the compiler calling you out
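Side by side on a toy map, the trade-off looks like this (a sketch, not code from the codebase under discussion):

```clojure
(def conversation {:foo_bar 1, :baz_qux 2})

;; :keys is terse, but the bound name must match the key exactly,
;; so snake_case leaks into the body:
(let [{:keys [foo_bar]} conversation]
  foo_bar)
;; => 1

;; long-hand destructuring renames (and prefixes) at the bind site,
;; keeping snake_case confined to the destructure itself:
(let [{con-foo-bar :foo_bar :as con} conversation]
  con-foo-bar)
;; => 1
```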
Right, so, different enlive question… Anyone know a good way of stripping out all the pointless whitespace in a web-page before using enlive to turn the text into a list of nested maps..?
The stone age web-pages I am scraping have whitespace that really f**ks things up in them.
(I could create regexes for all the different combinations of spaces, tabs and “\n”, I realise that, but if there were a way to tell enlive / Clojure to ignore the whitespace, that I have not yet been able to fathom, that would be so much more awesome and less cumbersome…)
to get rid of whitespace you could use https://clojuredocs.org/clojure.string/trim
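Worth noting that trim only strips leading and trailing whitespace from a single string, so it would have to be applied per text node rather than to the whole page:

```clojure
(require '[clojure.string :as str])

;; trim removes whitespace at both ends, but not in the middle:
(str/trim "  \n\t  some text  \n")
;; => "some text"
```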
@guy - It does, but if you have HTML with no #ids or .classes you have to rely on the pure DOM structure through enlive, and then any whitespace gets interpreted as DOM elements and appears inside the data structure that gets created, which effectively (it seems to me anyway) screws up the ability to do things like:
[:body [:table [:tr [:td]]]]
which _should_ get you all the <td>s in the first <table>, which is what I need, but for some reason it doesn’t, and when I inspect the data structure(s) there are lots of entries inside the nested maps for things like ” \n” that enlive detects but does not really know what to do with.
I am fairly sure that if the <td>s I wanted all had a class on them, I would be able to get them really easily. As it is I am doing a lot of hoop jumping to use a bit of enlive’s clever selector stuff along with more traditional data manipulation(s) and the result looks fragile to me, and certainly not very re-usable or configurable.
e.g.
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs (first (:content subcoll))))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-variable-by-country-list
  []
  (-> @(http/get root-url)
      :body
      bs/to-string
      html/html-snippet
      first
      :content
      (html/select [:body [:table]])
      first
      :content
      rest
      (html/select [:tr [:td]])
      (get-link-hrefs '())))
This ^^ works, but I _should_ be able to do:
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs (first (:content subcoll))))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-variable-by-country-list
  []
  (-> @(http/get root-url)
      :body
      bs/to-string
      html/html-snippet
      first
      :content
      (html/select [:body [:table [:tr [:td]]]])
      (get-link-hrefs '())))
I should really be able to do this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    first
    :content
    (html/select [:body [:table [:tr [:td [:a (html/attr? :href)]]]]]))
So I have a clunky solution, but also a reasonably informed theory that if I could get the HTTP response :body to have no whitespace before I pass it to html/html-snippet then it might work better, and be less clunky…
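One way to test that theory is to collapse whitespace-only runs between tags before the string ever reaches html-snippet — a rough sketch (the function name is made up, and the regex is naive: it would also mangle <pre> blocks or text that legitimately sits against a tag):

```clojure
(require '[clojure.string :as str])

;; Collapse whitespace that sits strictly between a closing '>' and an
;; opening '<', so the parser never sees "\n" text nodes.
(defn strip-intertag-whitespace [html]
  (str/replace html #">\s+<" "><"))

(strip-intertag-whitespace "<table>\n  <tr>\n    <td>x</td>\n  </tr>\n</table>")
;; => "<table><tr><td>x</td></tr></table>"
```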
https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.01/crucy.1709191757.v4.01/countries/
just to check, when I read the enlive readme again, it looked like the correct selector was [:html :> :body :> :table :> :tbody]
, no?
Well, this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tbody :> :tr :> :td :> :a]))
brings back an empty list, as does this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body [:table [:tbody [:tr [:td [:a [:attrs [:href]]]]]]]]))
I also tried the “:>” approach all the way down to :attrs “:>” :href - still an empty list
Based on the selector I can get out of Chrome Dev Tools, I would expect either to work… When I looked at the contents of the data structure that was being created by enlive using html-snippet, I noticed that there were these weird maps inside the list(s) that did not have the same structure as everything else and were clearly trying to express whitespace.
That’s when I wondered if this would work better with the whitespace cleared out first, based on similar things having similar effects in other languages / environments, when parsing XHTML and XML in my dim and distant past.
Although, that may be bollocks, as this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tr :> :td :> :a]))
without the tbody does bring back all the <a> tags that are inside <td> tags… Not sure how to get the hrefs out though, as this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tr :> :td :> :a :> :attrs :> :href]))
does not work… I can just reduce the snippet of all the <a> tags if I have to - would be nice to get all the hrefs with enlive syntax though
({:tag :a, :attrs {:href cld}, :content (cld)} {:tag :a, :attrs {:href dtr}, :content (dtr)} {:tag :a, :attrs {:href frs}, :content (frs)} {:tag :a, :attrs {:href pet}, :content (pet)} {:tag :a, :attrs {:href pre}, :content (pre)} {:tag :a, :attrs {:href tmn}, :content (tmn)} {:tag :a, :attrs {:href tmp}, :content (tmp)} {:tag :a, :attrs {:href tmx}, :content (tmx)} {:tag :a, :attrs {:href vap}, :content (vap)} {:tag :a, :attrs {:href wet}, :content (wet)})
well, in this case either would do as they are the same, but I thought it better to get the :href
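Since enlive nodes are just maps like the ones above, the hrefs fall out with ordinary sequence functions once the selector has done its job — e.g. on literal data shaped like that result:

```clojure
(def nodes
  [{:tag :a, :attrs {:href "cld"}, :content ["cld"]}
   {:tag :a, :attrs {:href "dtr"}, :content ["dtr"]}])

;; keyword access composes, so no extra selector syntax is needed:
(map (comp :href :attrs) nodes)
;; => ("cld" "dtr")
```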
So [:p (attr? :lang)] is going to match any elements with a lang attribute inside a :p element. On the other hand, [[:p (attr? :lang)]] is going to match any p with a lang attribute.
But, I do have a really great function now that you helped me to build, @guy so thanks very much for that 🙂
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs subcoll))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-cru-document-links
  [url]
  (-> @(http/get url)
      :body
      bs/to-string
      html/html-snippet
      (html/select [:td :> :a])
      (get-link-hrefs '())
      (->> (map #(str url %)))))
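The (->> ...) nested at the end of a (-> ...) chain works because -> splices the threaded value in as the first argument of the ->> form, which ->> then threads last — a toy example:

```clojure
;; (-> v sort (->> (map inc))) expands to (->> (sort v) (map inc)),
;; i.e. (map inc (sort v)):
(-> [3 1 2]
    sort
    (->> (map inc)))
;; => (2 3 4)
```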
PostgreSQL 10 released ☝️
what makes it bad for cloud?
(would be happy to be proved wrong about multi-node resilience of postgres as it is a great db for relational/time series/gis)
actually i have a more interesting question for you all. when using spec, if you want to spec out functions, is it preferable to use s/fdef, or to put an s/valid? check in the :pre and :post of your functions? I feel like fdef is preferable, but i don't like writing two defs for every fn
but from what i heard, you wouldn't want instrument-ing on in production (performance-wise)
@peterwestmacott geographical information system
you could write the fdef but then get the spec in code if you want to assert explicitly
(s/assert
  (:ret (s/get-spec `this-fn))
  body)
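Put together, the pattern looks roughly like this (assuming Clojure 1.9+, where clojure.spec.alpha ships with the language; `add2` is a throwaway example fn, not from the discussion):

```clojure
(require '[clojure.spec.alpha :as s])

(defn add2 [x] (+ x 2))

;; One fdef registers :args and :ret specs for the fn...
(s/fdef add2
  :args (s/cat :x int?)
  :ret int?)

;; ...and the :ret spec can be fished back out of the registry for an
;; explicit check, without turning instrumentation on:
(s/valid? (:ret (s/get-spec `add2)) (add2 40))
;; => true
```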
in case anybody is interested i wrote specs and gens for emails and urls yesterday https://gist.github.com/conan/2edca210999b96ad26d38c1ee96dfe40
@conan That's a very restrictive spec for email. Here's a much more accurate regex for emails (not claiming it's perfect):
(def email-regex
  "Sophisticated regex for validating an email address."
  (re-pattern
    (str "(([^<>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\\.,;:\\s@\"]+)*)|"
         "(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"
         "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))")))
We use test.chuck's regex generator to produce sample test emails from that.
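For reference, exercising that regex with re-matches (which anchors against the whole string, unlike re-find):

```clojure
(def email-regex
  (re-pattern
    (str "(([^<>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\\.,;:\\s@\"]+)*)|"
         "(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"
         "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))")))

(boolean (re-matches email-regex "someone@example.com"))
;; => true
(boolean (re-matches email-regex "not an email"))
;; => false
```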
@sundarj I don't think foo@bar is a valid email address -- unless you meant "foo@bar"@quux.com (which certainly is valid)?
<input type="email"> in HTML5 accepts it - not every domain has a tld (though most do)
maybe the tld isn't required, in which case only an @ would be necessary, but i've never seen an email without one nor had an error in a system i've validated in this way. i'm happy to wait until i do to add support for it, as i think the benefit of requiring everyone to type it in correctly outweighs the loss of support for users with email addresses that do not have tlds
but removing the requirement for a . is the only way i can see to make it less restrictive; if i remove the requirement for an @ as well then i'm just validating that it's a string
the url one is more useful, as i couldn't find a good existing one. there's a uri generator in clojure.spec, but it's just this:
(fmap #(java.net.URI/create (str "http://" % ".com")) (uuid))
anyway, hopefully it'll be helpful, i'll be using it in production soon and it's taught me all about spec well enough
I meant restrictive because you only allow alphanumeric characters.
Oh no, I misread. You only generate alphanumeric. Got it.
We like the fact our spec is symmetric -- it generates what it accepts and vice versa.
In particular, the generation of such wild addresses is a good test for other parts of the system to make sure they don't bake in incorrect assumptions about the structure of an email address.
@sundarj Can you point me at any domain names that do not have a TLD? I am surprised that is legal (despite HTML5's validator accepting it).
every tld is its own domain, http://to used to have some html there for example, but doesn't anymore
Interesting... TIL! Thanks!
apparently google wanted to own http://search and http://app
Hmmm… Anyone got any recommendations on how to turn a list of lists, where the inner lists are pairs of strings, into a map?
I realise that if it was a seq of vector pairs this would just work with (into {} pairs)
(into {} (map vec) pairs)
?
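That one-liner works because into with a transducer runs (map vec) over each pair before conj-ing, and a two-element vector conj'd onto a map becomes an entry:

```clojure
;; Bare lists are not map entries, so (into {} pairs) alone would fail;
;; mapping `vec` first turns each pair into an entry vector:
(into {} (map vec) '(("a" "1") ("b" "2")))
;; => {"a" "1", "b" "2"}
```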
I will try, thanks @seancorfield
I'm slowly beginning to internalize the use of transducers... 😐