This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-10-05
Morning @dominicm and indeed EVERYONE! I have a chest infection (minor, acute, just got to wait it out). I am SO HAPPY!
månmån
get well soon @maleghast
Morning :wind_blowing_face:
Bit blustery this morning
morning - I too end up mixing kebab-case and snake_case throughout my code, much to my own annoyance (I must have a word with myself)
I think ideally I’d have wrapped all my db calls in a treewalk -> camel-snake-kebab thingy
(and all the api bits that interoperate with other lesser languages)
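That treewalk idea can be sketched in a few lines with clojure.walk — a minimal sketch, assuming the db hands back nested maps with snake_case keywords (`snake->kebab` and `kebab-keys` are made-up names for illustration, not functions from the camel-snake-kebab library):

```clojure
(require '[clojure.walk :as walk]
         '[clojure.string :as str])

;; Hypothetical helper: turn one snake_case keyword into kebab-case.
(defn snake->kebab [k]
  (keyword (str/replace (name k) "_" "-")))

;; Walk an arbitrarily nested structure (e.g. a db result) and
;; kebab-case every map key; non-map values pass through untouched.
(defn kebab-keys [data]
  (walk/postwalk
    (fn [x]
      (if (map? x)
        (into {} (map (fn [[k v]] [(snake->kebab k) v])) x)
        x))
    data))

(kebab-keys {:foo_bar 1 :rows [{:baz_qux 2}]})
;; => {:foo-bar 1, :rows [{:baz-qux 2}]}
```

Wrapping the db-call boundary (and the api boundary, in the other direction) with something like this keeps the conversion in one place.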
i mostly stopped using :keys and switched to long-hand destructuring, so (let [{con-foo-bar :foo_bar :as con} conversation] ...)
…argh! but :keys is sooo convenient
maybe that’s the best way
it turned out, in my codebase, that quite often i will destructure a few different objects in the same fn and that it was really convenient to put a prefix on the bound names anyway, so :keys wasn't so convenient anymore
I’ve certainly had one or two frustrating moments with typing the wrong one - especially once IntelliJ gets both variants into its indices and suggests them both
destructuring is quite an extensive mini-lang though - so I’ve tried to keep it simple, but I think I probably should be leaning more on that
i tend to prefer destructuring at the start of a block over keyword access and get-in - the intent is more explicit, the code using simpler names easier to read, and there's only one place (the destructure) you can make a typo without the compiler calling you out
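Side by side on a toy map, the trade-off looks like this (a sketch, not code from the codebase under discussion):

```clojure
(def conversation {:foo_bar 1, :baz_qux 2})

;; :keys is terse, but the bound name must match the key exactly,
;; so snake_case leaks into the body:
(let [{:keys [foo_bar]} conversation]
  foo_bar)
;; => 1

;; long-hand destructuring renames (and prefixes) at the bind site,
;; keeping snake_case confined to the destructure itself:
(let [{con-foo-bar :foo_bar :as con} conversation]
  con-foo-bar)
;; => 1
```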
Right, so, different enlive question… Anyone know a good way of stripping out all the pointless whitespace in a web-page before using enlive to turn the text into a list of nested maps..?
The stone age web-pages I am scraping have whitespace that really f**ks things up in them.
(I could create regexes for all the different combinations of spaces, tabs and “\n”, I realise that, but if there were a way to tell enlive / Clojure to ignore the whitespace, that I have not yet been able to fathom, that would be so much more awesome and less cumbersome…)
to get rid of whitespace you could use https://clojuredocs.org/clojure.string/trim
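Worth noting that trim only strips leading and trailing whitespace from a single string, so it would have to be applied per text node rather than to the whole page:

```clojure
(require '[clojure.string :as str])

;; trim removes whitespace at both ends, but not in the middle:
(str/trim "  \n\t  some text  \n")
;; => "some text"
```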
@guy - It does, but if you have HTML with no #ids or .classes you have to rely on the pure DOM structure through enlive, and then any whitespace gets interpreted as DOM elements and appears inside the data structure that gets created, which effectively (it seems to me anyway) screws up the ability to do things like:
[:body [:table [:tr [:td]]]]
which _should_ get you all the <td>s in the first <table>, which is what I need, but for some reason it doesn’t, and when I inspect the data structure(s) there are lots of entries inside the nested maps for things like ” \n” that enlive detects but does not really know what to do with.
I am fairly sure that if the <td>s I wanted all had a class on them, I would be able to get them really easily. As it is I am doing a lot of hoop jumping to use a bit of enlive’s clever selector stuff along with more traditional data manipulation(s) and the result looks fragile to me, and certainly not very re-usable or configurable.
e.g.
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs (first (:content subcoll))))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-variable-by-country-list
  []
  (-> @(http/get root-url)
      :body
      bs/to-string
      html/html-snippet
      first
      :content
      (html/select [:body [:table]])
      first
      :content
      rest
      (html/select [:tr [:td]])
      (get-link-hrefs '())))
This ^^ works, but I _should_ be able to do:
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs (first (:content subcoll))))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-variable-by-country-list
  []
  (-> @(http/get root-url)
      :body
      bs/to-string
      html/html-snippet
      first
      :content
      (html/select [:body [:table [:tr [:td]]]])
      (get-link-hrefs '())))
I should really be able to do this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    first
    :content
    (html/select [:body [:table [:tr [:td [:a (html/attr? :href)]]]]]))
So I have a clunky solution, but also a reasonably informed theory that if I could get the HTTP response :body to have no whitespace before I pass it to html/html-snippet then it might work better, and be less clunky…
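One way to test that theory is to collapse whitespace-only runs between tags before the string ever reaches html-snippet — a rough sketch (the function name is made up, and the regex is naive: it would also mangle <pre> blocks or text that legitimately sits against a tag):

```clojure
(require '[clojure.string :as str])

;; Collapse whitespace that sits strictly between a closing '>' and an
;; opening '<', so the parser never sees "\n" text nodes.
(defn strip-intertag-whitespace [html]
  (str/replace html #">\s+<" "><"))

(strip-intertag-whitespace "<table>\n  <tr>\n    <td>x</td>\n  </tr>\n</table>")
;; => "<table><tr><td>x</td></tr></table>"
```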
https://crudata.uea.ac.uk/cru/data/hrg/cru_ts_4.01/crucy.1709191757.v4.01/countries/
just to check, when I read the enlive readme again, it looked like the correct selector was [:html :> :body :> :table :> :tbody]
, no?
Well, this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tbody :> :tr :> :td :> :a]))
brings back an empty list, as does this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body [:table [:tbody [:tr [:td [:a [:attrs [:href]]]]]]]]))
I also tried the “:>” approach all the way down to :attrs “:>” :href - still an empty list
Based on the selector I can get out of Chrome Dev Tools, I would expect either to work… When I looked at the contents of the data structure that was being created by enlive using html-snippet, I noticed that there were these weird maps inside the list(s) that did not have the same structure as everything else and were clearly trying to express whitespace.
That’s when I wondered if this would work better with the whitespace cleared out first, based on similar things having similar effects in other languages / environments, when parsing XHTML and XML in my dim and distant past.
Although, that may be bollocks, as this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tr :> :td :> :a]))
without the tbody does bring back all the <a> tags that are inside <td> tags… Not sure how to get the hrefs out though, as this:
(-> @(http/get root-url)
    :body
    bs/to-string
    html/html-snippet
    (html/select [:body :> :table :> :tr :> :td :> :a :> :attrs :> :href]))
does not work… I can just reduce the snippet of all the <a> tags if I have to - would be nice to get all the hrefs with enlive syntax though
({:tag :a, :attrs {:href cld}, :content (cld)} {:tag :a, :attrs {:href dtr}, :content (dtr)} {:tag :a, :attrs {:href frs}, :content (frs)} {:tag :a, :attrs {:href pet}, :content (pet)} {:tag :a, :attrs {:href pre}, :content (pre)} {:tag :a, :attrs {:href tmn}, :content (tmn)} {:tag :a, :attrs {:href tmp}, :content (tmp)} {:tag :a, :attrs {:href tmx}, :content (tmx)} {:tag :a, :attrs {:href vap}, :content (vap)} {:tag :a, :attrs {:href wet}, :content (wet)})
well, in this case either would do as they are the same, but I thought it better to get the :href
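Since enlive nodes are just maps like the ones above, the hrefs fall out with ordinary sequence functions once the selector has done its job — e.g. on literal data shaped like that result:

```clojure
(def nodes
  [{:tag :a, :attrs {:href "cld"}, :content ["cld"]}
   {:tag :a, :attrs {:href "dtr"}, :content ["dtr"]}])

;; keyword access composes, so no extra selector syntax is needed:
(map (comp :href :attrs) nodes)
;; => ("cld" "dtr")
```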
So [:p (attr? :lang)] is going to match any elements with a lang attribute inside a :p element. On the other hand, [[:p (attr? :lang)]] is going to match any p with a lang attribute.
But, I do have a really great function now that you helped me to build, @guy so thanks very much for that 🙂
(defn get-link-hrefs
  [html-snippet acc]
  (reduce
    (fn [acc subcoll]
      (let [href (:href (:attrs subcoll))]
        (if (not (nil? href))
          (conj acc href)
          acc)))
    acc
    html-snippet))
(defn get-cru-document-links
  [url]
  (-> @(http/get url)
      :body
      bs/to-string
      html/html-snippet
      (html/select [:td :> :a])
      (get-link-hrefs '())
      (->> (map #(str url %)))))
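The (->> ...) nested at the end of a (-> ...) chain works because -> splices the threaded value in as the first argument of the ->> form, which ->> then threads last — a toy example:

```clojure
;; (-> v sort (->> (map inc))) expands to (->> (sort v) (map inc)),
;; i.e. (map inc (sort v)):
(-> [3 1 2]
    sort
    (->> (map inc)))
;; => (2 3 4)
```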
PostgreSQL 10 released ☝️
what makes it bad for cloud?
(would be happy to be proved wrong about multi-node resilience of postgres as it is a great db for relational/time series/gis)
actually i have a more interesting question for you all. when using spec, if you want to spec out functions, is it preferable to use s/fdef, or to put an s/valid? check in the :pre and :post of your functions? I feel like fdef is preferable, but i don't like writing two defs for every fn
but from what i heard, you wouldn't want instrument-ing on in production (performance-wise)
@peterwestmacott geographical information system
you could write the fdef but then get the spec in code if you want to assert explicitly
(s/assert
  (:ret (s/get-spec `this-fn))
  body)
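Put together, the pattern looks roughly like this (assuming Clojure 1.9+, where clojure.spec.alpha ships with the language; `add2` is a throwaway example fn, not from the discussion):

```clojure
(require '[clojure.spec.alpha :as s])

(defn add2 [x] (+ x 2))

;; One fdef registers :args and :ret specs for the fn...
(s/fdef add2
  :args (s/cat :x int?)
  :ret int?)

;; ...and the :ret spec can be fished back out of the registry for an
;; explicit check, without turning instrumentation on:
(s/valid? (:ret (s/get-spec `add2)) (add2 40))
;; => true
```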
in case anybody is interested i wrote specs and gens for emails and urls yesterday https://gist.github.com/conan/2edca210999b96ad26d38c1ee96dfe40
@conan That's a very restrictive spec for email. Here's a much more accurate regex for emails (not claiming it's perfect):
(def email-regex
  "Sophisticated regex for validating an email address."
  (re-pattern
    (str "(([^<>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\\.,;:\\s@\"]+)*)|"
         "(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"
         "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))")))
We use test.chuck's regex generator to produce sample test emails from that.
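For reference, exercising that regex with re-matches (which anchors against the whole string, unlike re-find):

```clojure
(def email-regex
  (re-pattern
    (str "(([^<>()\\[\\]\\\\.,;:\\s@\"]+(\\.[^<>()\\[\\]\\\\.,;:\\s@\"]+)*)|"
         "(\".+\"))@((\\[[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\.[0-9]{1,3}\\])|"
         "(([a-zA-Z\\-0-9]+\\.)+[a-zA-Z]{2,}))")))

(boolean (re-matches email-regex "someone@example.com"))
;; => true
(boolean (re-matches email-regex "not an email"))
;; => false
```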
@sundarj I don't think foo@bar is a valid email address -- unless you meant "foo@bar"@quux.com (which certainly is valid)?
<input type="email"> in HTML5 accepts it - not every domain has a tld (though most do)
maybe the tld isn't required, in which case only an @ would be necessary, but i've never seen an email without one nor had an error in a system i've validated in this way. i'm happy to wait until i do to add support for it, as i think the benefit of requiring everyone to type it in correctly outweighs the loss of support for users with email addresses that do not have tlds
but removing the requirement for a . is the only way i can see to make it less restrictive; if i remove the requirement for an @ as well then i'm just validating that it's a string
the url one is more useful, as i couldn't find a good existing one. there's a uri generator in clojure.spec, but it's just this:
(fmap #(java.net.URI/create (str "http://" % ".com")) (uuid))
anyway, hopefully it'll be helpful, i'll be using it in production soon and it's taught me all about spec well enough
I meant restrictive because you only allow alphanumeric characters.
Oh no, I misread. You only generate alphanumeric. Got it.
We like the fact our spec is symmetric -- it generates what it accepts and vice versa.
In particular, the generation of such wild addresses is a good test for other parts of the system to make sure they don't bake in incorrect assumptions about the structure of an email address.
@sundarj Can you point me at any domain names that do not have a TLD? I am surprised that is legal (despite HTML5's validator accepting it).
every tld is its own domain, http://to used to have some html there for example, but doesn't anymore
Interesting... TIL! Thanks!
apparently google wanted to own http://search and http://app
Hmmm… Anyone got any recommendations on how to turn a list of lists, where the inner lists are pairs of strings, into a map?
I realise that if it was a seq of vector pairs this would just work with (into {} pairs)
(into {} (map vec) pairs)
?
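That one-liner works because into with a transducer runs (map vec) over each pair before conj-ing, and a two-element vector conj'd onto a map becomes an entry:

```clojure
;; Bare lists are not map entries, so (into {} pairs) alone would fail;
;; mapping `vec` first turns each pair into an entry vector:
(into {} (map vec) '(("a" "1") ("b" "2")))
;; => {"a" "1", "b" "2"}
```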
I will try, thanks @seancorfield
I'm slowly beginning to internalize the use of transducers... 😐