This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-09-01
Channels
- # announcements (6)
- # atom-editor (4)
- # babashka (59)
- # beginners (51)
- # bristol-clojurians (6)
- # calva (5)
- # cider (22)
- # clara (6)
- # cljfx (28)
- # clojure (182)
- # clojure-australia (1)
- # clojure-berlin (1)
- # clojure-dev (21)
- # clojure-europe (12)
- # clojure-italy (8)
- # clojure-nl (7)
- # clojure-norway (13)
- # clojure-uk (20)
- # clojurescript (2)
- # code-reviews (24)
- # conjure (9)
- # cursive (9)
- # datomic (35)
- # defnpodcast (45)
- # fulcro (18)
- # graalvm (3)
- # graphql (1)
- # hugsql (1)
- # jobs (3)
- # jobs-discuss (2)
- # klipse (1)
- # leiningen (1)
- # malli (5)
- # membrane (4)
- # off-topic (7)
- # portal (2)
- # reitit (30)
- # remote-jobs (1)
- # rewrite-clj (5)
- # ring (2)
- # shadow-cljs (25)
- # test-check (2)
- # tools-deps (5)
- # vrac (26)
- # xtdb (17)
Hi the team! For learning web scraping with Clojure and Jsoup, I transpose this “Python BeautifulSoup" video tutorial https://www.youtube.com/watch?v=XQgXKtPSzUI: • My Clojure version: https://github.com/PrestanceDesign/clj-jsoup-scraping • The Python source code: https://code.datasciencedojo.com/datasciencedojo/tutorials/blob/master/Web%20Scraping%20with%20Python%20and%20BeautifulSoup/Web%20Scraping%20with%20Python%20and%20Beautiful%20Soup.py
@UFBL6R4P3 the one thing that bugs me is it was hard to skim the code and understand which parts were "important". I find it nice to split out the logic from the parts that touch I/O (because you probably would want to add more error-handling around that later on anyways). And you should definitely avoid trying to do too much in your main function. The way it's written, it will be hard to test (automated or in the REPL). I took your code and tried to implement it as I would - step by step in the REPL - and this is what I got: https://gist.github.com/pithyless/5ce137958be0c6a8411b403885c99665 Maybe a side-by-side comparison will be useful to you. Notice that most of the work can actually be tested in the REPL without writing anything to disk (or running -main).
The argument about running in the REPL or in test is valid, but the only chunk of code worthy of running in isolation is your parse-product
— which I agree is something clearly worth pulling out.
Let me be more concrete: Reading @UFBL6R4P3’s code, I immediately knew what it was doing, and what the result would be. Reading your code, my eyes had to jump up and down the page, I had to figure out where and how each fn was being. Heck, I even had to figure out what was a fn vs a letted name, because there were some random letted evaluations I didn’t expect. Emotionally speaking, the former was rather straightforward and not at all frustrating to read. The latter was extremely frustrating given the minuscule scope of the task.
Fair enough, my point was showing how I would explore the task. I believe @UFBL6R4P3 had not explored enough via the REPL or that code wouldn't have ended up all in -main. I could be wrong. Would it be more readable if there were just three functions? parse-product, generate-csv, and -main
? The dom-*
could easily be inlined, the interop is fairly straightforward. But to each is own, I guess. ¯\(ツ)/¯
That's just personal preference; I always prefer to wrap my logic in a better-named function and treat -main
as just a side-effect of how the JVM works. (i.e. I would not expect to eval (-main ...)
in the repl, just to not have to deal with arg parsing, etc.). I realize that's not an objective objection. 😄
> My rule is: Every fn, every name, must prove its worth. It must have some purpose. I think one would be hard pressed to find someone on Clojurians who would object to this truism. If I had chosen better names, you should not have had to jump up and down the page. 🙂
It’s not a “truism,” and it’s not “just personal preference.” There can be no solution to this problem that excludes those steps. Whether you decide to name them or not is, of course, preference. Another example of this is needing a concept in multiple places. Whether you name this shared concept or not, it exists. Pretty much every other name is manufactured entirely by the developer and should be added only after significant consideration. As you allude, sometimes there are indeed reasons to manufacture names: personal preference, documentation, readability, scanability. However, when you choose one of these reasons the upsides are small, and the downsides are significant. The problem with the code wasn’t the particular names you chose. It was the number of completely arbitrary, non-essential names you chose.
@U05476190 @U07S8JGF7 Wow thank you both for your reviews and advices ! it exceeds what I expected!
I feel I'm being misquoted. My truism comment was related to this:
> potetm: My rule is: Every fn, every name, must prove its worth. It must have some purpose.
> pithyless: I think one would be hard pressed to find someone on Clojurians who would object to this truism.
I stand by my statement, that this is a truism. Everyone on Clojurians will agree with this statement. Every function name we choose, must have some purpose. Otherwise all our function names could just be random-uuid. We can only debate whether a specific function is well-named or if it's purpose is worth polluting our global and/or lexical scope with more contextual overhead.
My personal preference comment was related to this:
> potetm: why `generate-csv`? Isn’t that just what `-main` does?
> pithyless: That's just personal preference; I always prefer to wrap my logic in a better-named function and treat `-main` as just a side-effect of how the JVM works.
Here I feel that there is benefit to naming the purpose of the action, not just the context from which it is executed. Mainly, because it often turns out there is more than one way you may want to initiate this action (e.g. not just from the CLI). But if someone were to block this PR, because -main
just calls out to generate-csv
? Well, I would choose not to die on this hill. Hence, "personal preference". 🙂
> The problem with the code wasn’t the particular names you chose. It was the number of completely arbitrary, non-essential names you chose.
In my view, this contradicts your previous statement:
> There can be no solution to this problem that excludes those steps. Whether you decide to name them or not is, of course, preference.
If I felt the names were - completely arbitrary, non-essential - I would not have defined them. They obviously helped me think through the problem and I must have thought they would help the future reader (perhaps me), or I would have removed them. We can debate whether they are well-named and helpful (e.g. I did agree that those interop fns did not have particularly high noise-to-signal ratios and could just as well be inlined), but it's really hard to debate the finer points when the counter stance is: completely arbitrary, non-essential names.
I honestly never expected such a heated discussion over such a small piece of functionality. It really brings back the nostalgia of communities bike-shedding over the most inane details. I would have probably scrutinized my code a lot more before posting; or not even bothered. My real feeling on the subject is to just let it be, because no matter what we come up with - when it comes to elegantly parsing some XML to CSV, Perl programmers have us beat. 😉
Thanks @UFBL6R4P3 for the thought-provoking code and @U07S8JGF7 for the thought-provoking discussion!
I like your 2 approaches and I guess a mix between the 2 will suit me very well :) Before posting the link on this channel, I was already thinking about my code that I had simply translated from Python to Clojure and that I will probably have to apply some functional notions like pure functions, etc.
> I feel I’m being misquoted. Well. Yes. You feel this way, because, after your clarification, that appears to be what happened 😄 That makes sense now.
I wasn’t trying to be offensive or rude by saying “completely arbitrary, non-essential names.” My apologies that I was unclear.
I was trying to put things into 2 buckets: Things the solution space demands, and things the programmer makes up. Being in the second bucket isn’t bad — I do it all the time. But we can acknowledge when it’s the case, what we’re hoping to gain from it, and the mental overhead it brings.
> That's just personal preference; I always prefer to wrap my logic in a better-named function and treat -main
as just a side-effect of how the JVM works. (i.e. I would not expect to eval (-main ...)
in the repl, just to not have to deal with arg parsing, etc.). I realize that's not an objective objection.
@U05476190 Yes it's a preference that suits me, I really see the advantage.
It makes me think eg. to the function (init) with Reagent / Re-frame which calls the function (mount).
One initializes the application while the other mounts the virtual dom and that is quite clear.
@U05476190 I take this opportunity to say that I enjoyed your video on datalog that I watched last week. Crystal clear! Thx
Is my code idiomatic? Especially the Jsoup usage and the data written in the CSV file?