#beginners
2021-10-03
sova-soars-the-sora 00:10:57

I want to screen-scrape some non-English pages with the intention of pumping them through Google Translate or DeepL. Any [newish] tools I ought to be aware of?

sova-soars-the-sora 01:10:19

I'm looking at clj-http, and in Emacs it seems to get the page result in the nREPL, but in a normal command-line invocation of the program it's just a bunch of JS, which makes me think the site is designed against scraping. So I might have to emulate the browser opening the page and getting the data... which I think can be done, I just don't remember the name of the lib.

sova-soars-the-sora 01:10:41

Nvm, I'm able to get the screen data via clj-http; it's just a nice huge mess to work with after that 😄

walterl 15:10:56

Scraping modern JS-heavy monstrosities is simpler with a headless browser driven by something like Selenium (I see it has Java bindings: https://www.selenium.dev/downloads/), or with JS-based ones like PhantomJS or SlimerJS.

Jerry Snitselaar 00:10:15

https://github.com/stuarthalloway/programming-clojure/blob/master/src/examples/generator.clj : is there some way of loading files so that this doesn't result in unresolved symbols? I ended up requiring and aliasing examples.datatypes.midi to get perform, key-number, and to-msec to resolve, but I'm not sure if I'm just missing something obvious or if it really did require a change.

didibus 01:10:18

You probably just had your parentheses wrong

didibus 01:10:54

to-msec, key-number, play, and perform are being defined inside the reify; they implement the methods of MidiNote, which you imported

didibus 01:10:21

Well, perform does seem to be missing though

Jerry Snitselaar 02:10:13

My first thought was that I had something wrong, but I get the unresolved symbols with the files in Stuart's GitHub repo as well, so I was thinking I'm not loading something correctly. I get the unresolved symbols in both Calva and CIDER. The perform case makes sense to me. I don't understand why it's complaining about the uses of key-number and to-msec in the definition of play.

didibus 19:10:42

Ya, that's not normal. You should either get an error saying those methods don't exist in the MidiNote interface, or it should work. I would check that the parentheses are in the right place to be sure.
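A minimal sketch of how calls between reify methods resolve, assuming MidiNote is a protocol created with defprotocol. The protocol below is a simplified, hypothetical stand-in (summary and make-note are made-up names), not the book's code. Method bodies inside reify are not locals or vars themselves: a call such as (key-number this) inside another method goes through the protocol function var that defprotocol created, so that var has to be visible in the namespace where the reify is written.

```clojure
(ns example.midi-sketch)

;; Hypothetical, simplified protocol modeled on the MidiNote discussion;
;; not the code from programming-clojure.
(defprotocol MidiNote
  (to-msec [this tempo])
  (key-number [this])
  (summary [this tempo]))

(defn make-note
  "Returns an anonymous MidiNote built with reify. Inside summary, the
  calls to key-number and to-msec go through the protocol function vars
  that defprotocol defined in this namespace."
  [pitch beats]
  (reify MidiNote
    ;; Whole milliseconds: 60000 ms per minute divided by tempo (bpm).
    (to-msec [_ tempo] (quot (* 60000 beats) tempo))
    (key-number [_] pitch)
    (summary [this tempo]
      {:key  (key-number this)
       :msec (to-msec this tempo)})))

(summary (make-note 60 1) 120)
;; => {:key 60, :msec 500}
```

If this reify lived in a different namespace that merely imported the generated MidiNote interface, the bare calls to key-number and to-msec in summary would not resolve unless that namespace also required the protocol's namespace (referring those functions, or calling them through an alias).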

noisesmith 15:10:17

also, while import allows using ' it never requires it

noisesmith 15:10:33

and it's idiomatic to use () instead of [] with import
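A small sketch of the import style described above: in an ns form the :import clause takes unquoted lists whose first element is the package, and at the REPL the import macro accepts the same unquoted form (a quoted one also works, but isn't needed). The JDK classes and the example.imports namespace name are just illustrations.

```clojure
(ns example.imports
  ;; Idiomatic: a list, package first, no quote needed.
  (:import (java.util UUID)
           (java.time LocalDate)))

;; The import macro at the REPL also works without a quote.
(import (java.time Duration))

(def some-id (UUID/randomUUID))
(def y2k (LocalDate/of 2000 1 1))
(def one-minute (Duration/ofMinutes 1))
```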

Redbeardy McGee 02:10:35

This notation is confusing me.

#:foo.bar{:k1 "string1", :k2 val2}
My understanding is that this is a map with a qualified keyword, but I can't see how to get at the values inside it. I've been trying to guess my way through it, since Google has been stubborn about producing examples. (get-in my-map [:foo/bar :k1]) returns nil, and I realize I'm totally lost on my own because I don't understand how this map is structured.

seancorfield 02:10:15

@U02FK84BKM5 See if this helps:

dev=> (def val2 42)
#'dev/val2
dev=> #:foo.bar{:k1 "string1", :k2 val2}
#:foo.bar{:k1 "string1", :k2 42}
dev=> (println #:foo.bar{:k1 "string1", :k2 val2})
#:foo.bar{:k1 string1, :k2 42}
nil
dev=> (binding [*print-namespace-maps* false]
 #_=>   (println #:foo.bar{:k1 "string1", :k2 val2}))
{:foo.bar/k1 string1, :foo.bar/k2 42}
nil
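To make the point concrete: the #:foo.bar{...} literal is reader shorthand for a flat map whose keys all carry the foo.bar namespace, so there is no nesting for get-in to walk. A minimal sketch using the map from the question, with val2 replaced by the 42 from the REPL session above:

```clojure
;; #:foo.bar{...} is read as an ordinary, flat map.
(def m #:foo.bar{:k1 "string1", :k2 42})

;; The same value written out with fully qualified keys.
(def m-long {:foo.bar/k1 "string1", :foo.bar/k2 42})

;; Look values up with the full keyword, as a function or via get.
(:foo.bar/k1 m)      ;; => "string1"
(get m :foo.bar/k2)  ;; => 42

;; There is no :foo/bar key, so a nested path finds nothing.
(get-in m [:foo/bar :k1]) ;; => nil

;; The prefix lives in each keyword, not in the map itself.
(map namespace (keys m)) ;; => ("foo.bar" "foo.bar")
```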

Redbeardy McGee 02:10:51

Interesting. I thought it was nested in some fashion which has definitely thrown me off the trail.

seancorfield 03:10:10

Namespaced keywords are an important thing in Clojure. You'll see them a lot with Spec, they're good for describing domain models, and you'll see them used with databases like Datomic and XT as well.

Redbeardy McGee 03:10:23

I don't know why this library is returning the map in this format yet, but that's okay for now as long as I can learn how to work with it.

Redbeardy McGee 03:10:35

I thought this would mean auto-resolution is the shorthand for getting the values, such as ::id expanding to :foo.bar/id. However, ::id resolves to the namespace of the current file (:my-dev.ns/id) instead of the namespace of the map.

seancorfield 03:10:56

Right, the normal approach is that you'll require the namespace with an alias, and then use the auto-resolution on the alias:

(:require [foo.bar :as bar])

...

  ::bar/id ;=> :foo.bar/id
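A runnable sketch of how :: keywords resolve. It aliases clojure.string only because that namespace always exists, and the example.autoresolve namespace name is made up for illustration. Resolution happens when the code is read, based on the current namespace and its aliases, not on any map the keyword is later used with.

```clojure
(ns example.autoresolve
  (:require [clojure.string :as str]))

;; ::id resolves against the namespace the code is read in.
(def own-key ::id)          ; value: :example.autoresolve/id

;; ::alias/id resolves against the aliased namespace.
(def aliased-key ::str/id)  ; value: :clojure.string/id

;; Neither one is :foo.bar/id; that key has to be written in full
;; (or reached through an alias for foo.bar).
(def foo-key :foo.bar/id)
```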

Redbeardy McGee 03:10:55

That is even more confusing for me now. I may have some things mixed up about what this map even is.

Redbeardy McGee 03:10:27

It doesn't appear to belong to a namespace

seancorfield 03:10:51

They don't always map to namespaces.

seancorfield 03:10:42

In Clojure 1.11, we get a way to require aliases without an underlying namespace:

(:require [foo.bar :as-alias bar])
This doesn't require that foo.bar exists.

seancorfield 03:10:11

Prior to 1.11, you had to do

(alias 'bar (create-ns 'foo.bar))

seancorfield 03:10:32

Either way, ::bar/id expands to :foo.bar/id with these, and no namespace with actual code needs to exist.
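A sketch of the pre-1.11 approach in a file, assuming the code is loaded form by form (as Clojure's load does), so the alias exists before the later forms that use ::bar/... are read. The example.alias-only namespace and the record contents are made up for illustration.

```clojure
(ns example.alias-only)

;; Pre-1.11: create an empty namespace object at runtime and alias it.
;; No foo/bar.clj file is needed.
(alias 'bar (create-ns 'foo.bar))

;; This form is read after the alias exists, so ::bar/id resolves.
(def k ::bar/id) ; k is :foo.bar/id

;; On Clojure 1.11+ the ns form could instead say:
;;   (:require [foo.bar :as-alias bar])

(def record {k 1, ::bar/name "example"})
```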

Redbeardy McGee 03:10:08

I think you're answering the questions I'm asking, but I'm coming up with the wrong questions.

Redbeardy McGee 03:10:01

Thank you for your patience

Redbeardy McGee 04:10:43

The library says it optionally lets you use Datascript to navigate the data, which may very well be the direction I go in the future. Unfortunately, it doesn't show any non-Datascript examples. All I can look at is the shape of the data here (https://github.com/cjsauer/pubg-clj/blob/master/src/pubg_clj/api/omni.cljc). It has me wondering whether the implementation just overlaps with the namespaced-keys feature by mistake.

Redbeardy McGee 04:10:48

So the situation would be that there is no shorthand notation for reaching inside the maps, unless I use the "fake namespace" type of thing you described with the :as-alias require.

seancorfield 04:10:38

I'm not sure what you're asking. Namespaced keys are idiomatic in Clojure.

Redbeardy McGee 04:10:44

I do realize that, but what I think I'm finding in this library is that the maps don't actually have any associated namespace, but instead are grouping things together using an overlapping syntax. This overlap is confusing me a bit.

seancorfield 04:10:47

I mean, yeah, the shorthand notation hasn't been particularly convenient in pre-1.11 Clojure but you do have the #:some.prefix{...} notation anyway.

seancorfield 04:10:02

What do you mean by "overlapping"?

Redbeardy McGee 04:10:01

The map I receive from the library looks like this: https://github.com/cjsauer/pubg-clj/blob/333f285212135420463e981afc4a9ecd5314c17d/src/pubg_clj/api/omni.cljc#L396 There is no actual namespace associated, but the keys are named in a grouping fashion that looks identical to the namespaced keywords.

seancorfield 04:10:46

Yes. This is what I expect.

seancorfield 04:10:57

That's why I'm not sure what you're asking.

Redbeardy McGee 04:10:43

I can't find the actual question.

Redbeardy McGee 04:10:09

All I can do with my current understanding is express the thing that confuses me

Redbeardy McGee 05:10:31

My conclusion is that there is no straightforward, idiomatic shorthand for getting the value mapped to :id in that data structure, because the usual shorthand (::id or ::season/id) expects a real ns.

seancorfield 05:10:52

But there is no :id. That's not what the key is called.

seancorfield 05:10:25

What you linked to is a Spec that has a required key called :pubg.season/id.

seancorfield 05:10:40

That's an idiomatic key name.

seancorfield 05:10:21

Think about a domain where you are modeling people and addresses and accounts. :person/id, :address/id, and :account/id are three very different things.
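A minimal sketch of that point with made-up values: because the namespace is part of each keyword, the three :id keys can sit side by side in one map without colliding, whereas simple :id keys would overwrite each other. The #:person{...} shorthand only saves typing when every key shares the same prefix.

```clojure
;; Hypothetical data: three different ids in one map, no collisions.
(def order-context
  {:person/id  17
   :address/id 42
   :account/id 99})

(:person/id order-context)  ;; => 17
(:account/id order-context) ;; => 99

;; With simple keywords, the ids overwrite each other.
(def flattened (merge {:id 17} {:id 42} {:id 99})) ; {:id 99}

;; The #:person{...} reader shorthand expands to :person/... keys.
(def person #:person{:id 17, :name "Ada"})
```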

Redbeardy McGee 05:10:49

Like I said, I can't find the actual question tickling my brain. I do understand what you're saying, and it is helping.

Redbeardy McGee 05:10:22

It's not about whether the key is named in an idiomatic way. It's about whether there's a convenient way to retrieve the associated value.

seancorfield 05:10:45

What's wrong with just using the (long) key name? That's clear and explicit.
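On the "convenient way to retrieve the value" question, a sketch of two core options alongside the plain (:pubg.season/id m) lookup: namespaced destructuring, which binds locals without the prefix. The season map is hypothetical; apart from :pubg.season/id (the key from the linked spec), its keys and values are made up and not taken from pubg-clj, and describe-season and season-id are illustrative names.

```clojure
;; Hypothetical season map; only :pubg.season/id comes from the thread.
(def season {:pubg.season/id       "season-1"
             :pubg.season/current? true})

;; 1. Use the full key as a function.
(:pubg.season/id season) ;; => "season-1"

;; 2. Destructure with :<namespace>/keys; the locals drop the prefix.
(defn describe-season [{:pubg.season/keys [id current?]}]
  (str id (when current? " (current)")))

(describe-season season) ;; => "season-1 (current)"

;; Equivalent form: qualified symbols listed under plain :keys.
(defn season-id [{:keys [pubg.season/id]}]
  id)
```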

Redbeardy McGee 05:10:29

I don't think there's anything wrong with it, but I'm not experienced enough to know why or why not, since ::keyname also exists and my intuition told me it would extend to another map that seems to be expressed in the same format.

seancorfield 05:10:47

It's all about how unique you need a name to be. ::id is unique to the current namespace and you would generally use that to avoid conflicts with keys from outside that namespace -- and only use it within that namespace.

Redbeardy McGee 05:10:32

In this case pubg.season is completely ephemeral, but it looks the same as a namespace provided by another file on disk.

seancorfield 05:10:56

:person/id is fine for application level stuff. ::person/id, where person is an alias to a "real" namespace, is fine for things that need to be passed across "boundaries" so it is globally unique.

Redbeardy McGee 05:10:15

At the end of the day, I think I may be making a mistake of examining a very narrow situation too closely for my current experience level, and I feel like I'm abusing your patience.

Redbeardy McGee 05:10:31

I do at least understand, now, that I can and should just use the explicit key names to continue my exploration.

seancorfield 05:10:47

It's possible you are overthinking it 🙂 I think that a lot of Clojure tutorial material and even books use simple keywords so folks are not introduced to namespaced keywords early enough in their learning to find it natural -- and then it seems more work to learn about that later on, since you kind of have to unlearn the simple keyword stuff.

Redbeardy McGee 05:10:04

I think I'll come around on it by the time it really matters.

seancorfield 02:10:19

The keys are :foo.bar/k1 and :foo.bar/k2

Colben 09:10:19

Hi, is it bad practice in Clojure to repeat an object's key (:objid in my example) in the value map? This key is generated in an external system. It seems unnatural to me not to have it in the value part, as I feel the value part should be self-sufficient. Thanks

(def org-units { 1 {:objid 1 :budget 100000  :sub-org-units {2 {:budget 1000} 3 {:budget 3000}}}
                25 {:objid 25 :budget 200000 :sub-org-units {4 {:budget 0} 5 {:budget 0}}}})

practicalli-johnny 10:10:20

If the value is useful, then yes, it makes sense to include that key, although there seems to be duplication with the top-level key for each hash-map. Depending on how you use the data structure, a vector of maps may be more suitable, especially if you can flatten the sub-org-units into their own top-level maps in the vector. In general, using nested hash-maps is okay if you traverse specific paths. A vector of hash-maps can be simpler when iterating over all the hash-maps. So consider the functions to be applied to the data when designing its shape.
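A sketch of the flattening suggested above, using Colben's org-units data. The flatten-org-units helper and the :parent key are assumptions for illustration; the point is that every unit, including each sub-unit, becomes one self-contained map in a vector.

```clojure
(def org-units {1  {:objid 1  :budget 100000 :sub-org-units {2 {:budget 1000} 3 {:budget 3000}}}
                25 {:objid 25 :budget 200000 :sub-org-units {4 {:budget 0} 5 {:budget 0}}}})

(defn flatten-org-units
  "Hypothetical helper: returns a vector of flat maps, one per unit.
  Sub-units get :objid from their map key, plus an assumed :parent key
  holding the owning unit's id."
  [units]
  (vec
   (for [[id {:keys [sub-org-units] :as unit}] units
         row (cons (dissoc unit :sub-org-units)
                   (for [[sub-id sub] sub-org-units]
                     (assoc sub :objid sub-id :parent id)))]
     row)))
```

Iterating is then a plain map/filter/reduce over the vector, for example summing every budget with (reduce + (map :budget (flatten-org-units org-units))).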

Colben 17:10:19

I was considering a vector of maps, but I thought a vector of maps is suitable if you want to iterate through it, and not that good for a scenario where you want to search in it (but I don't really know the performance characteristics of these data structures). Is it OK to use a vector of maps if I also want to search by the key? Thanks

thom 17:10:27

If you want O(1) lookups then you're doing the right thing. I don't think there's anything too nasty about duplicating the value both inside and outside the maps, in fact Clojure's built-in index function creates even more duplication: https://clojuredocs.org/clojure.set/index
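To illustrate the clojure.set/index point, a sketch on a couple of made-up unit maps: the index keys are themselves maps of the indexed fields, and each value is a set of the full original maps, so the id appears both in the key and inside the stored maps.

```clojure
(require '[clojure.set :as set])

;; A small vector of hypothetical org-unit maps.
(def units [{:objid 1  :budget 100000}
            {:objid 25 :budget 200000}])

;; set/index groups maps by the values of the given keys: {:objid 25}
;; maps to a set of matching maps, each still containing :objid.
(def by-objid (set/index units [:objid]))

(get by-objid {:objid 25}) ;; => #{{:objid 25, :budget 200000}}
```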

didibus 19:10:22

Repeating the key like that is totally fine, and that's even what databases do when they index a key on a table.

seancorfield 22:10:45

It's pretty common to take a vector of hash maps and turn it into a hash map from some sort of primary key to the whole hash map by doing (into {} (map (juxt :objid identity)) vec-of-maps)
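The one-liner above, spelled out with made-up data, plus a linear search over the vector for contrast. find-unit is a hypothetical helper, not a core function.

```clojure
;; Hypothetical vector of maps, each carrying its own :objid.
(def vec-of-maps [{:objid 1  :budget 100000}
                  {:objid 25 :budget 200000}])

;; (juxt :objid identity) turns each map into [objid map]; into {} with
;; the map transducer collects those pairs into a map keyed by :objid.
(def by-id (into {} (map (juxt :objid identity)) vec-of-maps))

(get-in by-id [25 :budget]) ;; => 200000

;; For comparison: searching the vector directly is a linear scan.
(defn find-unit [objid units]
  (some #(when (= objid (:objid %)) %) units))
```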

damien 14:10:56

Hi all 👋 What is a good recommendation for a Clojure library that idiomatically exposes itself as a Java API? This would be consumed by Java/Kotlin callers.

schmee 14:10:08

I think https://github.com/schmee/java-http-clj is a pretty good example. DISCLAIMER: I made it, so it's hardly an unbiased opinion 😁

damien 01:10:52

Sorry, I meant the exact inverse of this! Updated the original message.
