#clojure
2021-03-10
aratare00:03:11

@alexmiller @emccue Ah thanks. Was doing some testing with claypool and one of the tests calls shutdown-agents, which causes all the other tests to go boo boo 😅

Alex Miller (Clojure team)02:03:26

yeah, you have to be very careful about where you call that

scythx09:03:44

Does anyone have a library suggestion for parsing HTML?

p-himik09:03:16

There's a Clojureverse thread about it. I think this particular post is very useful: https://clojureverse.org/t/best-library-for-querying-html/1103/18

❀ 3
dharrigan09:03:09

jsoup is really good

❀ 6
dharrigan09:03:23

(if you don't mind doing interop)

scythx09:03:23

yeah, i was searching for a clojure library but there's no good one. I'm unfamiliar with the java ecosystem, so thanks for the suggestion, I'll try it out!

Ed11:03:49

enlive uses jsoup doesn't it? https://github.com/cgrand/enlive

Norman Eckstein14:03:55

What about Hickory?

lukasz15:03:22

We use Jsoup (cleaning) + hickory (traversing, parsing, converting to other formats) - both are great

Tomaz Bracic12:03:17

Hi all

👋 18
jmckitrick14:03:42

So I see that reitit supports malli (of course), and reitit supports swagger, and malli supports swagger. But... can I use malli on my reitit routes and produce swagger docs in one fell swoop?

borkdude14:03:39

@jmckitrick best to ask in #malli or #reitit

Elso16:03:57

Heyo, I'm trying to wrap a given DB-worker-queue implementation in a way that allows me to do some intermediate mapping / side-effecting steps in a sane manner. So basically I have a method that yields a queue-item + context (containing the tx), some item-specific stuff and a consumer and I want to interject something between. First thing coming to mind would be a lazy-seq of sorts, i.e.

(loop []
  ;; sketch of the idea: remap the whole queue, take one item, enqueue it, repeat
  (let [new-queue-items (->> queue-items (map generic-preparation) (map specific-stuff) (map generic-post-processing))
        item (take 1 new-queue-items)]
    (enqueue item)
    (recur)))
this however doesn't work because, as it seems, I always take the first realized item. I know how to do this in imperative-mess style, but I figure there ought to be a way to do it in more or less idiomatic clojure

dpsutton16:03:14

if queue-items is immutable this is an infinite loop on the first item. if queue-items is mutable i'm not sure where you block and wait and remove from it. you're using this very much like an async channel. if you are already using core.async you can put a transducer on a channel (comp (map generic-preparation) (map ...)) and this would handle your coordination

Elso16:03:49

what makes it mutable though? I was testing the idea with

(defn uuid-seq
  []
  (lazy-seq
    (cons (str (UUID/randomUUID))
          (uuid-seq))))

(def us (uuid-seq))

(take 10 us)
which apparently is not. the deletion part effectively happens via the enqueue, and blocking and waiting before the recur. the problem I had with async was that I couldn't come up with something that would not require a dedicated thread for filling up the input channel and another one running the loop. lest I put onto the channel in the same loop that takes from it?

dpsutton16:03:54

it's not mutable. so recurring and doing all the mapping and taking 1 will get the same result from your queue-items
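
A small sketch of the point above, reusing the uuid-seq from the earlier message: realized elements of a lazy seq are cached, so taking from the same seq always yields the same head. The `consume` helper is a hypothetical name, not something from the conversation; it shows one way to "advance" by carrying the remainder through the loop.

```clojure
(import 'java.util.UUID)

(defn uuid-seq
  "Infinite lazy sequence of random UUID strings."
  []
  (lazy-seq
    (cons (str (UUID/randomUUID))
          (uuid-seq))))

(def us (uuid-seq))

;; Realized elements are cached, so both expressions see the same values.
(= (take 3 us) (take 3 us))

(defn consume
  "Hypothetical helper: hands the first n items of coll to f, one at a time.
  Progress comes from rebinding `items` to the remainder on each recur."
  [f n coll]
  (loop [items (seq coll), left n]
    (when (and items (pos? left))
      (f (first items))
      (recur (next items) (dec left)))))

;; Prints the first three UUIDs of `us`, each exactly once.
(consume println 3 us)
```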

Elso16:03:55

ok really dense question then but what makes a mutable lazy sequence then?

dpsutton16:03:55

you can't make a clojure sequence mutable. you can look at java.util types like blocking queue and some others. this can block waiting on an item. or you can look at core.async channels
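
As a rough illustration of the java.util route mentioned above, here is a minimal sketch using `java.util.concurrent.LinkedBlockingQueue`. The names `process-available!` and `work-queue` are made up for the example, and the timeout-based stop condition is an assumption; a real worker would probably loop until shut down.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue TimeUnit))

(defn process-available!
  "Hypothetical helper: takes items off q, blocking up to timeout-ms for each.
  Stops when no item arrives in time and returns the handled results."
  [^LinkedBlockingQueue q handle timeout-ms]
  (loop [results []]
    ;; .poll returns nil on timeout; the queue itself never holds nil.
    (if-some [item (.poll q (long timeout-ms) TimeUnit/MILLISECONDS)]
      (recur (conj results (handle item)))
      results)))

(def work-queue (LinkedBlockingQueue.))

;; Producer on its own thread, so the consumer below can block and wait.
(doto (Thread. (fn [] (doseq [i (range 3)] (.put work-queue i))))
  (.start))

;; Consumer: each .poll removes the item it returns, unlike (take 1 some-seq).
(process-available! work-queue inc 200)
```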

Elso16:03:20

k guess async it is then

Elso16:03:36

probably now is as good a time as ever to wrap my head around it

dpsutton16:03:05

as to the dedicated thread stuff, i don't think it's true you need two dedicated threads. using async will use whatever thread is available in the async pool i believe. but you certainly need two concurrent threads of execution. you have a loop just watching the queue. without two threads of execution i don't see how you would keep watching the queue and also populate it

Elso12:03:09

the idea was something like have the "queue" be a producer that yields when someone takes from it

John Conti18:03:20

Core.async is a big jump. PersistentQueue might be all you need: https://admay.github.io/queues-in-clojure/
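
For reference, a minimal sketch of `clojure.lang.PersistentQueue` in use. It is immutable like other Clojure collections, so "removing" an item means continuing with the value returned by `pop`; the `drain` helper is a hypothetical name for the example.

```clojure
(def empty-queue clojure.lang.PersistentQueue/EMPTY)

;; conj adds at the rear, peek reads the front, pop returns a queue without the front.
(def q (-> empty-queue (conj :a) (conj :b) (conj :c)))

(peek q)       ; front item
(seq (pop q))  ; remaining items after dropping the front
(seq q)        ; q itself is unchanged

(defn drain
  "Hypothetical helper: applies f to every item in FIFO order, returning the results."
  [f queue]
  (loop [q queue, out []]
    (if (empty? q)
      out
      (recur (pop q) (conj out (f (peek q)))))))

(drain name q)
```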

javahippie19:03:06

My software is storing longer text segments in a database. Sometimes a text already exists in the database, and I don’t want to store it again in this case. I’m playing with the idea of hashing the texts and storing the hash in the DB, too. In this case, I can just compare the hashes, and don’t have to send 20,000 characters to the database and compare them. Are there any ways to create (relatively) non-colliding hashes in Clojure without adding a library?

javahippie19:03:46

I guess MessageDigest via Java Interop is a good start?

lukasz19:03:46

Yes (we do something similar)

dpsutton19:03:13

a counterpoint, you may want to add a library. because you need these hashes to never change and i doubt you can depend on that from anything that's included already. Clojure has changed its hashing library in the past in which case you would be pretty sunk, right?

lukasz19:03:10

MessageDigest comes from the JDK, AFAIR it doesn't relate in any way to how Clojure did its underlying hashing

dpsutton19:03:25

Will that be identical across newer jvms? I don’t know. Just wondering how you could ensure that things hash consistently as time goes by

lukasz19:03:17

I believe that depends on the type of the hashing algorithm used.

hiredman19:03:12

the thing with databases is you usually want to index things

hiredman19:03:07

so the hash is a way to reduce the size of what gets indexed (index the hash instead of all the text)

hiredman19:03:30

but any hash function can have collisions, so the best thing to do is to basically build a hash table in the database

hiredman19:03:47

index on the hash, but don't enforce a unique constraint

hiredman19:03:14

and search by hash, and then scan through possible results there to see if the text actually exists

➕ 3
javahippie20:03:24

Good point about consistency, but I’d assume that e.g. SHA256 is implemented in a consistent way across JVMs. Good point to research, assuming is not enough 😉

javahippie20:03:46

Good point about the collisions, @hiredman. I’d almost tend to accept the probability of a SHA256 collision, but implementing a way to handle the collision is relatively easy, so there is no reason not to do it.
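
A rough in-memory sketch of the "hash table in the database" idea discussed above: look up by hash, then compare the actual texts among the candidates. The map stands in for a non-unique index on a hash column, and `weak-hash` is a deliberately poor stand-in hash (an assumption for the demo) so that collisions are easy to trigger.

```clojure
(defn weak-hash
  "Deliberately weak stand-in hash (text length mod 8), so collisions are common."
  [text]
  (mod (count text) 8))

(defn store-text
  "Adds text to index (a map of hash -> vector of texts) unless it is already stored.
  Texts sharing a hash end up in the same bucket, like rows behind a non-unique index."
  [index text]
  (let [h          (weak-hash text)
        candidates (get index h [])]
    ;; Search by hash first, then scan the candidates for an exact match.
    (if (some #(= % text) candidates)
      index
      (assoc index h (conj candidates text)))))

;; "alpha" and "gamma" collide (both have length 5); the second "alpha" is skipped.
(def index (reduce store-text {} ["alpha" "gamma" "alpha" "hi"]))
```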

hiredman20:03:13

If you are willing to accept collisions you can also use smaller/weaker hash functions too https://ankane.org/large-text-indexes

👍 3
schmee21:03:43

the probability of a collision with SHA256 is for all intents and purposes non-existent, and can be safely ignored

schmee21:03:30

@javahippie you can certainly rely on SHA256 to be consistent across JVMs, and using MessageDigest with SHA2-256 sounds like a good solution to me 👍
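
A minimal sketch of the MessageDigest approach via interop, with no extra library. The function name `sha256-hex` is made up for the example. Note that the JDK's standard algorithm name is "SHA-256", and that the text is encoded as UTF-8 explicitly so the result does not depend on the platform's default charset.

```clojure
(import 'java.security.MessageDigest
        'java.nio.charset.StandardCharsets)

(defn sha256-hex
  "Returns the SHA-256 digest of text as a 64-character lowercase hex string."
  [^String text]
  (let [digest (MessageDigest/getInstance "SHA-256")
        raw    (.digest digest (.getBytes text StandardCharsets/UTF_8))]
    ;; Each byte becomes two hex digits.
    (apply str (map #(format "%02x" %) raw))))

(sha256-hex "abc")
```

The hex string could then be stored in an indexed column next to the text and compared instead of the full text.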

MatthewLisp21:03:44

Hello everyone 👋 Any guides/tutorials on using GitHub Actions + deps.edn for running tests automatically? Or maybe CircleCI + deps.edn

seancorfield21:03:37

Most of my projects use GitHub Actions for CI and some use CircleCI as well.

MatthewLisp21:03:50

Thanks a lot! 😄

sova-soars-the-sora21:03:01

Hi I have a question. I want to take some source texts [target data set A] and remove some of the words from each sentence [training set] and then train a neural network to add the missing words back in. I am looking at Cortex. Anybody have any recommendations?

blak3mill3r21:03:07

Sounds like you have a lot of text @sova! You might also look at Spark. Do you know what sort of model you want to train at scale? Have you tried it in the small?

blak3mill3r21:03:40

I recommend prototyping your ML model until you see results you like, before trying to distribute it

pez21:03:14

In the “official” guide about Programming at the REPL, there is an ending note about how things can go wrong if you switch to a namespace without first loading it: https://clojure.org/guides/repl/navigating_namespaces#_how_things_can_go_wrong However, there is no mention there of how to fix it. Does anyone know what the fix would be?

Alex Miller (Clojure team)21:03:36

(clojure.core/refer-clojure)

pez21:03:14

Thanks. So just call that and things will be good?

blak3mill3r21:03:22

You will have clojure.core symbols referred, not any code defined in the namespace

Alex Miller (Clojure team)21:03:17

(which is exactly what ns does for you)
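
A small sketch of the situation and the fix, assuming a fresh namespace entered with `in-ns` (the name `scratch.example` is hypothetical). As noted above, this only restores the clojure.core names; code defined in the namespace's own source file still has to be loaded separately.

```clojure
;; in-ns creates the namespace if needed, but refers nothing from clojure.core,
;; so unqualified names like map or inc would not resolve here yet.
(in-ns 'scratch.example)

;; Fully qualified, this call works even in the bare namespace.
(clojure.core/refer-clojure)

;; clojure.core names resolve again.
(map inc [1 2 3])
```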

sova-soars-the-sora21:03:19

@blak3mill3r thanks! i am just getting started. will check out spark. Yes, presumably it would be quite a lot of text. Would be good to train on teeny tiny samples beforehand. I want to use it as a pre-processing step in language translation.

blak3mill3r22:03:41

I suggest reading about how others are tackling similar problems. https://cs224d.stanford.edu/reports/ManiArathi.pdf

blak3mill3r22:03:31

To do deep learning that understands time (or sequential stuff, like words) I think recurrent neural networks (https://en.wikipedia.org/wiki/Recurrent_neural_network) are pretty useful. Not really my area of expertise though, but maybe this gives you some ideas

blak3mill3r22:03:22

There's a lot of NLP stuff that does not use deep learning, too, but I think the state of the art is some form of RNN

sova-soars-the-sora22:03:17

thanks, it's been about a decade since I looked at RNN stuff! 😃