#clojure
2022-10-14
Brian Beckman00:10:10

SOLVED Hello, friends. I’ve settled on maybe-m from clojure.algo.monads for composing big expressions where something could go wrong anywhere. I had a question from the examples folder. I was given to understand that the right-hand sides of binding-form/monadic-expression pairs must be monadic values. Yet, in the example, aren’t x and y just ordinary values? Does domonad automatically apply m-result to them? This example works, but I’m sure I don’t understand why it works.

(require '[clojure.algo.monads :refer [domonad maybe-m]])

(defn safe-div [x y]
  (domonad maybe-m
     [a x
      b y
      :when (not (zero? b))]
     (/ a b)))

hiredman01:10:47

maybe-m may just use nil as nothing, and non-nil as something

hiredman01:10:11

Which avoids adding extra boxing

hiredman01:10:33

That kind of implementation is clever, but it can result in oddities if you are nesting monadic values: m-result of nil is still nil

didibus02:10:04

The Monadic context for maybe-m is any value. m-result is just the identity function. So any Clojure value is a valid maybe-m Monad.

👍 1
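For reference, maybe-m is defined in clojure.algo.monads roughly like this (paraphrased from the library source; check the actual namespace for the authoritative version):

(require '[clojure.algo.monads :refer [defmonad]])

(defmonad maybe-m
  [m-zero   nil
   ;; m-result is identity: any plain value already counts as a monadic
   ;; value, which is why x and y in safe-div above work unwrapped
   m-result (fn [v] v)
   ;; m-bind short-circuits on nil instead of boxing values in Just/Nothing
   m-bind   (fn [mv f] (if (nil? mv) nil (f mv)))
   m-plus   (fn [& mvs] (first (drop-while nil? mvs)))])

domonad rewrites each binding pair into an m-bind call, so the answer to the original question is that no wrapping is needed: with an identity m-result, ordinary values are already valid monadic values.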
seancorfield03:10:49

So maybe-m is basically some-> 😆

seancorfield03:10:20

(and this is partly why monads are so rarely used in Clojure)
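To make the parallel concrete, safe-div can be written with plain nil punning, no monads involved (a hedged equivalent, assuming nil is the only "nothing" that matters):

(defn safe-div' [x y]
  ;; when returns nil as soon as any check fails, mirroring
  ;; maybe-m's short-circuit on nil
  (when (and (some? x) (some? y) (not (zero? y)))
    (/ x y)))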

seancorfield03:10:09

At work we compose pipelines of functions that could "fail" and right now each function has the equivalent of:

(if (valid? input)
  (process input)
  input)
So I proposed a more general version of some-> where you pass a predicate in and that determines whether to continue with the pipeline or not: https://ask.clojure.org/index.php/12272/would-a-generalized-some-macro-be-useful
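A minimal sketch of one way such a pred-> could be written, modeled on clojure.core/some-> (hypothetical; not the patch attached to the ticket):

(defmacro pred->
  "Threads expr through forms like ->, but applies each step only
  while (pred result) is truthy; otherwise passes result through."
  [pred expr & forms]
  (let [g (gensym)
        steps (map (fn [step]
                     `(if (~pred ~g) (-> ~g ~step) ~g))
                   forms)]
    `(let [~g ~expr
           ~@(interleave (repeat g) steps)]
       ~g)))

;; the pipeline above then becomes something like:
;; (pred-> valid? input process-a process-b)

Note that pred is evaluated at every step, so it should be a cheap predicate.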

Brian Beckman12:10:04

My application is a recursive AST constructor, with level-crossing semantic constraints that could go wrong anywhere. I want to reject trees that have any problem, but I don't care what the problem is. I'll give some-> etc. a try to see whether they can save me some trouble. Thanks for the discussion!

Brian Beckman12:10:07

(to be more precise, a random AST generator from spec.alpha.generators and test.check.generators)

Ed13:10:56

@U04V70XH6 for consistency with things like as-> and cond-> shouldn't pred-> take the expr first, then the pred?

seancorfield13:10:04

@U0P0TMEFJ Yes, probably. I figured the core team will "do the right thing" with it.

👍 1
seancorfield13:10:15

I added a note to the Jira ticket about that.

💯 1
Ed13:10:57

many thanks ... sorry for bothering you so early (just realised what time it is where you are) ... have a good day 😉

Brian Beckman18:10:46

I ditched maybe-m for explicit nil punning and it simplified my code. tyvm.

1
👍 1
didibus21:10:25

Ya, pretty much, maybe-m is just a monadic version of the some-> macro, though I guess it works more like a let, and there is no let-some macro. I think the lack of usefulness here is partly due to algo.monads being barebones. I think cats and fluokitten would be more useful because they add monadic map, apply, fold, traverse, etc. But also, there's no denying that dynamic types and macros together often solve a lot of the same problems.

hiredman21:10:19

either (which I am not sure algo.monads has) is usually nicer for pipelines too, since you have either a value to process or error information to pass through
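A library-free sketch of the either idea, using plain maps and a short-circuiting bind (all names here are made up for illustration; parse-long needs Clojure 1.11+):

;; each step returns {:value v} on success or {:error info} on failure;
;; bind-either carries the first error through to the end untouched
(defn bind-either [m f]
  (if (contains? m :error) m (f (:value m))))

(defn parse-step [s]
  (if-let [n (parse-long s)]
    {:value n}
    {:error {:msg "not a number" :input s}}))

(defn positive-step [n]
  (if (pos? n)
    {:value n}
    {:error {:msg "not positive" :input n}}))

(-> {:value "42"}
    (bind-either parse-step)
    (bind-either positive-step))
;; => {:value 42}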

wevrem02:10:47

How does one suggest a fix of a possible typo on the Clojure and ClojureScript home pages? Down at the bottom (of both pages) in the blurb about Nubank, the first paragraph ends with “Cognitect, the consultancy behind the Clojure and the Datomic database” (emphasis added). That extra “the” before Clojure reads strangely, and the former newspaper editor in me wants to strike it. Or we could embrace it and start saying “the Clojure” for everything, but that doesn’t seem right. 😉

👍 1
Alex Miller (Clojure team)03:10:53

I can take care of that

Lone Ranger14:10:47

I was hoping for "The Clojure"

😆 1
bortexz13:10:00

Trying to design a client API using cognitect.anomalies for errors, and I have some questions about which errors would translate to which anomalies:
• What would be examples of errors/exceptions that translate to cognitect.anomalies/conflict? Things like entity-id/primary-key uniqueness conflicts? Or would those be incorrect?
• Would a request that times out waiting for a response (connection successful) be classifiable as conflict ("coordinate with callee" might be necessary, as the request might have actually reached its destination but that is unknown), or would this also fall under unavailable?
• If the client is disconnected from the internet, is that also unavailable?
• Would rate-limit rejections fall under busy or forbidden?
Any thoughts on this would be appreciated, thanks!

kwladyka13:10:15

https://github.com/cognitect-labs/anomalies ? Last commit was 4 years ago. Just noticed.

kwladyka13:10:24

Unless we talk about something else?

kwladyka13:10:20

I know this is not an answer to your question, but I think the native Java logging in Java 11 and later is really good. My personal opinion.

bortexz13:10:52

Yes, cognitect anomalies

kwladyka13:10:04

*I mean JUL java.util.logging

kwladyka13:10:17

I don’t know too much about it

kwladyka13:10:34

just added some uninvited info in the topic 😉

kwladyka13:10:21

Personally I like to log as JSON instead of string. Then I can do whatever I want in such structured logs.

kwladyka13:10:29

I mean you can make your own rules, names etc. JSON can be easy searchable, so you can set alerts, charts etc.

kwladyka13:10:57

sorry if it doesn’t help, trying to say something valuable 😉

bortexz13:10:06

Not sure logging solves what I am trying to achieve, as I want a program to be able to dispatch/react to a specific error at the call site, while logging is more of a fire-and-forget thing that one reacts to in a different place

kwladyka13:10:07

oh, maybe I misunderstood what cognitect anomalies is.

bortexz13:10:02

np, anomalies is more like exceptions and HTTP error statuses: it tries to categorize the different possible errors so that you can dispatch only on those anomaly categories, simplifying error handling. But because it's an "abstraction" over other error primitives, you need to translate specific exceptions/error-codes/api-errors/etc. into anomalies, thus my question about specific translations
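As a concrete illustration of that translation layer, one possible mapping from HTTP status codes to anomaly categories might look like this (hypothetical; adjust to the semantics of the API being wrapped):

(require '[cognitect.anomalies :as anom])

(defn status->anomaly [status body]
  (let [category (case status
                   403 ::anom/forbidden
                   404 ::anom/not-found
                   409 ::anom/conflict      ; e.g. uniqueness violations
                   429 ::anom/busy          ; rate limiting, retryable
                   503 ::anom/unavailable
                   (if (<= 500 status 599)
                     ::anom/fault
                     ::anom/incorrect))]
    {::anom/category category
     ::anom/message  (str "HTTP " status ": " body)}))

The README's category table pairs ::anom/conflict with the retry guidance "coordinate with callee", which is one way to reason about the borderline cases in the question above.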

kwladyka13:10:38

heh, I still have some trouble understanding how to use it. http://clojure-liberator.github.io/liberator/tutorial/decision-graph.html - maybe this helps?

bortexz13:10:01

that graph translates to http error codes, I would need something similar but for cognitect.anomalies 🙂

walterl22:10:43

I would use ::anom/conflict for cases like where (e.g.) a client tries to upsert a record, but the record already exists, with different values than those specified by the client. The client call isn't invalid (so not ::anom/incorrect), but the server also can't continue due to the conflicting data.

👍 1
respatialized14:10:58

I have a queue-shaped problem I’m trying to think through. I have three “work streams” that have the following relationship:
• stream 1: needs to be processed sequentially
• stream 2: processes the output of stream 1, can be parallelized
• stream 3: processes the output of stream 2, can be parallelized
What should I be looking for to implement this behavior? core.async channels? A ForkJoinPool? I’m pretty inexperienced with async programming so I’m having trouble evaluating my options. Any pointers for the direction I need to go would be super helpful.

Lone Ranger14:10:12

Is this a batch operation or a streaming operation? Could be as simple as

(defn doit [data]
   (->> data (map stream1) (pmap stream2) (pmap stream3)))

1
💡 1
bortexz14:10:30

If using core.async, have a look at https://clojure.github.io/core.async/#clojure.core.async/pipeline for stream 2 and 3. For stream 1 probably a normal go-loop (or thread if computationally expensive/IO). I’d say core.async is ok for this task as long as you don’t need observability of the queues, otherwise core.async doesn’t help you there
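A rough shape of that wiring, assuming three hypothetical functions stage-1 (sequential file IO), stage-2 (pure CPU work), and stage-3 (blocking DB writes), plus an input-files seq:

(require '[clojure.core.async :as a])

(let [c1 (a/chan 32)
      c2 (a/chan 32)
      c3 (a/chan 32)]
  ;; stage 1: sequential, on a real thread since it does file IO
  (a/thread
    (doseq [f input-files]
      (a/>!! c1 (stage-1 f)))
    (a/close! c1))
  ;; stage 2: CPU-bound, parallelism tuned to core count
  (a/pipeline 8 c2 (map stage-2) c1)
  ;; stage 3: blocking DB transactions, hence pipeline-blocking
  (a/pipeline-blocking 4 c3 (map stage-3) c2)
  ;; block until everything has flowed through
  (a/<!! (a/into [] c3)))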

Joshua Suskalo15:10:00

don't use pmap for this unless the computations are very expensive and non-blocking

respatialized16:10:11

Stream 3 blocks (db transaction); Stream 2 does not - pure Clojure computation, probably the most expensive operation defined in the scope of my program. Stream 1 doesn't block, but it's IO - each item is a separate file.

pppaul16:10:58

i have used manifold for stuff like this: https://github.com/clj-commons/manifold its code is pretty easy to understand, docs are ok. there used to be examples/tutorials, not sure where they ended up though. no matter what tool you use, the streaming part is going to be a pain in the ass to debug compared with regular clojure code, so try to use as little as possible, and avoid it altogether if you can.

Lone Ranger16:10:03

is stream3 a pure sink, or is it doing some DB queries and then you take that result and do something downstream with it?

🕳️ 1
respatialized16:10:49

Thinking about this a little bit more, it could perhaps be simplified to:
• Stream/queue 1 - process files sequentially
• Stream/queue/batch 2 - grab as many items as are available to process from (1), process them, then transact. Could be multiple items or just 1 (isolation of transactions is not important)

respatialized16:10:10

@U3BALC2HH pure sink, the output of this ETL is a queryable database but further processing isn't part of the "job"

Lone Ranger16:10:45

I think the database limitations on the number of concurrent connections are most important here, but it sounds like you could get away with dumping chunks of data into a worker pool that performs transactions

Lone Ranger16:10:30

Naively,

(doall 
 (for [chunk (map stream1 data)]
   (future 
     (stream2 chunk))))

Lone Ranger16:10:12

but there are slicker ways to accomplish the above

pppaul16:10:15

sounds like you want the last stream processor to do batch operations, probably with a timer to force op when batch can't get enough items in time

pppaul16:10:46

you could probably do this without streams, and use something like chime to do a job/task workflow. that would be a lot easier to debug

Lone Ranger16:10:40

I like the job/task approach, although that seems like a lot of work to set up for a one-off job. I don't know how easy chime is to use

pppaul16:10:58

chime is about as easy as JS setTimeout

Lone Ranger16:10:09

woooo that's pretty great!

Lone Ranger16:10:14

will have to check that out

pppaul16:10:32

it has some features that make it work with seqs of dates (infinite date seqs)
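For a sense of the API, a minimal chime sketch (process-batch! is a stand-in for the real work):

(require '[chime.core :as chime])
(import '[java.time Instant Duration])

;; fire every 30 seconds from now, driven by an infinite seq of instants;
;; chime-at returns an AutoCloseable - .close it to stop the schedule
(chime/chime-at
 (chime/periodic-seq (Instant/now) (Duration/ofSeconds 30))
 (fn [time]
   (process-batch! time)))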

respatialized16:10:22

This job needs to be pretty durable (batches can last several hours or more), so the overhead of chime is probably worth it. The throughput of stream 1 is the primary limiting factor.

pppaul16:10:07

i did a streaming backend on one project, and i ran into a lot of problems because of it. streams are not free. the stream code is tiny and does pretty cool things, but then you have to interface with it, and you have to always know if you are dealing with a stream, a deferred, or normal clojure stuff and it's taxing and debugging is different for each scenario

Lone Ranger16:10:21

Over in the data engineering department on the zulip chat, they recommended core.async's https://clojuredocs.org/clojure.core.async/pipeline -- if the results of stream 1 need to be in order (credit @UQ58AKW0J)

respatialized16:10:50

I am using safely for controlled retries of calls to external resources and it gives me observability with mulog for free, which has helped with debugging a lot - would probably help with the streaming context, but I am also totally ok with a scheduler as simple as “while there are still items to process, grab as many as you can every n seconds”

pppaul16:10:17

ok, if you are primarily dealing with batches, then you lose a lot of the benefits of streams. at least a full stream solution. if you know when you need to start processing data, and this job doesn't really interface with the rest of your app, you are in a very nice place. you can treat the job as a black box, use streams or whatever inside it, or something else.

respatialized16:10:42

I do worry about synchronizing the throughput of the scheduled job to the throughput of stream 1 though

Lone Ranger16:10:58

curious what the issue with the throughput here is, it sounds like this could be drip fed

pppaul16:10:06

i think tuning is going to be something that you'll have to do profiling on. a process can't really know how long something will take to do without heuristics, it could profile itself to make those, though.

Lone Ranger16:10:11

I think it sounds more like you're concerned about some brittle I/O, retries, and restarts (all valid concerns) than raw ingestion speed, am I right?

respatialized16:10:17

@U0LAJQLQ1 yeah, I think the general idea was to use streams inside of the batch job so I could process the data concurrently with the IO (which is error prone and rate limited)

pppaul16:10:58

if things are happening on the same CPU, and there is other stuff on the system (web app, db, whatever) then you probably don't want to do things very fast. and if you do want to do things as fast as possible, you may need more than 1 computer, and now you have a harder problem to solve

pppaul16:10:21

streams don't help you much with errors, you are going to be in a stream context for error handling, which is a bit different from normal clojure (same issue in JS if you have done any of that).

pppaul16:10:52

rate limited sounds pretty bad for streams too.

respatialized16:10:45

I’m still very much following Frank McSherry’s advice in http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html of “If you are going to use a big data system for yourself, see if it is faster than your laptop”, so distributed computation is way out right now.

pppaul16:10:00

but, you may have a stream fn that respects that rate limit, so that could be a benefit. i think you would have to do some testing to see how your streams interact with each other, and figure out the buffer sizes you want, and also figure out some rules you may want for connecting the streams together

respatialized16:10:12

I may be using the term “stream” too loosely here; what I really mean is “lazy sequence of items to fetch and then process”, which may differ from the sense in which you’re using it

pppaul16:10:14

we still don't know much about your problem. it could be the case that clojure isn't even a good tool for this, and something like command line tools, or kafka or whatever is better.

Lone Ranger16:10:40

@UFTRLDZEW if you really want to dig into this hard, plenty of folks willing to nerd out with you in the https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/Data.20Engineering on Zulip, because what @U0LAJQLQ1 said may be correct and this may be even beyond the scope of Clojure

pppaul16:10:50

streams are functions as a process

❤️ 1
pppaul16:10:35

so, they are like mini computers with a set of memory (buffer), and I/O to other streams

Lone Ranger16:10:09

I love that quote!! Now we just need to throw some VC money at it and it will now be SaFaaPaaS ... Streams are Functions as a Process as a Service!

pppaul16:10:05

i've seen VC money thrown at more bizarre stuff

ethereum 1
pppaul17:10:23

if you can build your batch as a series of threaded functions, then you have a blueprint to make your streams; once you have your streams you can use extra features to do tuning. https://www.youtube.com/watch?v=1bNOO3xxMc0 you may be interested in this talk to get an idea of what streams give you over a series of functions. the main issue is going to be error stuff, so if you can split error handling into its own functions, and then streams, you'll be able to avoid a lot of pain.

cddr16:10:40

Dunno what database your sink is but if you're trying to max the throughput of a single box, you might need some kind of "copy into". Postgres lets you stream data right into a table using this approach. Most databases have support for bulk copying.
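For example, with the Postgres JDBC driver the copy interface looks roughly like this (a sketch, assuming an open java.sql.Connection bound to conn and a my_table to load into):

(import '[org.postgresql.copy CopyManager]
        '[org.postgresql.core BaseConnection]
        '[java.io StringReader])

;; COPY ... FROM STDIN streams rows in far faster than row-by-row INSERTs
(let [mgr (CopyManager. (.unwrap conn BaseConnection))]
  (.copyIn mgr
           "COPY my_table (id, name) FROM STDIN WITH (FORMAT csv)"
           (StringReader. "1,foo\n2,bar\n")))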

Lone Ranger16:10:14

Great point @U065JNAN8. Depending on the database you are using, if you are looking for raw speed, you should check out the tech.ml.* and dtype-next Clojure scientific computing stack. It seriously doesn't get any faster. For instance: https://github.com/techascent/tech.ml.dataset.sql

respatialized17:10:54

I'm pretty familiar with TMD; the final load is not nearly as much the bottleneck as the initial IO and the processing that happens before the final load into the DB.

stopa17:10:23

Hey team, would love some help thinking through some async programming too. Context:
• A user can call transact with different app-ids
• I can parallelize by app-id
• But within an app I need transactions to be processed serially
Potential solution:
• When a transaction comes in
• I get-or-create the appropriate “app queue” and worker for this transaction
• I add the transaction to the “app queue”
• The worker does its magic
For example:

(import '[java.util.concurrent LinkedBlockingDeque])

(defn spawn-transactor [app-id]
  (let [q (LinkedBlockingDeque.)]
    {:q q
     :worker (future (loop []
                       (let [item (.take q)]
                         ;; assumes clojure.tools.logging aliased as log
                         (log/infof "%s on %s" app-id item))
                       ;; recur so the worker keeps draining the queue
                       (recur)))}))

(def transactors (atom {}))

(defn transact [{:keys [app-id] :as tx}]
  (let [_ (swap! transactors update app-id
                 (fn [old]
                   (or old
                       (spawn-transactor app-id))))

        q (get-in @transactors [app-id :q])]
    (.add q tx)))

(comment
  (transact {:app-id "foo"
             :ok :dok}))
The problem: swap!. This could “spawn” multiple workers inside the CAS retry. I would have no way to clean up the future calls here, which could be a memory leak (if I understood correctly). Question: would you write this differently?

hiredman17:10:04

swap in a delay and after the swap force the delay

stopa17:10:46

Why does the delay solve the CAS issue?

hiredman17:10:00

a delay doesn't execute until forced

hiredman17:10:23

so only the winner of the cas will run

stopa17:10:51

I am having trouble imagining the solution. Do you mind sketching out what you mean?

hiredman17:10:33

(swap! a update-in [:foo] (fnil identity (delay (some-expensive-thing-or-whatever))))
(force (get-in @a [:foo]))

🤯 1
stopa17:10:44

I am not sure this solves the problem I am thinking about. That problem stated another way: If I write:

(swap! a (fn [old] (or old (do-side-effect-thing-and-return-new))))
I know ^ is a bad idea, because do-side-effect-thing-and-return-new could be invoked multiple times. I want it to be invoked only once. --- Do you see what I mean, or do I misunderstand what you mean? — Thank you for taking the time!

hiredman17:10:47

using a delay like I suggest means it won't be invoked more than once

stopa17:10:51

I see what you mean. I don’t know if this will solve the problem completely, though. Consider: imagine transact runs on two different threads. Both transacts will start a swap with a delay. Both swaps could then see old being nil, and actualize a delay

Ben Sless17:10:57

Depending on what framework you want to buy into, isn't it essentially a group-by? https://github.com/leonoel/missionary/blob/master/src/missionary/core.cljc#L773

hiredman17:10:55

they will both get the same delay

hiredman17:10:08

and a delay's body only runs once, no matter how many times it is forced
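A quick REPL demonstration of that property:

(def d (delay (println "running!") 42))
(force d) ; prints "running!" and returns 42
(force d) ; returns 42 immediately; the body never runs again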

stopa17:10:19

how would I go about giving them both the same delay?

hiredman17:10:28

the code I have above does it

hiredman17:10:43

the key is you force after the swap

stopa17:10:17

Hooly jeez. I see

stopa17:10:19

That’s genius.

stopa17:10:03

@UK0810AQ2 something along those lines, though I am aiming to avoid core.async / friends, and stick to clojure + java.util.concurrent if I can

hiredman17:10:54

atoms also directly support compare-and-set! which is more primitive than swap! and can sometimes help for this kind of thing

Ben Sless17:10:48

No reason you can't copy the idea / implementation

❤️ 1
stopa18:10:01

For sure, good point!

pppaul20:10:17

that's a very sexy use of fnil
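Putting the thread together, the transactor code rewritten with the delay trick might look like this (an untested sketch, reusing spawn-transactor from earlier):

(def transactors (atom {}))

(defn transact [{:keys [app-id] :as tx}]
  ;; every caller builds a candidate delay, but (fnil identity ...) keeps
  ;; any delay already installed; only the winning delay is ever forced,
  ;; so spawn-transactor runs at most once per app-id
  (swap! transactors update app-id
         (fnil identity (delay (spawn-transactor app-id))))
  (let [{:keys [q]} (force (get @transactors app-id))]
    (.add ^java.util.concurrent.LinkedBlockingDeque q tx)))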

Annaia Danvers17:10:43

Is there any way I can filter out NUL characters in a string (as in Unicode 0, \0 etc.)? I have a file that somehow has one, and an API that won't let me upload it. Despite trying every way I can think of to feed a pattern to clojure.string/replace, nothing seems to match on it, and it's possibly illegal in any case. #"\0" is the only one that didn't silently fail; it throws this error instead:

Type:     java.util.regex.PatternSyntaxException
Message:  Illegal octal escape sequence near index 2
\0
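For the record, #"\0" fails because Java regex octal escapes are written \0n and need at least one digit after the zero; these forms do match a NUL (assuming the string is bound to s):

(require '[clojure.string :as str])

(str/replace s #"\x00" "")        ; hex escape in the pattern
(str/replace s #"\u0000" "")      ; unicode escape in the pattern
(str/replace s (str (char 0)) "") ; plain string match, no regex at all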

kwladyka18:10:57

I would first check if it is really a \0. Load a file and try to find this character.

kwladyka18:10:32

(seq string) should show each character

kwladyka18:10:43

At least this is my expectation for your use case

chucklehead18:10:42

(clojure.string/escape (str "test" \o000 "2") {\o000 ""}) seems to work (note \o000, the octal literal for NUL; plain \0 is the digit character)

👍 1
Annaia Danvers19:10:57

hmmm.

(->> string
        seq
        (filter #(zero? (int %)))
        count)
gets me a 0.

Annaia Danvers19:10:27

which may mean Atlassian's API is just lying to me, somehow failing in its own special way

Annaia Danvers19:10:09

2022-10-14T19:05:42.554Z ip-192-168-1-112.ec2.internal ERROR [user:273] - Repo api-gateway failed with error: com.atlassian.confluence.api.service.exceptions.BadRequestException: Error parsing xhtml: Illegal character (NULL, unicode 0) encountered: not valid in any content
 at [row,col {unknown-source}]: [86,148]

Annaia Danvers19:10:03

curiouser and curiouser ... I can find it in the file. The error's off by 1 but it's there alright, shows up in VSCode, but it's like Clojure can't see it?

Annaia Danvers19:10:15

wait, now that it's zoomed, that's a bell character, not a NUL

👍 1
🔔 1
Annaia Danvers19:10:21

Atlassian is misreporting
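A hedged general fix for this family of problem is to strip the whole control-character class instead of chasing one character at a time:

;; \p{Cntrl} covers all ASCII control chars, including NUL and BEL,
;; but also \n, \t and \r - keep those with a class intersection
(clojure.string/replace content #"[\p{Cntrl}&&[^\n\t\r]]" "")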