This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-10-14
Channels
- # announcements (7)
- # aws (1)
- # babashka (1)
- # beginners (19)
- # calva (9)
- # clj-commons (4)
- # clj-kondo (64)
- # clj-on-windows (27)
- # cljsrn (12)
- # clojure (127)
- # clojure-bay-area (3)
- # clojure-europe (25)
- # clojure-hungary (7)
- # clojure-nl (1)
- # clojure-norway (9)
- # clojure-spec (5)
- # clojure-survey (2)
- # clojure-uk (22)
- # community-development (5)
- # core-async (19)
- # cursive (29)
- # datascript (8)
- # events (1)
- # fulcro (2)
- # graalvm (3)
- # jobs (1)
- # lsp (155)
- # malli (18)
- # nbb (6)
- # off-topic (86)
- # pathom (2)
- # rdf (18)
- # re-frame (9)
- # releases (2)
- # scittle (24)
- # shadow-cljs (33)
- # xtdb (4)
SOLVED
Hello, friends. I’ve settled on maybe-m from clojure.algo.monads for composing big expressions where something could go wrong anywhere. I had a question from the examples folder. I was given to understand that the right-hand sides of binding-form/monadic-expression pairs must be monadic values. Yet, in the example, aren’t x and y just ordinary values? Does domonad automatically apply m-result to them? This example works, but I’m sure I don’t understand why it works.
(defn safe-div [x y]
  (domonad maybe-m
    [a x
     b y
     :when (not (zero? b))]
    (/ a b)))
https://github.com/clojure/algo.monads/blob/master/src/main/clojure/clojure/algo/monads.clj#L357 yeah
That kind of implementation is clever, but it can result in oddities if you are nesting monadic values: m-result of nil is still nil
The Monadic context for maybe-m is any value. m-result is just the identity function. So any Clojure value is a valid maybe-m Monad.
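To make that concrete, here is a minimal hand-written sketch (not the actual algo.monads source) of maybe-m's two operations and what the safe-div example roughly macroexpands to:

```clojure
;; Sketch only: maybe-m's operations reimplemented by hand.
(def m-result identity)                         ; any value is already monadic
(defn m-bind [mv f] (if (nil? mv) nil (f mv)))  ; nil short-circuits the chain

;; (domonad maybe-m [a x, b y, :when (not (zero? b))] (/ a b))
;; expands to roughly this nest of m-bind calls:
(defn safe-div* [x y]
  (m-bind x (fn [a]
    (m-bind y (fn [b]
      (if (zero? b)
        nil                       ; a failing :when yields m-zero, i.e. nil
        (m-result (/ a b))))))))
```

So (safe-div* 6 3) gives 2 and (safe-div* 6 0) gives nil: plain values work on the right-hand sides precisely because m-result is identity, so every Clojure value already counts as a maybe-m monadic value.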
So maybe-m is basically some-> 😆
(and this is partly why monads are so rarely used in Clojure)
At work we compose pipelines of functions that could "fail" and right now each function has the equivalent of:
(if (valid? input)
  (process input)
  input)
So I proposed a more general version of some-> where you pass a predicate in and that determines whether to continue with the pipeline or not: https://ask.clojure.org/index.php/12272/would-a-generalized-some-macro-be-useful
My application is a recursive AST constructor, with level-crossing semantic constraints that could go wrong anywhere. I want to reject trees that have any problem, but I don’t care what the problem is. I’ll give some-> etc. a try to see whether they can save me some trouble. Thanks for the discussion!
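For reference, the proposed macro can be sketched in a few lines. The name pred-> and the pred-first argument order follow the linked proposal; the thread later notes that expr-first would be more consistent with as->/cond->.

```clojure
;; Sketch of a generalized some->: keep threading while (pred value)
;; holds, otherwise pass the current value through unchanged.
(defmacro pred-> [pred expr & forms]
  (let [g (gensym)
        steps (map (fn [step] `(if (~pred ~g) (-> ~g ~step) ~g))
                   forms)]
    `(let [~g ~expr
           ~@(interleave (repeat g) steps)]
       ~g)))

;; e.g. (pred-> even? 2 (* 3) (* 5)) ;=> 30
;;      (pred-> even? 3 (* 3))       ;=> 3 (short-circuits immediately)
```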
(to be more precise, a random AST generator from spec.alpha.generators and test.check.generators)
@U04V70XH6 for consistency with things like as-> and cond->, shouldn’t pred-> take the expr first, then the pred?
@U0P0TMEFJ Yes, probably. I figured the core team will "do the right thing" with it.
many thanks ... sorry for bothering you so early (just realised what time it is where you are) ... have a good day 😉
I ditched maybe-m for explicit nil punning and it simplified my code. tyvm.
Ya, pretty much maybe-m is just a monadic version of the some-> macro, though I guess it works more like a let, and there is no let-some macro. I think the lack of usefulness here is a bit due to algo.monads being barebones. I think cats and fluokitten would be more useful because they’d add monadic map, apply, fold, traverse, etc. But also, there’s no denying that dynamic types and macros together often solve a lot of the same problems.
either (which I am not sure algo.monads has) is usually nicer for pipelines too, since you have either a value to process or error information to pass through
How does one suggest a fix of a possible typo on the Clojure and ClojureScript home pages? Down at the bottom (of both pages) in the blurb about Nubank, the first paragraph ends with “Cognitect, the consultancy behind the Clojure and the Datomic database” (emphasis added). That extra “the” before Clojure reads strangely, and the former newspaper editor in me wants to strike it. Or we could embrace it and start saying “the Clojure” for everything, but that doesn’t seem right. 😉
https://github.com/clojure/clojure-site and https://github.com/clojure/clojurescript-site -- see https://clojure.org/community/contributing#_editing_this_site for guidelines.
I can take care of that
Trying to design a client API using cognitect.anomalies for errors, and have some questions about what things would translate to what anomalies:
• What would be examples of errors/exceptions that would translate to cognitect.anomalies/conflict? Things like entity-id/primary-key uniqueness conflicts? Or would those be incorrect?
• Would a request that times out obtaining a response (connection successful) translate as conflict? (when “coordinate with callee” might be necessary, as the request might have actually reached its destination, but that is unknown) Or would this also fall under unavailable?
• If the client is disconnected from the internet, is that also unavailable?
• Would rate-limit rejections fall under busy or forbidden?
Any thoughts on this would be appreciated, thanks!
https://github.com/cognitect-labs/anomalies ? Last commit was 4 years ago. Just noticed.
I know this is not an answer to your question, but I think native Java logs in Java 11 and later are really good. My personal opinion.
Personally I like to log as JSON instead of string. Then I can do whatever I want in such structured logs.
I mean you can make your own rules, names etc. JSON can be easy searchable, so you can set alerts, charts etc.
Not sure logging solves what I am trying to achieve, as I want a program to be able to dispatch/react to a specific error on call site, while logging would be more of a fire and forget thing that one can react on a different place
np, anomalies is more like exceptions and http error status, it tries to categorize different possible errors so that you can dispatch only on those anomaly categories, simplifying error handling. But because it’s an “abstraction” over other error primitives, you need to translate specific exceptions/error-codes/api-errors/etc… into anomalies, thus my question about specific translations
heh, still I have some issues understanding how to use it. http://clojure-liberator.github.io/liberator/tutorial/decision-graph.html - maybe this helps?
that graph translates to http error codes, I would need something similar but for cognitect.anomalies 🙂
I would use ::anom/conflict for cases like where (e.g.) a client tries to upsert a record, but the record already exists, with different values than those specified by the client. The client call isn’t invalid (so not ::anom/incorrect), but the server also can’t continue due to the conflicting data.
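A sketch of what such a translation table could look like; the category choices are judgment calls consistent with this thread, not anything from the anomalies README. The fully-qualified keywords need no dependency, since the library only specs them:

```clojure
;; Hypothetical mapping from HTTP status codes to anomaly categories.
(defn http-status->category [status]
  (case status
    400 :cognitect.anomalies/incorrect    ; caller bug: fix the request
    403 :cognitect.anomalies/forbidden
    404 :cognitect.anomalies/not-found
    409 :cognitect.anomalies/conflict     ; e.g. uniqueness violations
    429 :cognitect.anomalies/busy         ; rate limit: back off and retry
    503 :cognitect.anomalies/unavailable
    :cognitect.anomalies/fault))

;; Timeouts are the ambiguous case discussed above: a *connect* timeout
;; never reached the callee (clearly unavailable), while a *response*
;; timeout may have reached it with an unknown outcome, so some would
;; lean toward conflict ("coordinate with callee") there.
```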
I have a queue-shaped problem I’m trying to think through. I have three “work streams” that have the following relationship:
• stream 1: needs to be processed sequentially
• stream 2: processes the output of stream 1, can be parallelized
• stream 3: processes the output of stream 2, can be parallelized
What should I be looking for to implement this behavior? core.async channels? A ForkJoinPool? I’m pretty inexperienced with async programming so I’m having trouble evaluating my options. Any pointers for the direction I need to go would be super helpful.
Is this a batch operation or a streaming operation? Could be as simple as
(defn doit [data]
  (->> data (map stream1) (pmap stream2) (pmap stream3)))
If using core.async, have a look at https://clojure.github.io/core.async/#clojure.core.async/pipeline for stream 2 and 3. For stream 1 probably a normal go-loop (or thread if computationally expensive/IO). I’d say core.async is ok for this task as long as you don’t need observability of the queues, otherwise core.async doesn’t help you there
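A sketch of that shape (stream1/stream2/stream3 are stand-ins for the real work functions; buffer sizes and parallelism numbers are arbitrary). pipeline preserves input order, which also covers the ordering concern raised later in the thread:

```clojure
(require '[clojure.core.async :as a])

(defn run-job [inputs stream1 stream2 stream3]
  (let [c1 (a/chan 32)
        c2 (a/chan 32)
        c3 (a/chan 32)]
    (a/thread                                    ; stream 1: strictly sequential IO
      (doseq [i inputs] (a/>!! c1 (stream1 i)))
      (a/close! c1))
    (a/pipeline 4 c2 (map stream2) c1)           ; stream 2: parallel CPU work
    (a/pipeline-blocking 4 c3 (map stream3) c2)  ; stream 3: parallel blocking db work
    (a/<!! (a/into [] c3))))                     ; drain results (or just discard)
```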
don't use pmap for this unless the computations are very expensive and non-blocking
Stream 3 blocks (db transaction). Stream 2 does not: pure Clojure computation, probably the most expensive operation defined in the scope of my program. Stream 1 doesn’t block, but it’s IO: each item is a separate file.
i have used manifold for stuff like this https://github.com/clj-commons/manifold its code is pretty easy to understand, docs are ok. there used to be examples/tutorials, not sure where they ended up though. no matter what tool you use, the streaming part is going to be a pain in the ass to debug compared with regular clojure code, so try to use as little as possible, and avoid it altogether if you can.
is stream3 a pure sink, or is it doing some DB queries and then you take that result and do something downstream with it?
Thinking about this a little bit more, it could perhaps be simplified to: • Stream/queue 1 - process files sequentially • Stream/queue/batch 2 - grab as many items as are available to process from (1), process them, then transact. Could be multiple items or just 1 (isolation of transactions is not important)
@U3BALC2HH pure sink, the output of this ETL is a queryable database but further processing isn't part of the "job"
I think the database limitations on the number of concurrent connections are most important here, but it sounds like you could get away with dumping chunks of data into a worker pool that performs transactions
Naively,
(doall
  (for [chunk (map stream1 data)]
    (future
      (stream2 chunk))))
but there are slicker ways to accomplish the above
sounds like you want the last stream processor to do batch operations, probably with a timer to force op when batch can't get enough items in time
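That "flush when the batch is full or the timer fires" loop can be done with plain java.util.concurrent, no streaming library needed. A sketch; all names are made up, and ::done serves as a stop sentinel:

```clojure
(import '[java.util.concurrent LinkedBlockingQueue TimeUnit])

(defn batch-loop
  "Drain q into batches: flush! when batch-size items have accumulated,
  or when timeout-ms elapses with a partial batch. ::done stops the loop."
  [^LinkedBlockingQueue q batch-size timeout-ms flush!]
  (loop [batch []]
    (if (>= (count batch) batch-size)
      (do (flush! batch) (recur []))
      (let [item (.poll q (long timeout-ms) TimeUnit/MILLISECONDS)]
        (cond
          (= ::done item) (when (seq batch) (flush! batch))
          (nil? item)     (do (when (seq batch) (flush! batch)) (recur []))
          :else           (recur (conj batch item)))))))
```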
you could probably do this without streams, and use something like chime to do a job/task workflow. that would be a lot easier to debug
I like the job/task approach, although that seems like a lot of work to setup for a one-off job. I don't know how easy chime is to use
woooo that's pretty great!
will have to check that out
This job needs to be pretty durable (batches can last several hours or more), so the overhead of chime is probably worth it. The throughput of stream 1 is the primary limiting factor.
i did a streaming backend on one project, and i ran into a lot of problems because of it. streams are not free. the stream code is tiny and does pretty cool things, but then you have to interface with it, and you have to always know if you are dealing with a stream, a deferred, or normal clojure stuff and it's taxing and debugging is different for each scenario
Over in the data engineering department on the zulip chat, they recommended core.async's https://clojuredocs.org/clojure.core.async/pipeline -- if the results of stream 1 need to be in order (credit @UQ58AKW0J)
I am using safely for controlled retries of calls to external resources and it gives me observability with mulog for free, which has helped with debugging a lot - would probably help with the streaming context, but I am also totally ok with a scheduler as simple as “while there are still items to process, grab as many as you can every n seconds”
ok, if you are primarily dealing with batches, then you lose a lot of the benefits of streams. at least a full stream solution. if you know when you need to start processing data, and this job doesn't really interface with the rest of your app, you are in a very nice place. you can treat the job as a black box, use streams or whatever inside it, or something else.
I do worry about synchronizing the throughput of the scheduled job to the throughput of stream 1 though
curious what the issue with the throughput here is, it sounds like this could be drip fed
i think tuning is going to be something that you'll have to do profiling on. a process can't really know how long something will take to do without heuristics, it could profile itself to make those, though.
I think it sounds more like you're concerned about some brittle I/O, retries, and restarts (all valid concerns) than raw ingestion speed, am I right?
@U0LAJQLQ1 yeah, I think the general idea was to use streams inside of the batch job so I could process the data concurrently with the IO (which is error prone and rate limited)
if things are happening on the same CPU, and there is other stuff on the system (web app, db, whatever) then you probably don't want to do things very fast. and if you do want to do things as fast as possible, you may need more than 1 computer, and now you have a harder problem to solve
streams don't help you much with errors, you are going to be in a stream context for error handling, which is a bit different from normal clojure (same issue in JS if you have done any of that).
I’m still very much following Frank McSherry’s COST principle (http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html) of “If you are going to use a big data system for yourself, see if it is faster than your laptop”, so distributed computation is way out right now.
but, you may have a stream fn that respects that rate limit, so that could be a benefit. i think you would have to do some testing to see how your streams interact with each other, and figure out the buffer sizes you want, and also figure out some rules you may want for connecting the streams together
I may be using the term “stream” too loosely here; what I really mean is “lazy sequence of items to fetch and then process”, which may differ from the sense in which you’re using it
we still don't know much about your problem. it could be the case that clojure isn't even a good tool for this, and something like command line tools, or kafka or whatever is better.
@UFTRLDZEW if you really want to dig into this hard, plenty of folks willing to nerd out with you in the https://clojurians.zulipchat.com/#narrow/stream/151924-data-science/topic/Data.20Engineering on Zulip, because what @U0LAJQLQ1 said may be correct and this may be even beyond the scope of Clojure
so, they are like mini computers with a set of memory (buffer), and I/O to other streams
I love that quote!! Now we just need to throws some VC money at it and it will now be SaFaaPaaS ... Streams are Functions as a Process as a Service!
if you can build your batch as a series of threaded functions, then you have a blueprint to make your streams, once you have your streams you can use extra features to do tuning. https://www.youtube.com/watch?v=1bNOO3xxMc0 you may be interested in this talk to get an idea of what streams give you over a series of functions. the main issue is going to be error stuff, so if you can split error handling into its own functions, and then streams, you'll be able to avoid a lot of pain.
Dunno what database your sink is but if you're trying to max the throughput of a single box, you might need some kind of "copy into". Postgres lets you stream data right into a table using this approach. Most databases have support for bulk copying.
Great point @U065JNAN8. Depending on the database you are using, if you are looking for raw speed, you should check out the tech.ml.* and dtype-next Clojure scientific computing stack. It seriously doesn't get any faster. For instance: https://github.com/techascent/tech.ml.dataset.sql
I'm pretty familiar with TMD; the final load is not nearly as much the bottleneck as the initial IO and the processing that happens before the final load into the DB.
Hey team, would love some help thinking through some async programming too.
Context:
• A user can call transact with different app-ids
• I can parallelize by app-id
• But within an app I need transactions to be processed in serial
Potential solution
• When a transaction comes in
• I get-or-create the appropriate “app queue” and worker for this transaction
• I add the transaction into the “app queue”
• The worker does its magic
For example:
(defn spawn-transactor [app-id]
  (let [q (LinkedBlockingDeque.)]
    {:q q
     :worker (future (loop []
                       (let [item (.take q)]
                         (log/infof "%s on %s" app-id item)
                         (recur))))}))

(def transactors (atom {}))

(defn transact [{:keys [app-id] :as tx}]
  (let [_ (swap! transactors update app-id
                 (fn [old]
                   (or old
                       (spawn-transactor app-id))))
        q (get-in @transactors [app-id :q])]
    (.add q tx)))

(comment
  (transact {:app-id "foo"
             :ok :dok}))
The problem:
Swap. This could “spawn” multiple workers inside the CAS retry loop. I would have no way to clean up the future calls here, which could leak memory (if I understood correctly)
Question:
Would you write this differently?
(swap! a update-in [:foo] (fnil identity (delay (some-expensive-thing-or-whatever))))
(force (get-in @a [:foo]))
I am not sure this solves the problem I am thinking about. That problem stated another way: If I write:
(swap! a (fn [old] (or old (do-side-effect-thing-and-return-new))))
I know ^ is a bad idea, because do-side-effect-thing-and-return-new could be invoked multiple times.
I want it to be invoked only once.
---
Do you see what I mean, or do I misunderstand what you mean?
—
Thank you for taking the time!
I see what you mean. I don’t know if this will solve the problem completely though.
Consider:
Imagine if transact runs on two different threads. Both transacts will start a swap with a delay. Both swaps could then see old being nil, and actualize a delay
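For what it's worth, the delay version does avoid the double-spawn: swap! retries may build several Delays, but Delays are cheap and side-effect-free to create, and only the one that actually lands in the atom is ever forced. A sketch (names made up):

```clojure
(defn transactor-for
  "Get-or-create the entry for app-id in atom a, calling (spawn! app-id)
  at most once even under contention."
  [a app-id spawn!]
  (-> (swap! a update app-id #(or % (delay (spawn! app-id))))
      (get app-id)
      force))
```

Even if two threads race, both swap! calls return a map containing the same winning delay, and forcing a Delay is itself synchronized, so spawn! runs exactly once.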
Depending on what framework you want to buy into, isn't it essentially a group-by? https://github.com/leonoel/missionary/blob/master/src/missionary/core.cljc#L773
Thanks @U0NCTKEV8!
@UK0810AQ2 something along those lines, though I am aiming to avoid core.async / friends, and stick to clojure + java.util.concurrent if I can
atoms also directly support compare-and-set! which is more primitive than swap! and can sometimes help for this kind of thing
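A compare-and-set! sketch of the same get-or-create; note that with a side-effecting make the losing thread can still have run it and must clean up, which is why the delay suggestion earlier in the thread is usually nicer for spawning workers:

```clojure
(defn ensure-entry
  "Get-or-create (get @a k) via compare-and-set!. make may run more
  than once under contention; only the winning value is kept."
  [a k make]
  (let [m @a]
    (if-some [v (get m k)]
      v
      (let [v (make k)]
        (if (compare-and-set! a m (assoc m k v))
          v
          (recur a k make))))))    ; lost the race: discard v and retry
```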
Is there any way I can filter out NUL characters in a string (as in Unicode 0, \0 etc.)? I have a file that somehow has one and an API that won't let me upload, and despite trying every way I can think of to feed a pattern to clojure.string/replace, nothing seems to match on it, and it's possibly illegal in any case. #"\0" is the only one that didn't silently fail; it throws this error instead:
Type: java.util.regex.PatternSyntaxException
Message: Illegal octal escape sequence near index 2
\0
I would first check if it is really a \0. Load a file and try to find this character.
hmmm.
(->> string
     seq
     (filter #(zero? (int %)))
     count)
gets me a 0. Which may mean Atlassian's API is just lying to me, somehow failing in its own special way
2022-10-14T19:05:42.554Z ip-192-168-1-112.ec2.internal ERROR [user:273] - Repo api-gateway failed with error: com.atlassian.confluence.api.service.exceptions.BadRequestException: Error parsing xhtml: Illegal character (NULL, unicode 0) encountered: not valid in any content
at [row,col {unknown-source}]: [86,148]
curiouser and curiouser ... I can find it in the file. The error's off by 1 but it's there alright, shows up in VSCode, but it's like Clojure can't see it?
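For the record, the regex failed because Java's Pattern treats \0 as the start of an octal escape (it requires digits after the 0, as in \0n); a hex or unicode escape matches NUL fine, and you can also skip regexes entirely. A sketch; strip-nuls is a made-up name:

```clojure
(require '[clojure.string :as str])

(defn strip-nuls [s]
  (str/replace s #"\x00" ""))   ; #"\u0000" works too; #"\0" does not

;; regex-free alternative:
(defn strip-nuls* [s]
  (apply str (remove #(= \u0000 %) s)))
```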
Atlassian is misreporting