#clojure
2020-02-27
noisesmith02:02:32

when it's a long day of reproducing and fixing regressions that were in master and you are feeling a bit punchy

(let [huge-payload (apply str (repeat 100 (System/getProperty "java.class.path")))]
  ...)

noisesmith02:02:04

I had to take some zeros off the repeat count, the first version crashed the integration test vm with an OOM

andy.fingerhut02:02:58

Wow. Do you recall how big (count (System/getProperty "java.class.path")) was?

noisesmith19:02:36

haha - I can go back and check - of course this was a corporate clojure service with a bunch of apache libs, the classpath was massive :D

noisesmith19:02:53

21031 outside the test runner, the test runner adds more

andy.fingerhut19:02:09

Not too short 🙂 Certainly repeating 10^6 of those in memory seems likely to be an OOM, but I'm surprised 10^3 would be, unless the process was already very low on memory.

noisesmith19:02:33

I meant I took zeroes off to end up with what I shared

andy.fingerhut20:02:05

Understood. Engineer here geeking out on the details, looking for performance problems/fixes 🙂

Alex Miller (Clojure team)04:02:11

You could use a promise-chan

Alex Miller (Clojure team)04:02:42

And ops on it in a go block
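
For illustration only (the question Alex is replying to isn't shown in this log), a minimal sketch of the promise-chan + go-block combination he mentions; all names here are made up:

(require '[clojure.core.async :as a])

(def result (a/promise-chan))      ; delivers at most one value; every take sees it

(a/go
  (println "got" (a/<! result)))   ; parks until something is put on the channel

(a/put! result :done)              ; all pending and future takes now receive :done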

borkdude07:02:48

@michael.e.loughlin > I'm looking for Lisp implementations written in Clojure. Maybe sci is another one to look at. https://github.com/borkdude/sci/

borkdude07:02:42

It uses edamame to parse Clojure code, which is in turn based on tools.reader.edn

👍 4
mloughlin10:02:08

Edamame looks exactly like what I'm looking for, thanks!

zilti13:02:22

What would the idiomatic way of breaking out of a nested structure be in Clojure? As someone who also uses Scheme, my first thought was "continuation", and exceptions are a kind of continuation, but "abusing" an exception as a continuation feels a bit wrong.

fricze13:02:23

What do you mean by nested structure? A nested data structure? Nested calls?

zilti13:02:09

Nested calls

zilti13:02:34

About four levels deep.

jumpnbrownweasel13:02:38

Yes, exceptions are usually used for errors. I think simply returning (terminating the expression) is normally used, which means it has to be done at all four levels. I realize that doesn't answer your question.

hindol13:02:35

In Clojure, it is more like,

(when (pred x)
  ;; Do something
)
If pred is false, when returns nil. You can propagate this nil upwards if you want to.

fricze13:02:37

@U2APCNHCN would you care to share what your code does? Is it data transformation? Some state changes, or async calls?

thom13:02:55

You also have the option of structuring this in terms of core.async channels perhaps?

zilti14:02:07

It is data transformation, so to speak, yes. It's a web crawler. The problem is that one function relies on getting nil as content at some point, or it'll enter an infinite loop. But another now has to figure out if a URL is invalid, and that is where I want to jump out of the stack

hindol14:02:35

Can you structure this problem as multiple pools of workers? Like, one set of URL validators sends valid URLs across a channel. Another pool actually fetches the content. I think Thom is trying to say the same thing.

fricze14:02:05

So… it seems like you're designing some bigger process. Clojure doesn't tend to jump out of the stack. If passing data from function to function is not enough, core.async is, most often, the way to go for designing processes.

thom14:02:07

I think that would be a fine use of core.async. It’s just multiple queues of work (the list to be scraped, the found content, the list of potentially invalid URLs) you can separate out and which don’t then need to care much about each other.
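
As a rough illustration of the multiple-queues idea (the channel names and the validation check below are hypothetical):

(require '[clojure.core.async :as a])

(def to-crawl (a/chan 100))   ; URLs waiting to be fetched
(def found    (a/chan 100))   ; scraped content
(def bad-urls (a/chan 100))   ; URLs that failed validation

(dotimes [_ 4]                ; a small pool of crawler workers
  (a/go-loop []
    (when-let [url (a/<! to-crawl)]
      (if (re-find #"^https?://" url)        ; stand-in for real validation
        (a/>! found {:url url :body "..."})
        (a/>! bad-urls url))
      (recur))))

(a/put! to-crawl "https://example.com")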

zilti14:02:21

Well, I also have to be able to fetch pages during crawling, for paginated things for example

zilti14:02:33

So it isn't as linear

hindol14:02:10

The other approach would be to tag the invalid URLs and make the outer function ignore them.

hindol14:02:34

I am assuming the deeper nested one does the URL validation.

zilti14:02:49

The way it works now is that I have a crawler definition for each crawler that gets read by a macro. And that definition says crawl this webpage using these rules, and when there are multiple pages, here's how to fetch them

thom14:02:51

That just sounds like an inner loop finding more URLs to scrape, or scraping all pages at once and finding content of interest. Either way the output can just be pushed to a channel.

thom14:02:24

So a job might fail as a whole?

zilti14:02:00

Yes, a job might fail as a whole

zilti14:02:42

A crawler is basically a chain of main blocks, and each of these main blocks contain instructions on how to operate on the data

zilti14:02:59

Pretty much a DSL

thom14:02:18

So I think you have a fair few options here. Obviously you can use channels and just return some structured value that indicates it bailed out early. I actually don’t think exceptions would be inappropriate here though, given that you presumably mostly expect crawls to succeed. You can add extra info about errors or URLs encountered (using ex-info or whatever).
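
As a minimal sketch of the ex-info route (crawl-page, run-job, and the validation check are hypothetical stand-ins for the real crawler):

(defn crawl-page [url]
  (when-not (re-find #"^https?://" url)      ; stand-in for real validation
    (throw (ex-info "invalid URL" {:type ::invalid-url :url url})))
  {:url url :content "..."})

(defn run-job [urls]
  (try
    {:status :ok :pages (mapv crawl-page urls)}
    (catch clojure.lang.ExceptionInfo e
      (if (= ::invalid-url (:type (ex-data e)))
        {:status :failed :url (:url (ex-data e))}   ; the whole job fails, with details
        (throw e)))))                               ; not ours - rethrow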

Crispin14:02:12

I know what you mean. There are some threading constructs like some-> and some->> that can help
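
For example, some-> stops threading as soon as any step returns nil:

(some-> {:user {:address {:zip "1234"}}}
        :user
        :address
        :zip)       ;=> "1234"

(some-> {:user nil}
        :user       ; returns nil, so the remaining steps are skipped
        :address
        :zip)       ;=> nil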

Crispin14:02:48

but I must confess, I have used exceptions to break out. It does feel like abuse. I think the lack of continuations in clojure is a design decision by Rich. He mentions it in his interview with Erik Meijer.

zilti14:02:05

I think I also heard that the JVM simply doesn't have any kind of support for continuations, which would make them very hard to implement

zilti14:02:50

Yea I guess at some point I'll have to refactor that whole thing

Crispin14:02:24

If it is "very hard" to implement them then that may be the reason, what with Rich's focus on simplicity.

Crispin14:02:49

automatic TCO seems like another one of these cases.

dominicm14:02:34

Simplicity is about interface, not implementation.

dominicm14:02:12

Dynamic vars are the solution for communicating across a stack. Dnolen has a continuation library. Not exactly idiomatic though.
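
A small sketch of the dynamic-var approach (the var and functions are invented for illustration; set! works here because the var is thread-bound by binding):

(def ^:dynamic *bad-urls* nil)

(defn validate-url [url]
  (when-not (re-find #"^https?://" url)
    (set! *bad-urls* (conj *bad-urls* url)))   ; visible to whoever set up the binding
  url)

(binding [*bad-urls* []]
  (mapv validate-url ["https://ok.example" "nope"])
  *bad-urls*)
;;=> ["nope"]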

thom14:02:21

What would be the need for continuations here though? Is there spectacularly complicated retry logic or something? Surely if there's a DSL you can push the orchestration wherever you like.

zilti14:02:31

Well, not the need so much as it being a possible way

Ben Sless14:02:08

Do you think this could be relevant to your problem? It'll only get into Clojure 1.11, but you can just copy the function: https://clojure.atlassian.net/browse/CLJ-2555 Also, while the JVM doesn't support TCO, you can cheat with trampoline: https://clojuredocs.org/clojure.core/trampoline
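
For reference, trampoline works by having each step return either a final value or a thunk for the next step:

(defn countdown [n]
  (if (zero? n)
    :done
    #(countdown (dec n))))       ; return a fn instead of recursing directly

(trampoline countdown 100000)    ;=> :done, without blowing the stack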

zilti15:02:24

Hmm I'll look into this, but for now I used an exception

👍 4
jumpnbrownweasel17:02:36

This doesn't apply to most situations but I feel like I should mention it anyway: For performance critical code, using exceptions for control flow is a bad idea because the creation of the stack trace is relatively slow. Most code is not performance critical, so this doesn't apply and you should probably not worry about it. But in the Java world when writing performance critical code, this is a no-no. For situations where the exception is always caught and handled in a restricted scope, a workaround is to throw a static pre-created exception. This is an advanced performance topic and I can't imagine it being done in Clojure except in rare situations, but that is the story from the Java performance perspective.

✔️ 4
dominicm17:02:19

I thought the stackframes were reasonably cheap and that was from the early java days?

dominicm17:02:47

Maybe relatively is a better word. I guess they're expensive in real-time networking or something :)

jumpnbrownweasel17:02:28

Yes, relative is the key word. And my experience is in performance tuning for very high throughput operations. But even in Clojure, I wouldn't use exceptions for control flow primitives that might be used in high throughput code.

isak17:02:27

@UBRMX7MT7 do you know if it is still a perf problem if you have pre-created exceptions?

jumpnbrownweasel17:02:52

No, I just play it safe.

jumpnbrownweasel18:02:11

Sorry, I misread that. The try/catch mechanism is not slow. It is the exception creation that is slow. So with pre-created exceptions it is not a significant performance issue.

isak18:02:10

ah, cool 👍

jumpnbrownweasel18:02:42

But it's important to use the pre-created exceptions in very limited scope. You don't ever want that exception to escape and be caught by something that doesn't understand and handle it. If you do, the stack trace will be printed and it will be nonsense, which can be very confusing when troubleshooting. In Java I only do this with checked exceptions, to be sure they don't escape to the upper levels. In Clojure you'd just have to be very careful that you have a try/catch at a higher level that can handle it correctly.
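
A rough sketch of how the pre-created-exception trick could look in Clojure (the names are made up; the exception is built once, so the relatively expensive stack-trace capture happens only once, and the catch verifies the exception really is ours before handling it):

(def ^:private stop-signal (ex-info "stop" {::stop true}))   ; created once

(defn any-match? [pred coll]
  (try
    (doseq [x coll]
      (when (pred x)
        (throw stop-signal)))    ; break out of the loop without building a new exception
    false
    (catch clojure.lang.ExceptionInfo e
      (if (::stop (ex-data e))
        true
        (throw e)))))            ; anything else is not ours - rethrow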

jumpnbrownweasel18:02:35

I would never throw such an exception from a library API, for example.

isak18:02:48

yea, makes sense

emccue01:03:40

Old thread, but you can try "rubbing some monads on it", if that makes sense

emccue01:03:41

which can give you an "early bail on failure" behavior

jumpnbrownweasel02:03:12

@U3JH98J4R I get the general idea, but more specifically do you know of any monad-like things done in clojure that would work for situations like this? Or any examples that are close?

emccue02:03:03

I'm thinking of one of the examples on this page

emccue02:03:55

;; ----------------------------------------------------------------------------
;; NOTE: this relies on a CL-style `block` macro (and the `return-from-*`
;; function it introduces), presumably defined elsewhere on the page being
;; quoted; `block` is not part of clojure.core.
(def return
  (fn [_]
    (throw (UnsupportedOperationException.
             "return function used outside of supporting macro context"))))

;; ----------------------------------------------------------------------------
(defmacro allow-early-return
  "lets you write the given block of code with unnamed early returns
  in the form of a `return` function.
  Because the return mechanism uses Exceptions under the hood,
  catching the generic Exception or Throwable class will lead to undefined
  behaviour if done within a block that may call the return function.
  Ex. (allow-early-return
        (when (> x 10)
          (return false))
        (... some longer computation ...))"
  [& code]
  (let [block-name (gensym)
        ret-name (symbol (str "return-from-" block-name))]
    `(block ~block-name
            (let [~'return (fn [a#] (~ret-name a#))]
              ~@code))))

emccue02:03:10

though if I'm being honest, I too keep a macro in my back pocket for this

jumpnbrownweasel02:03:53

oh, it uses exceptions.

jumpnbrownweasel02:03:30

i'm done using hidden parameters of any kind or tricks, i'll just stick with arguments and return values.

jumpnbrownweasel02:03:42

compared to Java, it is just so easy to return a tuple (vector) from a function, or a map if the number of variants is more complex. i'm very happy with that.

jumpnbrownweasel21:03:44

Or, another way to avoid nesting while creating a "pipeline" of bindings, adapting the example from letr:

(let [res    (network-call)
      err    (when-not (:ok? res) :failed-network-call)
      people (or err (:people (:body res)))
      err    (or err (when (zero? (count people)) :no-people))
      res2   (or err (network-call-2 people))]
  res2)
I know it's a little strange but to me it's clear.

borkdude15:02:00

I'd like some input on this issue: is try without a catch (and/or finally) probably a silly mistake or are there legit uses of this? https://github.com/borkdude/clj-kondo/issues/773

Alex Miller (Clojure team)15:02:15

no legit use I'm aware of

Alex Miller (Clojure team)15:02:02

it is an implicit do, but do would be preferred in that case
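
To illustrate: a try with no catch or finally behaves like a plain do:

(try (println "side effect") (+ 1 2))   ;=> 3
(do  (println "side effect") (+ 1 2))   ;=> 3, the preferred spelling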

dominicm15:02:16

Given java doesn't allow it, my only other idea is to read the source and see what clojure generates.

Alex Miller (Clojure team)15:02:18

try itself doesn't really result in anything - it's the catch + finally that create exception handlers etc

Alex Miller (Clojure team)15:02:53

there are some impacts on things like recur in try, but removing the try only opens possibilities

zilti16:02:16

*** %n in writable segment detected *** ... what can I do about that?

zilti16:02:47

I don't have any format calls or similar in my code.

zilti16:02:01

Ah, must be happening somewhere on the way to the database

zilti17:02:02

Well, it seems pretty much impossible to track down where I am supposed to catch that

dpsutton18:02:56

happy to help but don't have a clue what issue you are facing

p-himik18:02:21

It seems like the message comes from deep within glibc because of some possibly incorrect usage of snprintf. And seems like you cannot catch it or even suppress it. Sounds like some underlying dependency has this issue, maybe JVM itself.

eoliphant18:02:24

Hi, i'm trying to use create-ns to make ns'es like foo.bar, mainly so that i can use them with namespaced keywords like ::fb/baz without creating actual files. Not sure of the best way to create aliases for them. require/use with :as don't work, as those are trying to load libs. Do I have to use (alias) explicitly, or is there a way to do this via the ns macro?
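
For what it's worth, the explicit create-ns + alias combination being described works like this (foo.bar and fb are the made-up names from the question):

(create-ns 'foo.bar)     ; make the namespace exist without a file on disk
(alias 'fb 'foo.bar)     ; alias it in the current namespace

::fb/baz                 ;=> :foo.bar/baz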

zilti18:02:47

@p-himik yes, that is what I assume right now... Kind of a bad situation to be in, tbh

p-himik18:02:17

Found this: https://stackoverflow.com/questions/3779278/problem-with-n-n-in-writable-segment-detected-c-i-qt You don't embed values into your SQL queries by hand, do you? :)

dpsutton18:02:55

is there a stacktrace?

zilti19:02:21

No, I don't do that, @p-himik, I use HugSQL and next.jdbc. @U11BV7MTK no stack trace at all

andy.fingerhut19:02:53

Does it print that message and the code continues to run, or does it abort the computation?

zilti19:02:38

It aborts the entire process

zilti19:02:17

No core dump, no error message except *** %n in writable segment detected ***.

andy.fingerhut19:02:17

Any native libraries in your class path? I do not know much about them, but if you mention it is on the way to the database, then likely you are using some JDBC driver library, which might be partly implemented in Java, and partly in native code?

p-himik19:02:38

What does ulimit -c output?

andy.fingerhut19:02:40

With enough logging in your code, and appropriate flushing so that everything gets written out to a log file/service rather than lost in an in-JVM-process buffer when the JVM crashes, you could probably isolate which database call is the cause.

p-himik19:02:32

I just managed to reproduce the crash in a simple test app, dump the core, and debug it. I couldn't get Java symbols within GDB (please tell me if that's possible), but at least I can see the first C function that Java calls that ends up crashing the app.

p-himik20:02:02

TBH I went with this one far beyond my comfort zone. :) I've never even used GDB before.

zilti20:02:37

I'll have to look tomorrow what ulimit -c would output. No native libraries directly, but I am using the PostgreSQL JDBC library

andy.fingerhut20:02:51

Is it something you can reproduce 100% of the time, e.g. running a particular test? Or is it some "happens after weeks of running in production" kinds of things?

p-himik20:02:24

@U2APCNHCN Here's a good workflow for tomorrow, as far as I can tell. Sorry if it's too verbose - I just did it myself for the first time, and I have no idea what you might know already.
1. Run ulimit -c and see if it outputs 0
2. If it does not, search online for where core dumps are located on your particular system - there should be one
3. If it is 0, run ulimit -c unlimited - it will only work within the current shell, so stay in this shell
4. Run your application within that shell
5. Reproduce the situation where the core is dumped
6. Run gdb <path-to-java> <path-to-core-dump-file>
7. Within gdb, run where
At this point, you should see the native stack trace with a bunch of ?? after it. The question marks are Java calls; it's hard to get them displayed. But the native stack trace should at least give you an idea where to look further. Hope it helps.

zilti11:02:29

ulimit already was unlimited, and there are no core dumps, even though it now claims to write one (or is it in some special location?)

p-himik11:02:53

It can be in a special location, yes.

zilti11:02:24

The other place I have an error that causes the VM to segfault, it creates a core dump in the same directory I am in when I call the program.

p-himik11:02:10

Oh, OK. This one is strange. Does the first process have writing permission to the CWD?

zilti11:02:06

Yes. It is the same program in both cases, just the input it gets is different (it's a web crawler, and I give it pages to crawl). All other pages work. But my output from one page causes this "%n in writable segment detected" and makes the whole thing stop, and one other page causes a SIGSEGV.

p-himik11:02:32

Hmm. So in your particular case, the error somehow does not generate the core dump, but it does on my end. Albeit with a different program. Just to check yet another thing. What does the /proc/sys/kernel/core_pattern file contain? If you have it at all. In any case, since you know what input generates the error, can you try to find the %n string anywhere in that input or in anything that that input can produce?

zilti12:02:22

The file contains |/usr/lib/systemd/systemd-coredump %P %u %g %s %t %c %h

zilti12:02:37

Yea, I'll have to step-by-step go through this in a REPL I guess

p-himik12:02:05

One alternative - rename that file temporarily. It may be the case that systemd-coredump does not write the dump to disk in that particular case for some reason.

p-himik12:02:32

So, have you tried searching for %n in the input that gets parsed?

zilti14:02:29

No, but me and my coworker looked at the coredump file of the other error. We threw out runejuhl/clj-journal and that resolved the problem.

p-himik14:02:57

So maybe you can plug that library back in, set *strict* to true, and find out the stack trace and where the bad value comes from.

zilti16:02:57

Interesting, but I think this would still cause the other error

cmcfarlen21:02:21

I have a clojure.data.xml question. Is there a way to forgo xml namespace prefixes when emitting xml? I have a situation where the xmlns has to be present, but the remote parser does not support prefixes. The only way I have found is to (with-redefs [ (fn [] nil)] (xml/emit-str ...))

Alex Miller (Clojure team)21:02:19

you can set a default ns at the root - have you tried that?

cmcfarlen21:02:23

I think I am setting the default using an :xmlns attr at the root

Alex Miller (Clojure team)21:02:15

if all of the elements are also in that ns, then I think the emitter should emit them w/o prefix

cmcfarlen21:02:39

Does that work even if I haven't called xml/alias-uri ?

cmcfarlen21:02:15

(-> (xml/sexp-as-element [:a {:xmlns "my:ns"} [:b "whoa"]])
    (xml/emit-str))

cmcfarlen21:02:30

"<?xml version=\"1.0\" encoding=\"UTF-8\"?><a xmlns:a=\"my:ns\"><b>whoa</b></a>"

cmcfarlen21:02:06

The remote parser is mad about xmlns:a instead of just xmlns

Alex Miller (Clojure team)22:02:52

are all of your elements in the same ns?

Alex Miller (Clojure team)22:02:14

if not, then I think the answer to your first question is no - you're asking to make an invalid document

Alex Miller (Clojure team)22:02:08

if yes, then set the default ns and put all your elements in that ns and the emitter shouldn't need to prefix them

cmcfarlen22:02:20

They are all in the same ns. It's emitting bare tags, but the xmlns attribute is still getting the prefix. So something like "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a xmlns=\"my:ns\"><b>whoa</b></a>" works, but it's outputting "<?xml version=\"1.0\" encoding=\"UTF-8\"?><a xmlns:a=\"my:ns\"><b>whoa</b></a>" (with the xmlns:a= )

cmcfarlen22:02:56

This parser is in an embedded device and it doesn't handle the prefix on the initial xmlns attribute on the root tag