Fork me on GitHub
#clojure-dev
<
2018-08-31
>
gfredericks11:08:12

I don't know what this unreachable athrow is about, but it makes me happy byte alignment?

bronsa12:08:40

the clojure compiler at times emits unreachable code because it doesn't do too much analysis for dead branches

gfredericks12:08:22

o_O this seems clearly unreachable without thinking about branching it all

gfredericks12:08:50

it seems like you'd have to go out of your way to put it there

bronsa12:08:10

@gfredericks if you give me the code it should be (relatively) easy to figure out why it's there

gfredericks12:08:42

(if (.equals '~name 'clojure.core) 
  nil
  (do (dosync (commute @#'*loaded-libs* conj '~name)) nil))

gfredericks12:08:13

it's just an if essentially

bronsa12:08:46

are you sure it's that bit? I don't get that bytecode

gfredericks12:08:09

here's the whole method

gfredericks12:08:18

public static void load();
    Code:
       0: getstatic     #10                 // Field const__0:Lclojure/lang/Var;
       3: invokevirtual #16                 // Method clojure/lang/Var.getRawRoot:()Ljava/lang/Object;
       6: checkcast     #18                 // class clojure/lang/IFn
       9: getstatic     #22                 // Field const__1:Lclojure/lang/AFn;
      12: invokeinterface #26,  2           // InterfaceMethod clojure/lang/IFn.invoke:(Ljava/lang/Object;)Ljava/lang/Object;
      17: new           #28                 // class user/empty_ns$loading__6515__auto____297
      20: dup
      21: invokespecial #31                 // Method user/empty_ns$loading__6515__auto____297."<init>":()V
      24: checkcast     #18                 // class clojure/lang/IFn
      27: invokeinterface #33,  1           // InterfaceMethod clojure/lang/IFn.invoke:()Ljava/lang/Object;
      32: getstatic     #22                 // Field const__1:Lclojure/lang/AFn;
      35: checkcast     #35                 // class clojure/lang/Symbol
      38: getstatic     #38                 // Field const__2:Lclojure/lang/AFn;
      41: invokevirtual #42                 // Method clojure/lang/Symbol.equals:(Ljava/lang/Object;)Z
      44: ifeq          52
      47: aconst_null
      48: goto          67
      51: athrow
      52: new           #44                 // class user/empty_ns$fn__299
      55: dup
      56: invokespecial #45                 // Method user/empty_ns$fn__299."<init>":()V
      59: checkcast     #47                 // class java/util/concurrent/Callable
      62: invokestatic  #53                 // Method clojure/lang/LockingTransaction.runInTransaction:(Ljava/util/concurrent/Callable;)Ljava/lang/Object;
      65: pop
      66: aconst_null
      67: return

gfredericks12:08:38

(I'm sure enough to bet $10 on it)

bronsa12:08:46

yeah ok that does look like it :)

bronsa12:08:07

and I retract my comment about it being relatively easy to figure out :P

😂 4
bronsa12:08:22

it's not immediately clear to me why that athrow is there hmmm

bronsa12:08:15

that kinda looks like there would be an empty finally block

gfredericks12:08:42

there's no exception table on the method

gfredericks12:08:50

the dosync part is handled by calling runInTransaction

bronsa12:08:57

I'm stumped, I don't think the clojure Compiler emits that, it might be ASM under the hood to make stack frames depth match but that's just a guess

gfredericks12:08:45

ah ha, so I was right about byte alignment, sort of.

gfredericks12:08:00

52 is a multiple of 4

bronsa12:08:23

seems to only happen in 1.10

bronsa12:08:37

so yeah the new ASM is causing that

bronsa12:08:10

no idea why

bronsa12:08:41

odd, nice find

bronsa12:08:56

@ghadi any ideas why ASM would do that?

gfredericks12:08:49

does it potentially happen for all ifs?

bronsa12:08:43

seems to only happen when we take the unboxed bool path for test

gfredericks12:08:28

"The Unboxed Bool Path" would be a great name for something

bronsa12:08:26

an alan watts lecture

😛 4
gfredericks18:08:38

is there any particular reason for the use of $ instead of . in the generated class names? Especially the first one, which seems to intentionally result in a package per parent namespace instead of a package per namespace

dpsutton18:08:16

Outer.Inner versus Outer$Inner?

gfredericks18:08:11

that's the thing I'm asking about, yes

mikerod18:08:43

$ is following the classnaming spec

gfredericks18:08:01

in a way that . is not?

gfredericks18:08:12

example: (ns foo.bar) (class (fn yolo [])) returns foo.bar$eval325$yolo__326

gfredericks18:08:23

I can imagine maybe foo.bar.eval325$yolo__326

gfredericks18:08:30

where you have a package per namespace

gfredericks18:08:13

is the only thing at play here how many packages you end up with, or is there some other aspect I'm not aware of?

mikerod18:08:56

Oh, in that example, I don’t know a good reason

mikerod18:08:21

I can see your point I think, your imagined example seems to reuse the package “foo.bar”

gfredericks18:08:17

I'm not sure what you mean by "reuse"

mikerod18:08:08

well, just it’d be repeated

mikerod18:08:20

nothing more than that

Alex Miller (Clojure team)18:08:22

$ is the jvm separator for inner classes

Alex Miller (Clojure team)18:08:02

so the foo is the package, bar is the outer class, eval325 is an inner class of bar, and yolo__326 is an inner class of eval325

mikerod18:08:27

Yeah, that makes sense to me. foo is the package, since the last segment (`bar`) represents a class.

gfredericks18:08:53

but why generate that class instead of the class eval325 in the package foo.bar?

gfredericks18:08:04

there's a grouping decision being made here, I'm assuming -- if I have namespaces foo.bar, foo.baz, yolo.thomas and yolo.wizard, with whatever number of functions in them, in the current world by compiling these I get two packages -- foo and yolo. In an alternate reality that I'm asking about, you'd get four packages, one per namespace

gfredericks18:08:26

it surprises me that we're in this world and not the alternate one

mikerod18:08:55

It’s not clear what the pros/cons to either would even be

Alex Miller (Clojure team)18:08:59

I think the grain of this is just easier in the classloader

hiredman18:08:20

because there is an assumption that clojure namespaces map to java packages. when in fact they map to classes

mikerod18:08:26

makes a flatter directory structure I guess

Alex Miller (Clojure team)18:08:27

if I see namespace foo.bar, that means - go load some clj or class file

Alex Miller (Clojure team)18:08:31

where should I find it?

Alex Miller (Clojure team)18:08:59

if it’s a clj it could be foo/bar.clj - what else would make sense?

Alex Miller (Clojure team)18:08:30

and it’s sane to make the class path the same foo/bar.class

gfredericks18:08:40

yep; I see it now

Alex Miller (Clojure team)18:08:56

packages (especially way back in Java 1.5 or whatever) were not reified in Java

Alex Miller (Clojure team)18:08:30

they are now (kind of) to serve as a place to define various things

gfredericks18:08:49

that makes sense; thanks

gfredericks18:08:40

I guess you could also argue that you get a similar situation to java where there's not a 1 to 1 relationship between files and packages

gfredericks18:08:48

there are fewer packages than files

gfredericks18:08:39

I got stuck thinking about namespaces as groups of things, just like packages; but the other way of thinking of them is that they're groups of things, just like classes

Alex Miller (Clojure team)21:08:48

its aggregation, not containment

Alex Miller (Clojure team)21:08:58

(although AOT stretches that a little)

csm22:08:18

howdy! I get a “Pushback buffer overflow” using clojure.tools.reader.edn/read, with a clojure.lang.LineNumberingPushbackReader. It looks like https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LineNumberingPushbackReader.java#L37-L39 doesn’t pass size to the super constructor, so the pushback buffer is always 1, per https://docs.oracle.com/javase/8/docs/api/java/io/PushbackReader.html#PushbackReader-java.io.Reader-

ghadi22:08:45

@csm what is the input that generates the exception?

csm22:08:48

it’s a largish clojure file, which I can’t share

csm22:08:45

some background: I had a unit test file that was getting unreasonably large, so I split it into multiple files and used load to load individual files; one of the files had a syntax error, but nothing seemed to point to the line where the error was. I tried clojure.tools.reader.edn/read et al to try and debug this easier.

ghadi22:08:41

this sounds like a bug, but I'm wondering if I can reproduce it with a smaller example.

ghadi22:08:21

do you know of a way to make a smaller example or sanitize/deidentify the large input you already have?

andy.fingerhut22:08:15

I was curious for a test case exhibiting the problem, too, because I'm pretty sure many people have used that code for years without hitting that exception.

andy.fingerhut22:08:35

Not to say there is no bug, but it makes me wonder what about the file contents is causing that.

andy.fingerhut22:08:07

e.g. can we prove that 2 is always enough, or indeed any finite value, or is any such choice just kicking the can down the road?

csm23:08:57

I think the bug is that size isn’t passed to the PushbackReader constructor. Or possibly there should be another parameter for the pushback buffer size (since the size argument is passed as the buffer size for LineNumberReader).

csm23:08:25

also, if there is a good way (say in emacs) to anonymize a file, that would help with a test case

andy.fingerhut23:08:45

My belief is that the size parameter says how many "unread" method calls one can make in a row on the same PushbackReader, before you must either do a read, or else you get the "pushback buffer overflow" exception.

andy.fingerhut23:08:35

Even if the size parameter were passed, that just means it might be possible to pass down a size other than the default of 1. Looking through some of the EdnReader code in Clojure, I haven't yet found a code path where it obviously can call unread twice in a row.

andy.fingerhut23:08:52

Also, when I follow this link that you gave, I see a call to the superclass constructor that does pass the size parameter: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/LineNumberingPushbackReader.java#L37-L39

andy.fingerhut23:08:28

or rather, it is passing it to the LineNumberReader constructor. I guess your point is that it isn't also a parameter to the super() call?

andy.fingerhut23:08:49

I am not aware of an anonymize capability in Emacs.

andy.fingerhut23:08:45

I have sometimes have reasonable success in reducing test cases in size by cutting it in half, then trying the first half, then the second, then repeating. I realize that might not work as easily for nested Clojure data structures

csm23:08:49

right, it doesn’t pass it to super. I can try to get a test case together.

csm23:08:13

but LineNumberingPushbackReader feels broken to have no way to specify a larger pushback buffer size than 1

andy.fingerhut23:08:37

It might be broken. But suppose we pass it 2 and it still crashes? Then 3? Then it doesn't crash with 27. Do we commit 27 into Clojure and hope for the best, or maybe figure out why 1 doesn't work?

csm23:08:48

let the user decide what pushback buffer size?

csm23:08:29

(again, this was when trying to use tools.reader, not clojure)

andy.fingerhut23:08:02

The exception was with tools.reader library's edn reader? Or the one built into Clojure?

andy.fingerhut23:08:25

Those are different code bases.

andy.fingerhut23:08:12

and as far as I can tell, the code you pointed at is using the size parameter to pass it on to construct a LineNumberReader with the same pushback buffer size.

csm23:08:38

tools.reader’s. I know they’re different code bases (I added the dependency for that lib to my project…)

csm23:08:51

I think I got mixed up, I thought the LineNumberingPushbackReader was an implementation of the protocols in clojure.tools.reader.reader-types

csm23:08:33

but no, the pushback buffer size is always 1, no matter the arg to LineNumberingPushbackReader — IMO the call may have been intended to be super(new LineNumberReader(r, size), size)

andy.fingerhut23:08:36

I did some quick REPL experiments with read/unread method calls on a LineNumberingPushbackReader, and it does seem that 2 unread's in a row raises an exception even if you call the 2-arg constructor with a size of 2.

andy.fingerhut23:08:34

Or perhaps even that call should be super(new LineNumberReader(r), size)

csm23:08:15

and again, this isn’t a problem with clojure itself (that I can tell); and it may not be worth changing

csm00:09:44

a cursory look at EdnReader and LispReader seems to indicate this can’t be a problem for clojure itself. I’ll back away slowly now…

andy.fingerhut00:09:14

It could be a bug in tools.reader. I'm taking a little bit of a look, but the code has enough cases in it that a test case to reproduce would make it a lot easier to find the problem than auditing the whole code base.

andy.fingerhut00:09:01

Was the exception message you saw perhaps "Pushback buffer is full" rather than "Pushback buffer overflow"? I see the former in the tools.reader code base.

andy.fingerhut00:09:01

I'll let you know if I think of anything else, but if you get a file you can share, plus version of tools.reader lib you are using, and which Clojure expressions you used to open the file, read it with tools.reader, and the exception you saw, that would be ideal for further debugging.

andy.fingerhut05:09:48

I dug for a while through tools.reader edn reader, at least the Clojure/JVM version, and have found a case where I can make it throw an exception with "Pushback buffer overflow" in the message: If you create a clojure.tools.reader.reader-types/indexing-push-back-reader from a file, and the file contains a sequence like "[a" then a carriage return character with no newline, then "b]".

andy.fingerhut05:09:27

If your file has carriage returns in it that are not followed by a newline or form feed character, and have a symbol, or probably one of several other things, immediately after the carriage return character, that might be the root cause of what you are seeing.

andy.fingerhut05:09:39

One workaround is to replace all carriage return chars with newlines.

andy.fingerhut05:09:27

well, or at least the ones that are not inside of strings, where you may wish to preserve them there, if you do not want to modify the contents of those strings.

andy.fingerhut05:09:50

I've created an issue with a small reproducing test case here: https://dev.clojure.org/jira/browse/TRDR-54

4
💥 4
andy.fingerhut18:09:17

@csm If you can check whether your file that causes the problem contains some value like a symbol or number, immediately followed by a carriage return character (ASCII code decimal 13), and that is not immediately followed by a newline (ASCII decimal 10) or a formfeed (ASCII decimal 12), that would match the kind of problem I have found from code inspection and experiments. If you do not find something like that, you may have a distinct bug from the one I found.