#dev-tooling
2023-09-29
flowthing 07:09:19

So I've been using clojure.pprint/pprint for my pretty-printing needs so far, but it's really quite slow. I can't use Fipp because I don't want to take on any dependencies (and because of, e.g., https://github.com/brandonbloom/fipp/issues/37), so I thought I'd have a go at making my own pretty-printer. What I have so far is dozens, sometimes hundreds, of times faster than clojure.pprint/pprint, and 12x-15x faster than fipp.edn/pprint on Fipp's own benchmark. (It also allocates 10-25x fewer bytes than clojure.pprint/pprint or fipp.edn/pprint.) The implementation (https://github.com/eerohele/tab/blob/b5f1c0dd86349d05184d59746a017d9ed27e852d/src/tab/impl/pprint.clj) is ~200 lines of code. There are some benchmark results here: https://github.com/eerohele/tab/actions/runs/6348406766/job/17244962062#step:7:365 One tradeoff is that it's not customizable like clojure.pprint or Fipp, but that's not something I really need. Anyway, I figured I'd throw this out there in case anyone needs something like this, or in case anyone has ideas for improving the current implementation.

pez 09:09:04

This is great! I’ll take a look and see if it’s something Calva can make use of. Is there anything in there that might not work in ClojureScript?

flowthing 09:09:47

It depends. I've only ever used JVM-hosted ClojureScript, and I pretty-print ClojureScript evaluation results on the JVM side, so this should work there just fine. It won't work with self-hosted ClojureScript as is, but it shouldn't be too difficult to adapt for that, too, I think. I don't currently use ClojureScript myself, so I don't have an incentive to look into that much at the moment. 🙂

jpmonettas 12:09:30

nice codebase also, thanks for sharing!

flowthing 05:09:17

Good questions! The most visible difference from clojure.pprint/pprint, I think, is the way reader macros are printed in some cases. tab.impl.pprint/pprint prints them the same way as fipp.edn/pprint:

user=> (tab.impl.pprint/pprint #'map)
#'clojure.core/map
nil
user=> (clojure.pprint/pprint #'map)
#'clojure.core/map
nil
user=> (fipp.edn/pprint #'map)
#'clojure.core/map
nil
user=> (tab.impl.pprint/pprint '#'map)
(var map)
nil
user=> (clojure.pprint/pprint '#'map)
#'map
nil
user=> (fipp.edn/pprint '#'map)
(var map)
nil
I think this accounts for most cases where tab.impl.pprint/pprint prints differently than clojure.pprint/pprint.

flowthing 06:09:24

Fipp doesn't always stay within the margin. For example:

user=> (fipp.edn/pprint {[] [-1000000000000000000000000000000000000000000000000000000000000000N]} {:width 72})
{[] [-1000000000000000000000000000000000000000000000000000000000000000N]}
user=> (tab.impl.pprint/pprint {[] [-1000000000000000000000000000000000000000000000000000000000000000N]} {:max-width 72})
{[]
 [-1000000000000000000000000000000000000000000000000000000000000000N]}
nil
I don't know of any cases where tab.impl.pprint/pprint blows past the margin.

flowthing 06:09:52

Other than those things, I'm sure there are tradeoffs I'm not yet aware of.

flowthing 08:09:21

I actually did not know that clojure.pprint also supports formatting code via (clojure.pprint/with-pprint-dispatch clojure.pprint/code-dispatch ...). 🙂 tab.impl.pprint obviously doesn't support that.

flowthing 18:10:36

FWIW, I did a bit more work on this... tab.impl.pprint now prints reader macros the same way as clojure.pprint and supports *print-namespace-maps*. It now supports all of the same clojure.core/*print-* options as clojure.pprint (that is, all of them except *print-dup*). Through generative testing and by comparing the output of tab.impl.pprint and clojure.pprint when printing the sources of clojure.core vars, I'm now fairly confident that the only places where tab.impl.pprint prints differently than clojure.pprint are cases where clojure.pprint doesn't make full use of the line width even though it could (as well as one meaningless difference in where each one decides to insert a line break to avoid blowing past the margin).
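A printer that walks collections itself, instead of handing them to clojure.core's printer, has to honor options such as *print-length* and *print-level* on its own. The following is a minimal sketch of that, assuming only vectors are walked by hand and everything else goes to pr-str. The namespace and the function name sketch-pr are hypothetical, not tab.impl.pprint's API. It aims to mirror what clojure.core's printer does for sequential collections: "#" once the nesting level is used up, and "..." after *print-length* elements.

```clojure
;; Hypothetical sketch, not tab.impl.pprint's actual code.
(ns example.print-vars
  (:require [clojure.string :as string]))

(defn sketch-pr
  "Print x on one line, walking vectors by hand and delegating everything
  else to pr-str. Honors *print-length* and *print-level* for vectors the
  way clojure.core's printer does for sequential collections (this sketch
  ignores *print-dup*)."
  [x]
  (cond
    ;; Not a vector: let clojure.core (print-method) handle it.
    (not (vector? x))
    (pr-str x)

    ;; Nesting depth used up: clojure.core prints "#" here.
    (and *print-level* (<= *print-level* 0))
    "#"

    :else
    ;; Each nested collection consumes one level.
    (binding [*print-level* (some-> *print-level* dec)]
      (let [shown (if *print-length* (take *print-length* x) x)
            more? (and *print-length* (> (count x) *print-length*))
            ;; mapv is eager, so the elements are printed inside the binding.
            parts (cond-> (mapv sketch-pr shown) more? (conj "..."))]
        (str "[" (string/join " " parts) "]")))))
```

For example, with *print-length* bound to 2 and *print-level* bound to 1, (sketch-pr [1 [2] 3]) returns "[1 # ...]", which is what pr-str returns under the same bindings.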

cfleming 01:10:38

Have you also compared the output to fipp?

flowthing 07:10:07

I have. Here's a diff comparing the output of Fipp to the output of tab.impl.pprint when printing the sources of all clojure.core vars: https://gist.github.com/eerohele/82d2ac3719fbd7de8d36ac154a7829bd The main difference is that Fipp prints (var foo), whereas tab.impl.pprint now prints #'foo, like clojure.pprint, as mentioned above (this would be easy to make configurable). The other differences are cases where Fipp either prints past the margin or doesn't use the full line width even though there's room to do so, as well as Fipp having a different preference for printing map entries with multi-line values (all of the {:inline ...} cases in the diff).

cfleming 10:10:07

Very interesting, thank you. Cursive currently uses fipp for its pretty printing, and I’d love to be using something that is more understandable and doesn’t go through the intermediate document-formatting-object step. The differences there are interesting. Did you have a minimum width set? For example, I’m unsure why, for the eduction var, pp chooses to insert a newline after the :indent and before the (fn. It looks like in that case, fipp decides to defer the line break until later, whereas pp has a longer line including the 'clojure.core/unchecked_int_remainder later on.

flowthing 11:10:02

pp doesn't have any concept of a minimum width. I don't know about Fipp; I used Fipp's :width argument for that diff. Fipp doesn't have a lot of docstrings, so I don't know all the options Fipp supports. pp inserts a newline after :indent because it determines that it cannot print the entire (fn ...) expression that follows on the same line as :indent without any line breaks. That's also how clojure.pprint works. It looks like Fipp makes the determination on a line-by-line basis somehow. pp prints the expression with 'clojure.core/unchecked_int_remainder on one line simply because it fits within the 72-character limit. I think Fipp inserts a line break there because it has fewer characters available for the expression: Fipp prints quote instead of ', and it starts the (fn ...) expression on the same line as :indent.
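The decision described above (print a form on the rest of the current line if all of it fits, otherwise break before each of its elements) can be sketched roughly like this. This is a hypothetical illustration, not tab.impl.pprint's code: the namespace and the functions layout, layout-entry, and delimiters are made up, and for brevity the sketch ignores the columns taken up by closing delimiters and doesn't special-case reader macros or function-call forms.

```clojure
;; Hypothetical sketch of "fits on one line, or break"; not tab.impl.pprint.
(ns example.fit-or-break
  (:require [clojure.string :as string]))

(declare layout)

(defn- delimiters
  "Opening and closing delimiters for a collection."
  [coll]
  (cond
    (map? coll)    ["{" "}"]
    (vector? coll) ["[" "]"]
    (set? coll)    ["#{" "}"]
    :else          ["(" ")"]))

(defn- layout-entry
  "Lay out a map entry as the flat key, a space, then the value."
  [[k v] width indent]
  (let [ks (pr-str k)]
    (str ks " " (layout v width (+ indent (count ks) 1)))))

(defn layout
  "Return a string for x, assuming x starts at column indent and lines
  should not exceed width columns."
  [x width indent]
  (let [flat (pr-str x)] ; single-line printing is left to print-method
    (if (or (not (coll? x))
            (<= (+ indent (count flat)) width))
      ;; A leaf, or a form that fits on the rest of the line: print it flat.
      flat
      ;; Otherwise, put each element on its own line, aligned one column
      ;; past the opening delimiter.
      (let [[open close] (delimiters x)
            child-indent (+ indent (count open))
            newline-pad  (apply str "\n" (repeat child-indent \space))
            parts        (if (map? x)
                           (map #(layout-entry % width child-indent) x)
                           (map #(layout % width child-indent) x))]
        (str open (string/join newline-pad parts) close)))))
```

For example, (layout [:indent '(fn [x] x)] 12 0) returns "[:indent\n (fn [x] x)]": the whole vector doesn't fit in 12 columns, so the sketch breaks after :indent, and the (fn ...) form then fits on its own line. Recomputing pr-str at every level makes this quadratic in the worst case; it is meant to show the decision, not to be fast.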

cfleming 11:10:28

Yeah, fipp clearly has less available for the later line because it doesn’t break the line near the start like pp does.

cfleming 11:10:52

Aesthetically I think I like fipp’s choice, but obviously they’re basically equivalent.

cfleming 11:10:57

BTW this is one of my all time favourite articles, in case you haven’t already read it: https://journal.stuffwithstuff.com/2015/09/08/the-hardest-program-ive-ever-written/

flowthing 11:10:00

Sure. I could definitely look into whether I could make pp work like Fipp in this regard, but I don't really want to make the algorithm much more complicated.

flowthing 11:10:16

I haven't read it yet, but I've seen it recommended elsewhere before. Thanks, I'll definitely check it out. :)

cfleming 11:10:46

Yeah, I think fipp is considerably more complex, and the simplicity is really attractive from an ongoing maintenance point of view. I’ve never had to customise fipp, but I wouldn’t know where to start if I had to.

flowthing 11:10:53

I think pretty-printing is way easier than formatting proper, though.

cfleming 11:10:22

Sure, and it all depends how strict you want to be about the line length limitation, too.

flowthing 11:10:34

And pp very much relies on the formattee being a Lisp, which definitely helps. 🙂

flowthing 11:10:12

I stumbled upon this comment earlier in Fipp's issues:

flowthing 11:10:18

> Use the Fipp engine, but a custom Edn printer. This is the approach that @cursive-ide chose, as they output IntelliJ display objects instead of text.

flowthing 11:10:26

So I don't know anything about IntelliJ display objects, obviously, but pp has so little code that I imagine it wouldn't be prohibitively difficult to adapt the algorithm to output those instead of strings. I'm not sure, of course. 🙂

cfleming 11:10:36

Yes, that’s right. fipp has two parts: first the object is parsed and the layout decided, and the output of that is a series of formatting objects. Then, in fipp proper, those are printed, whereas in Cursive they’re printed using IntelliJ’s output functions (and highlighted, etc.).

flowthing 11:10:50

I see, interesting. 👍

cfleming 11:10:54

I wouldn’t call them display objects really, it’s more like “append this text to this editor, but in this style”, e.g. fg/bg colour, bold/italic, other highlighting like errors, hyperlinks etc.

flowthing 11:10:20

Ah, I understand.

cfleming 11:10:28

It’s complicated because Cursive also interprets ANSI escapes, which are a PITA.

flowthing 11:10:39

Oh yeah, I've steered clear of those thus far. 🙂

cfleming 11:10:35

Reading through those issues, it looks like you originally ditched fipp due to lack of print-method support. But print-method is often just incompatible with pretty-printing. Are you planning to support it, or did you just decide that support for it wasn’t something you needed?

flowthing 11:10:59

pp uses print-method for everything except colls, basically.

flowthing 11:10:50

(And pretty-printing is only really relevant for colls, I think.)
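The split described here (the pretty-printer walks collections itself and hands every other value to print-method) is what lets custom print-method implementations, such as the one behind the #time/date literal, come through unchanged. A minimal sketch, assuming a made-up Celsius type with a made-up #temp/c tag and a hypothetical walk-print function that handles only vectors; none of these names come from tab.impl.pprint.

```clojure
;; Hypothetical sketch: Celsius, #temp/c, and walk-print are made up.
(ns example.print-method-leaves)

(deftype Celsius [degrees])

;; A custom print-method, like the ones libraries define for tagged literals.
(defmethod print-method Celsius [^Celsius c ^java.io.Writer w]
  (.write w (str "#temp/c " (.-degrees c))))

(defn walk-print
  "Write x to w. Vectors are walked here (a real pretty-printer would make
  its line-breaking decisions at this point); every other value is handed
  to print-method, so custom print-methods are respected."
  [x ^java.io.Writer w]
  (if (vector? x)
    (do
      (.write w "[")
      (doseq [[i e] (map-indexed vector x)]
        (when (pos? i) (.write w " "))
        (walk-print e w))
      (.write w "]"))
    (print-method x w)))
```

For example, (with-out-str (walk-print [1 (Celsius. 21)] *out*)) returns "[1 #temp/c 21]", because the Celsius value is printed by its own print-method rather than by the walker.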

flowthing 12:10:58

pp doesn't have the problem I referred to in that commit message where I ditched Fipp, for example:

user=> (cpp/pprint #time/date "2023-10-02")
#time/date "2023-10-02"
nil
user=> (prn #time/date "2023-10-02")
#time/date "2023-10-02"
nil
user=> (pp/pprint #time/date "2023-10-02")
#time/date "2023-10-02"
nil
user=> (fipp/pprint #time/date "2023-10-02")
#object[java.time.LocalDate "0x1375388b" "2023-10-02"]
nil
But it's entirely possible print-method has pitfalls I don't know about. I'd be very interested in hearing about them if you know of any. :)

bozhidar 12:10:19

I’ll take a look at your implementation as well. Might be something we can bundle with Orchard for people who want to stay light on deps.

flowthing 19:10:04

I think I'll end up pulling this thing out into a single-namespace, no-dependency lib so that folks can use it either by pulling a dep or by copy-pasting the namespace into their own codebase, so you might want to hold off on trying it out until I get that done.