#clojurescript
2022-04-04
isak16:04:13

Here is something I wish ClojureScript libraries would put in the readme: About how much it will contribute to the bundle size (e.g., the number that we'll see for the library in the shadow-cljs build-report). Because even though we have dead code elimination, a lot of libraries only have a few main entry points, so the bytes required typically won't vary that much. Maybe it is possible to do this via some kind of github action that can be shared, like the 'tests passing' type of tags you see sometimes.

p-himik16:04:06

Don't think it's feasible. As you say, it depends on the API surface of the library. Some libraries do provide just a few entry points, but some provide many. Also, it depends on the particular version of shadow-cljs (and, consequently, on CLJS and GCC versions).

isak16:04:41

For the surface area, I think even giving a range (if that is all that makes sense) would be useful. Or maybe something like minimal usage vs typical usage vs deep usage. Also, I think for the libraries in the project that I work on, almost all of them are basically "all or nothing", or have a very common usage pattern that will be applicable to most apps. For example, with reagent, I expect most apps use a similar subset of it. For the version of shadow-cljs, my guess is it usually doesn't change too radically with each little version. And as long as the version is specified, that is ok, no?

p-himik16:04:52

How much of an effect there is depends on what changes have been introduced to CLJS and what exactly the library uses. E.g. seems like now there's an issue with DCE'ing top-level maps with string keys or something like that. If a library wraps 90% of its functionality behind such a map and you don't use it, it won't be DCE'ed in CLJS v1.11.Today but might get DCE'ed in CLJS v1.11.Tomorrow, resulting in a 90% size decrease. I forgot to mention - apart from the versions and the usage, the resulting size also depends on your shadow-cljs config. At the very least, on the output feature set, but there are definitely other flags that affect it.
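
For illustration, a minimal sketch of the string-keyed-map pattern described here (the namespace and functions are hypothetical, not from any real library):

```clojure
(ns some.lib)

(defn tiny-helper [x]
  (inc x))

(defn heavy-feature [x]
  ;; imagine a large amount of code here
  (reduce + (range x)))

;; Everything reachable from this top-level map may survive in the
;; optimized build even if the consumer only calls `tiny-helper`,
;; depending on the CLJS/GCC version in use - string keys are opaque
;; to Closure's property-based analysis.
(def features
  {"tiny"  tiny-helper
   "heavy" heavy-feature})
```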

isak17:04:19

For me the important thing would be to be able to compare libraries. For example, if routing library A uses 10 KB optimized with this version of shadow-cljs and library B uses 50 KB, that is relevant information even if it may change tomorrow. It may even encourage authors to be conscious of CLJS constructs that are too expensive in terms of output size.

isak17:04:14

As for the shadow-cljs settings, just keeping it consistent seems like it would help there.

p-himik17:04:48

I fail to see how that information could be useful given that:
• Library A might be doing its analysis with shadow-cljs v1 whereas B uses shadow-cljs v2
• Today one is 2x the other; tomorrow this changes with a new shadow-cljs version
The probabilities of those are likely not that high. But I have a very strong suspicion that it would be as robust as comparing just raw LOC.

isak17:04:13

I see no reason to let the project decide what version of shadow to use. If they don't work with the latest version of shadow, that is generally a problem they solve quickly.

isak17:04:43

> But I have a very strong suspicion that it would be as robust as comparing just raw LOC.
Also, this assumption is mistaken, because there are many ways to do things, and some are DCE-friendly (or even output-size-friendly) and some aren't. You gave an example earlier.

dnolen17:04:27

@isak to avoid a misunderstanding, entry points really are not relevant for DCE

dnolen17:04:47

Closure DCE is at the level of functions and properties

isak17:04:47

@dnolen Well I don't mean build configuration entry points, I mean how people usually use the library. For example, in sci, I assume it would be sci.core/eval-string
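
For reference, a minimal example of that entry point (the calling namespace is hypothetical; `sci.core/eval-string` is sci's documented API):

```clojure
(ns example.core
  (:require [sci.core :as sci]))

;; Evaluate a string of Clojure code at runtime. Since the input
;; program is only known at runtime, the interpreter machinery has
;; to stay in the bundle regardless of DCE.
(sci/eval-string "(+ 1 2 3)") ;=> 6
```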

dnolen17:04:28

but of course that is not a good example for DCE, or for reasoning about how it works

dnolen17:04:56

if you have a library w/ 1000 utility functions and you use one, and there are no interesting dependencies between most functions, i.e. it is well partitioned

dnolen17:04:09

then the fact there is a common entry point is irrelevant - DCE works

dnolen17:04:25

but an evaluator for a language ...

dnolen17:04:42

you don't even know what the input is going to be - so what are you going to eliminate?

isak17:04:25

Right, so for the grab-bag-of-utility-functions case, this would be basically useless. I've just observed that many of the libraries I use only have a handful of things that I use, and I think that use is typical too.

isak17:04:15

Examples would be reagent, instaparse, re-frame, most routing libraries

isak17:04:46

So it isn't the case that we have absolutely no idea how many bytes those libraries will tend to contribute to a bundle

phronmophobic17:04:06

I think the idea is:
1. Bundle size is an important metric for web applications
2. The libraries you choose and how you use them can have an impact on bundle size
Because of 1 and 2:
1. What methods currently exist for evaluating libraries based on bundle size?
2. What methods could be possible given some effort?
3. Can we automate parts of the process for evaluating how library choices and usage affect bundle size?

isak17:04:34

Yea exactly. I think code size is an important metric when choosing a library, not just features, but it is a little bit hidden right now.

dgb2317:04:03

Just to note, for webpack there is a visualizer for bundle sizes. It also shows dependencies. I use it to see how stuff should be split and which libraries tend towards bloat. I don’t particularly care about this in non-public pages/sites, though.

p-himik17:04:01

You mean, you can use it on libraries without installing and using them?

p-himik17:04:20

Well, then shadow-cljs has a build report that displays all the sizes. :)

dgb2317:04:42

Ah... I didn’t really understand the problem, sorry 😅

lilactown17:04:50

https://bundlephobia.com/ provides the ability to analyze npm libs

lilactown17:04:07

not sure if it takes into account tree shaking/DCE that some modern tools do

lilactown17:04:56

ah it does give analysis of the size of individual exports, e.g. https://bundlephobia.com/package/[email protected]

🆒 1
dgb2317:04:38

That heavily depends on the structure of the code; such an analyzer would have to be pretty involved.

p-himik17:04:13

I might be wrong, but it seems like Bundlephobia uses a naive approach of building one exported function and then measuring the output - there are a lot of things there that have the exact same size.

dgb2317:04:56

That’s my impression too

lilactown17:04:52

it seems that each exported fn shares code with most of the others

p-himik17:04:35

So the Exports Analysis panel is kinda useless.

dgb2317:04:52

It also breaks sometimes

lilactown17:04:37

I mean, it shows you how much it would add if you only imported one fn, and then you can treat the total bundle size as an upper limit

lilactown17:04:53

it's not totally useless but not as accurate as one would prefer IMO

lilactown17:04:26

doing combinatorial imports is probably just not worth it for this level of analysis

dgb2317:04:05

Actually upper limit is a pretty good metric to make ballpark estimates. I do that with memory/latency/storage estimates too. Better err on the heavy side.

p-himik17:04:13

Alright, I'll expand my "kinda useless" into "useful if and only if you're gonna import just one function". :D

p-himik17:04:46

Or, I suppose, when the difference between the total bundle size and the total size of the functions that you're importing is significant. But at that point there's so much manual work that it's IMO easier and absolutely robust to just import what you need, build it, and see the result.

lilactown17:04:45

there's a fast yes and a fast no you can get with this:
• "yes, the total bundle size is in my budget"
• "no, importing even one of these fns is outside my budget"

☝️ 1
lilactown17:04:31

you can kinda squint and make some guesses about things in between but yea, to get accurate answers you just gotta try it

p-himik17:04:46

Makes sense.

thheller17:04:31

IMHO the only reliable metric you can get is from actually using the lib in a project and looking at the build report 😛

thheller17:04:05

also macros make this a whole lot harder. things like core.async can be totally reasonable or go full bloat if you write too much code with them
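
A sketch of that macro effect (the namespace is hypothetical; `go` is core.async's real macro): each `go` block macroexpands into its own state machine at the call site, so output size scales with how the consumer writes code, not just with the library itself:

```clojure
(ns example.async
  (:require [clojure.core.async :as async :refer [go <! >!]]))

;; This single go block compiles to one state machine; a codebase with
;; hundreds of them pays for hundreds of expansions, which no
;; per-library size estimate can predict.
(defn relay [in out]
  (go (loop []
        (when-some [v (<! in)]
          (>! out v)
          (recur)))))
```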

phronmophobic17:04:05

So to answer the questions:
• current method: do all the work upfront
• opportunities for improvement? no
• can we automate part of the process? no

thheller17:04:43

my recommendation:
1. make a plain empty build. make a build report.
2. import the lib, without actually using it. make a build report.
3. compare.
sometimes that already gives a good indicator of how DCE-friendly a lib is. then start using it one form at a time (or whatever) and observe.
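
For illustration, a concrete sketch of that workflow (the :probe build id, the paths, and the reagent dependency are made-up examples; the build-report invocation is shadow-cljs's standard one):

```clojure
;; shadow-cljs.edn - a minimal probe build for measuring one library
{:source-paths ["src"]
 :dependencies [[reagent "1.1.1"]] ; the lib under evaluation
 :builds
 {:probe {:target     :browser
          :output-dir "public/js"
          :modules    {:main {:entries [probe.core]}}}}}
```

```clojure
;; src/probe/core.cljs - step 2: require the lib without using it yet
(ns probe.core
  (:require [reagent.core :as r]))

;; After each step, generate an HTML report and diff the sizes:
;;   npx shadow-cljs run shadow.cljs.build-report probe report.html
```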

phronmophobic18:04:51

It seems like you could automate some or even most of this process.

thheller18:04:02

I know a few people that create shadow-cljs build reports as part of their CI process, yes

👍 1
isak18:04:36

I think you guys may not have measured different libraries that do similar things and have very different sizes (e.g., 10 KB vs 50 KB) with similar functionality usage. Your opinion would be different if you had.

thheller18:04:56

but a number in a readme might still be useful overall. at least it makes the lib author think about this before releasing 😉

p-himik18:04:36

@isak I have - using a build report. :)

isak18:04:40

And yea I agree with the testing with no usage thing

isak18:04:52

@smith.adriane Is that what you believe?

phronmophobic18:04:10

Sorry, I was trying to summarize the suggestions written so far.

phronmophobic18:04:36

Intuitively, it seems like there should be at least some automatic method for creating metrics that help you estimate how a library (or function(s) from a library) would affect bundle size.

isak18:04:52

Here is another way to look at the problem. Let's say I find 10 open source ClojureScript applications on github and do a build-report for all of them. Some of the libraries will overlap. Will I find 0 patterns for the bundle-report per dependency? I think you guys must know that isn't true.

isak18:04:33

Or: I plan to use reagent, and here are 10 open source apps that use it and how much it contributed to their bundles. "That tells me absolutely nothing about how it will contribute to my bundle" - that is the position you'd have to maintain.

isak18:04:40

The bag-of-utility-functions case and macro-heavy libraries are probably important exceptions, but that doesn't mean it can't work for other types of libraries (arguably more typical ones).

thheller18:04:45

I just recommend regularly reviewing build reports. that'll give you the most reliable info and early enough feedback.

thheller18:04:52

making a decision to use a library or not just based on size probably is not the best strategy

thheller18:04:12

there are after all not 50 different cljs libs all doing the same thing 🙂

thheller18:04:43

some may trade off larger size for performance or more features, DX, etc.

isak18:04:50

Yea that is not what I'm advocating though, but I think it should be a factor. Right now, making it a factor is costlier than it should be.

isak18:04:22

Maybe not 50, but there are a ton for some things like routing. Also, for other things it isn't about comparing alternatives, just being aware of the cost. For example, today I looked at sci and that was about 600 KB optimized (pre-gzip). There probably is no competition for sci, but it is still good to be aware of the cost of all that great functionality.

thheller18:04:43

well yeah there should definitely be a warning for large libs like that 😉

borkdude18:04:47

SCI yields about 150 KB gzipped and can be further reduced when compiled without docstrings/arglists/misc other meta, but yes, it's good to be aware of bundle size when it's important.

thheller18:04:20

still nowhere near the size of full self-hosted though. so if you need eval, even a limited one, sci is great 🙂

borkdude18:04:48

and you can also still advance compile all of your other libs along with it

borkdude18:04:23

The shadow-cljs build report is a great tool for inspecting where bundle size comes from

phronmophobic18:04:14

It seems like one option would be to find all the shadow-cljs projects on github and generate build reports for all their builds. This should give at least some interesting results for the most popular libraries.
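
A rough babashka-style sketch of how that could start (the repo URLs and build ids are hypothetical; real projects' build ids would have to be read from their shadow-cljs.edn files):

```clojure
;; Clone shadow-cljs projects and emit a build report for each.
;; Assumes git, node/npm, and babashka are available.
(require '[babashka.process :refer [shell]]
         '[clojure.string :as str])

(def projects
  ;; hypothetical examples
  [["https://github.com/example/app-one" "app"]
   ["https://github.com/example/app-two" "main"]])

(doseq [[url build-id] projects
        :let [dir (last (str/split url #"/"))]]
  (shell "git" "clone" "--depth" "1" url dir)
  (shell {:dir dir} "npm" "install")
  (shell {:dir dir}
         "npx" "shadow-cljs" "run" "shadow.cljs.build-report"
         build-id (str build-id "-report.html")))
```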

rayat18:04:23

To clarify: putting aside implementation effort, a primary argument against publishing bundle size estimation ranges is that it's low value/accuracy versus manual build reports tailored to your particular config etc., correct? But for proponents of that argument, what are your thoughts on this comment from above?
> there's a fast yes and a fast no you can get with this:
> • "yes, the total bundle size is in my budget"
> • "no, importing even one of these fns is outside my budget"
There's a chance that the minimum bound is inaccurate and could benefit from better DCE from shadow versions etc. But at least the maximum bound, using the least amount of DCE etc., retains its utility as a fast yes, no? I could imagine GitHub Actions performing the build reports using some hello world, iterating through shadow versions or other parameters. Once these actions mature and become generalized, the implementation burden of these estimations becomes very low - we don't need to ask devs to remember to do manual build report permutations every patch, nor do we need some central repository or website (assuming maintainers get on board with this, though, I guess).

p-himik19:04:28

> But for proponents of that argument, what are your thoughts on this comment from above?
IMO it's not worth doing, unless somebody wants to do it.
> But at least the maximum bound, using the least amount of DCE etc., retains its utility as a fast yes, no?
How often do you need a fast yes for a new shiny library in CLJS? My personal answer is "eh, once, maybe twice a year". Because I already have a set of libraries that satisfy my needs, except for rare occasions. Like the routing library discussed above - usually you don't switch between them between projects. You select one once, get comfortable with it, and use it everywhere. On a timeline of years, saving 1 minute of creating a manual build report won't gain anything.

isak19:04:11

If or how often you change it is a function of how hard it is to evaluate and get information about alternatives, and that is what the proposal means to make easier. For example, I would consider changing routing libraries from bidi (or bide) to reitit, but last time I checked the code size made it a no-go for me. Maybe that has changed and now would be a good time to change, but I'm probably not going to take the time to check it every month - it is too much of a hassle to do manually.

phronmophobic19:04:24

> 1 minute of creating a manual build report
How do you do that [in 1 minute]?

🙂 1
isak19:04:22

It isn't necessarily something you only need to do once. If you checked something 2-3 years ago, it may be very different now.

phronmophobic19:04:59

If it's possible to do in 1 minute, then it seems possible to automate for most or all shadow-cljs projects

p-himik19:04:49

> If or how often you change it is a function of how hard it is to evaluate and get information about alternatives
There are ~5 routing libraries for CLJS. Alternatives appear once every few years.
> Maybe that has changed and now would be a good time to change
In a world driven only by the bundle size - maybe. In my projects, while bundle size does matter, spending time on things more important for business is, well, more important. Do I spend a few days rewriting the routing to a new library and then re-testing everything, while increasing the probability that something will go wrong - all just to save 50 kB - or do I address a user need? In this context, the answer is obvious. If e.g. bidi fits my needs and has a reasonable bundle size, as I have confirmed by a manual build report that took me a minute or so to make, I'll stick to it - there's zero reason to even think about switching to something else. It's just time wasted.
> How do you do that [in 1 minute]?
Like Thomas has described above - add the library to your dependencies, require it (I usually add some example code from the README or something like that), and run the release build with a report (a separate alias for that).
> If you checked something 2-3 years ago, it may be very different now.
It will not make bidi less satisfactory for me, so why bother? But you also wrote it yourself now - the timeline is years.

isak19:04:49

Yea, the thing is some of us work on the same codebase for a very very long time. We are not constantly hopping from one project to another, like you might if you are a consultant. This means just letting things calcify and never re-thinking things isn't necessarily the best strategy. But that is of course another discussion, and if you actually believe that I can see how something like this would have 0 value for you.

lilactown19:04:05

i think it's prudent to do an audit every few months, but different businesses care more about their bundle size

p-himik19:04:32

@smith.adriane It's possible to do it in one minute for personal needs, by a person - because a person knows what they need. It's hard to automate copying a random code block from a README. A maintainer might do that, sure - come up with a minimal project of a predefined format that would use the latest version of shadow-cljs and their library, run on some CI, publish the report somewhere, and all those nice things. But each maintainer would have to do that for their library. Without it, the metric dumbs down to "it's either completely DCE'd or not".

p-himik19:04:00

An audit of a full project does make perfect sense, yes.

isak19:04:53

I would have no problem reading 50 READMEs and maintaining a few variations of require and usage statements per library to get a few samples for bundle sizes, even if that is what it would take to get it going initially. It doesn't need to be some automatic readme-parsing thing.

phronmophobic19:04:32

Both example projects and tests could also be inputs for automated bundle size estimates

phronmophobic19:04:32

Additionally, any open source project that uses a library could be useful as an example. The more popular the library, the more example projects there will be.
