This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-04-01
Question: is it worth it to make a rope implementation that accepts non-string components? 🧵
I've been working on a rope implementation for a while now, and just because it was easy in the initial implementation I was allowing the leaf nodes to have either a string or a vector as the collection storing the objects.
The problem is this: I have one implementation that fits for both strings-as-ropes-of-characters, and ropes of general values. There's a number of conditions in the leaves based on the type stored there.
This isn't necessarily an issue, but if I make them two separate types, even though it makes it slightly more performant for a lack of conditionals, it would cause significant code duplication.
The one thing that would be nice about having them be separate is to allow them to have separate semantics when conj is used, that is, for string-based ropes conjing on a string would append the string, while for value-based ropes conjing on a string would add a string as one element, rather than each of the characters individually.
This use of conj with a string to append the whole string is practically necessary because adding characters one by one to a string causes way too much copying.
I could perhaps solve this with a transient that causes the final node that gets appended to to have a StringBuilder as the leaf container.
What I have been considering at this stage is just removing the vector leaf container altogether, and I guess the question comes down to: is having a rope data structure for non-character sequences useful enough to bother with the implementation complexity?
I might be interested in a generic rope data structure. Basically, I've wanted to have a decent, performant data structure for text that allows for characters of different fonts, styles, slants, underlines, font sizes, etc.
There are various rope implementations available, but all the ones I've seen are for just sequences of characters.
yeah, that makes sense, so you'd basically want like a rope of maps with an actual string of text and other meta?
potentially with ropes for the actual strings stored too?
The liquid clojure library has a https://github.com/mogenslund/liquid/blob/master/src/liq/buffer.cljc#L13 which allows for a map of data for each character, but I haven't really delved into the trade-offs.
Yeah, so I have metadata on ropes, another way it could potentially be done is to just have a meta map with vectors of ranges to data
that would make the actual string manipulation more efficient, at the cost of making the meta a little more complex to handle.
But good to know there's interest in ropes for other types of sequences.
I'll try to figure out this API question then.
To be honest, I'm not sure what I want. I just know that membrane lacks good representation for stylized text.
fair enough
So I'm interested and would be happy to try things out. Not sure how helpful that is.
If you do solve the problem of how to represent stylized text in a performant way, I would be eternally grateful 🙏 😁
So the one real like api kink that I have at the moment is this:
(conj (rope) "hello")
;; => #rope [ \h \e \l \l \o ]
(conj (rope) [1])
;; => #rope [ [1] ]
which is distinct from
(concat (rope) (rope ["hello"]))
;; => #rope [ "hello" ]
This has a slightly odd signature for conj where it takes two collections and merges them into one, rather than adding the second collection as an element of the first.
Which isn't unheard of, maps do this.
And when making ropes specifically for characters it's practically necessary if I don't implement transients, because
(into (rope) "hello")
would be horribly unperformant if it's appending characters one at a time to a string. super memory inefficient
> Which isn't unheard of, maps do this.
(conj {} [1 2 3 4])
Execution error (IllegalArgumentException)
Vector arg to map conj must be a pair
I don't think maps do that
(conj {:a 1} {:b 2})
;; => {:a 1, :b 2}
fair enough
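For reference, a small sketch of what conj accepts on a map in JVM Clojure: a two-element vector (or map entry) is added as one entry, another map is merged entry by entry, and any other vector is rejected. The var names are only for illustration.

```clojure
;; A two-element vector is treated as a single key/value pair.
(def with-pair (conj {:a 1} [:b 2]))

;; Another map is merged in, entry by entry.
(def with-map (conj {:a 1} {:b 2 :c 3}))

;; Any other vector shape is rejected; capture the message instead of throwing.
(def bad-vector-message
  (try
    (conj {} [1 2 3 4])
    (catch IllegalArgumentException e
      (.getMessage e))))

with-pair          ;; => {:a 1, :b 2}
with-map           ;; => {:a 1, :b 2, :c 3}
bad-vector-message ;; => "Vector arg to map conj must be a pair"
```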
but since this makes the version with strings inconsistent with the one with vectors, it means you can't write code against this rope implementation that's fully generic over what types are contained.
I guess I could work on the transient implementation and make it so that conjing a string on adds the full string as one element and enable you to use into to efficiently copy a string into a rope this way, or perhaps just encourage using concat with another call to rope
I'm actually pretty surprised to see that concat returns anything besides a seq
this is a custom rope-only concat
I just didn't think another name fit
Is the performance difference when adding 1 character at a time because you're not pre-allocating?
I mean I can't preallocate with strings, but also it's all immutable
the issue is this code is horribly inefficient:
(-> "hello"
(str \w)
(str \o)
(str \r)
(str \l)
(str \d))
it copies the string five times
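For comparison, a minimal sketch of the mutable-buffer alternative on the JVM: one java.lang.StringBuilder collects the pieces and the full string is materialised once at the end. append-all is a hypothetical helper name, not part of the rope library being discussed.

```clojure
(defn append-all
  "Appends every item of xs to s through a single StringBuilder,
  so the accumulated text is not re-copied on each step."
  [^String s xs]
  (let [sb (StringBuilder. s)]
    (doseq [x xs]
      ;; str turns characters (or anything else) into a String to append
      (.append sb (str x)))
    (.toString sb)))

(append-all "hello" [\w \o \r \l \d]) ;; => "helloworld"
```

The buffer is only mutated inside the function, so callers still see immutable strings going in and coming out.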
right, but you could have a character buffer
and just wrap with pointer to buffer + length
right but then that's mutable
which I don't want
I could do that with a transient rope.
But I'm not on to transient implementations yet.
The way to do this efficiently in the current system would be
(concat (rope "hello") (rope "world"))
which ends up having the same performance characteristics of
(str "hello" "world")
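To make the concat point concrete, here is a deliberately tiny rope sketch built from plain maps. It is an assumption-laden illustration, not the implementation discussed in this thread: leaf, rope-concat, rope-nth and rope->str are hypothetical names, there is no rebalancing, and rope->str recurses, so very deep ropes could overflow the stack.

```clojure
(defn leaf
  "A leaf holds a string and its length."
  [s]
  {:len (count s) :s s})

(defn rope-concat
  "Joins two ropes by allocating one parent node; no characters are copied."
  [a b]
  {:len (+ (:len a) (:len b)) :left a :right b})

(defn rope-nth
  "Finds the character at index i by descending left or right using cached lengths."
  [r i]
  (if-let [s (:s r)]
    (.charAt ^String s (int i))
    (let [left-len (:len (:left r))]
      (if (< i left-len)
        (recur (:left r) i)
        (recur (:right r) (- i left-len))))))

(defn rope->str
  "Flattens the rope into one string with a single StringBuilder pass."
  [r]
  (let [sb (StringBuilder.)]
    (letfn [(walk [n]
              (if-let [s (:s n)]
                (.append sb ^String s)
                (do (walk (:left n))
                    (walk (:right n)))))]
      (walk r))
    (.toString sb)))

(def hello-world (rope-concat (leaf "hello") (leaf "world")))

(rope->str hello-world)  ;; => "helloworld"
(rope-nth hello-world 5) ;; => \w
```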
if someone gives you pointer to 10 length buffer with length 3, then the fact that you eventually add stuff to the buffer at positions 4-9 doesn't affect you
I thought that's how some of the transient stuff worked under the hood generally
well transients don't allow you to save multiple iterations of edits on the same structure
I suppose that you're right about that for buffer appends, which helps in this case, but it does make splits a bit more complex.
yea, it prevents copies for some additions, and deletions from the end, but it is more complicated for splits
this is at the limits of my knowledge for these data structures
right, and splits are one of the main things that ropes are supposed to make efficient and relatively easy.
I am planning on making a form of transient structure that makes it efficient to append many smaller strings or collections onto a rope in sequence.
it would end up being a StringBuffer and a TransientVector at the bottom
if there's a spot for per character info, that would be right up my alley
Yeah, so in order to do that you would just make a non-string rope, gives you the same nice memory characteristics for having a sequence of them, and it's nicer than a vector for it because of the log time for subseq views, trimming out a subseq, splitting in two, etc. The elements would probably just be two-element vectors of character-and-meta.
I suppose at this late stage of development I should probably think about testing the actual performance characteristics of this structure as opposed to a vector, lol
Would the meta info be metadata? The reason I ask is that, for my use case, the info would ideally impact equality (as opposed to metadata)
It'd be better than a vector for character sequences, but for arbitrary values I should look at how it actually compares to just a vector.
Yea, I was just thinking that maybe I should just give the builtin data structures a go to see if that works.
It would not need to be meta, and actually couldn't be since characters aren't IMetas
well the builtin structures definitely shouldn't be able to compete for strings
for arbitrary values maybe
yea, to be fair, I have been using liquid's buffers so far without issue, but I've only been using them very superficially for smallish buffer sizes.
actually, there's also primitive vectors. hmm.
Anyway, I'll have to do some testing to see how it compares.
very cool. out of curiosity, what are you interested in using them for?
I made the first version of them while in a voice call with another clojurist to show them how you could make a data structure with deftype
I genuinely might start making a text editor that's a cross of liquid and nightlight just to have an excuse to use them though.
If they're any better than vectors that is.
if not then this was just a fun side project that resulted in learning a lot more about the collection model of clojure
I think there are people in #visual-tools (including me) that would be interested in a clojure based editor: https://www.youtube.com/watch?v=uqKta5i7A9c&list=PLb_VRZPxjMADovzE7xYIzMr68BHXLVzH3
yeah, liquid is definitely already that, but the thing that it isn't is embedded in your project. And while nightlight is embedded in your project, it's a webapp, and I consider that to be a big detriment to its usability.
Liquid is embedded in my projects.
wait really? I didn't realize you could embed it like that. Does it use the fact that it lives in the same jvm to do code completion etc? or does it use nrepl still?
I'm not sure. I don't currently have code completion for my uses, but I'm only using it very superficially.
aaah, okay
The basic liquid stuff is vim-esque, but I've been trying to use it in an emacs-esque way.
fair enough.
Ideally, there would be a clojure based code editor that does all the stuff codemirror does. I don't think it's quite there yet.
I have gotten good use of just basic liquid usage though.
What actual functionality does liquid provide as a library?
There's lots of stuff I'm not using, but the functionality I'm currently using is code highlighting, paren matching, text editing, and cursor navigation
Okay, took a chance to do some benchmarks, and the rope really showed its value at concats, but I already knew it'd beat out vectors there since the best way to do vector concat is with into which is linear with the size of the second argument, versus the concat with ropes which is constant time.
Now I'm testing splits which I expect to be closer to evenly matched.
ah, that's good to know
oooh, okay, so apparently splits are not comparable across ropes of arbitrary values and primitive vectors. Now to see if that tracks across string ropes, and how non-primitive vectors measure up.
apparently the rope is slightly faster than a non-primitive vector
now to see how the string-based rope compares
I just want to share https://github.com/lacuna/bifurcan/blob/master/test/bifurcan/rope_tests.clj and maybe even https://github.com/clojure/core.rrb-vector in case it's not on your radar.
I'm also interested in making a clojure coding environment that works better for me. I feel like I've tried them all.
i might look at performance of those to compare, they're probably faster than what i've made. What I'm making is as much a learning tool for me as anything else, since I'd never dived this far into the collection model before.
I'm making it as full an implementation as i can though
today i got the charsequence impl in to allow clojure regex functions to work with it now.
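A rough sketch of why implementing java.lang.CharSequence is enough for the regex functions on the JVM: java.util.regex.Pattern matches against any CharSequence, so a deftype that provides length, charAt, subSequence and toString can be passed to re-find directly. CharVec is a hypothetical stand-in backed by a vector of characters, not the rope type from this thread.

```clojure
(deftype CharVec [v]
  CharSequence
  (length [_] (count v))
  (charAt [_ i] (char (nth v i)))
  ;; subvec shares structure with v, so this does not copy the characters
  (subSequence [_ start end] (CharVec. (subvec v start end)))
  Object
  ;; the matcher calls toString on sub-sequences to produce match strings
  (toString [_] (apply str v)))

(re-find #"l+" (CharVec. (vec "hello"))) ;; => "ll"
```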
As a dynamic language with a simple “code-as-data” syntax, reflection capabilities, and powerful macro/metaprogramming facilities, it seems like Clojure has what is needed for more sophisticated static code analysis and “semantic diffing.” For example, suppose I wanted to know how many times a particular function is used in a project. Instead of diffing an entire file against a previous version, it would be helpful to diff each function within the files based on their parameters, return values, and bodies. Imagine cloning all Clojure project repos from GitHub and analyzing the code to see which core functions are most frequently used, which libraries are most often used together in the same project (cool graph visualization!), commonly occurring code patterns/idioms in function bodies, semantically duplicate functions across projects, etc. It could make for a cool website to explore and learn about functions, libraries, “design patterns,” and idioms across all real Clojure projects on GitHub. It could be an excellent resource for newbies in particular! Is this feasible? I can’t think of any other programming language community that has this. Has somebody already tried to do something like this?
Probably the closest match was the cross clj site which cross-indexed all Clojure projects on GitHub. The author decided to stop running the site but the code is available if someone wants to relaunch it
There's also something here: https://github.com/borkdude/api-diff In general such diffs can be easily made using the static analysis of clj-kondo
I'd say grasp is similar to this for search
See here for the data specification of the clj-kondo output: https://github.com/clj-kondo/clj-kondo/blob/master/analysis/README.md
And then there are things like https://github.com/Datomic/codeq for semantic analysis of projects over time (all git history)
There is also https://github.com/Wilfred/difftastic and autochrome but those tools produce diffs on syntax, not semantics
Very cool; thanks for sharing! Now I have some reading to do. As I become more proficient with Clojure, I might try to solve a problem in this space to leverage my data science experience to benefit the community by improving the “getting started” experience. I feel like it’s a bit too advanced for my current skill level—some ideas are sloshing around in my mind for now.
I've been wanting to try out Codeq, but it sadly doesn't seem to be possible because Datomic Free is no longer available and as far as I understand, Datomic Pro doesn't support the free: protocol.
You should be able to use Datomic Pro Starter and just change the protocol for connection afaik
Might even work with devlocal, can't say I've tried that though
Nah, I get java.lang.IllegalArgumentException: :db.error/invalid-storage-protocol Unsupported storage protocol [protocol=free] in transactor properties config/samples/dev-transactor-template.properties. Codeq doesn't support the dev protocol. Might be possible to rewrite it to do so, though, haven't looked into it yet.
there is actually a codeq2 that works with cloud from a couple years ago that no one ever got around to releasing. not sure anyone has the bandwidth to do so atm
As a clojure learner cross clj was very very helpful to me… +1 on any relaunch or similar thing… it would not be possible for me to do it yet.
Is there a better way to express “is there anything in this collection that isn’t nil”? (some some? col)
feels like it’ll draw funny looks
Not builtin, but here's a helper macro for this. Won't work if col is dynamic, ofc.
(defmacro or-some
  "Evaluates exprs one at a time, from left to right. If a form
  returns a non-nil value, or-some returns that value and doesn't
  evaluate any of the other expressions, otherwise it returns the
  value of the last expression. (or-some) returns nil."
  ([] nil)
  ([x] x)
  ([x & next]
   `(let [or# ~x]
      (if (some? or#) or# (or-some ~@next)))))
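For a collection that only exists at runtime, a couple of core-only spellings of the same question, offered as options rather than as the community-preferred idiom. any-non-nil? is a hypothetical helper name.

```clojure
(defn any-non-nil?
  "True when coll contains at least one value that is not nil.
  Note that false counts as non-nil."
  [coll]
  (boolean (some some? coll)))

(any-non-nil? [nil nil 3])   ;; => true
(any-non-nil? [nil false])   ;; => true
(any-non-nil? [nil nil])     ;; => false
(any-non-nil? [])            ;; => false

;; The same predicate phrased with not-every?
(not-every? nil? [nil nil 3]) ;; => true
(not-every? nil? [])          ;; => false
```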
Real question about accessing record fields - which way does the community generally prefer?
(defrecord Foo [my-field]
...
(bar [this]
(let [x my-field] ...)))
(access the record field via the binding directly) or
(defrecord Foo [my-field]
...
(bar [this]
(let [x (:my-field this)] ...)))
(getting the value from the record like it's a normal map)
I went with the former in our app recently since our team believes it's more idiomatic, but we could be wrong
In general one of the reasons for using records I hear a lot is "records are more performant than regular maps" but I personally haven't seen a lot of benchmarks that actually support this.
So while I can see that "records are more performant" can be true, I'm skeptical as to whether it's actually true
(defprotocol IProto
  (bar [_]))
(defrecord A [some]
IProto
(bar [this] (str some)))
;; final Object some = this.some;
(defrecord B [some]
IProto
(bar [this] (str (:some this))))
;; final ILookupThunk _thunk__0__ = B.__thunk__0__;
;; final B b = this;
;; Object o;
;; if (_thunk__0__ == (o = _thunk__0__.get(b))) {
;; o = (B.__thunk__0__ = B.__site__0__.fault(b)).get(b);
;; }
I'm still not sure if it'll be more performant in practice (the difference will likely be negligible compared to DB queries and network calls) but it's nice to see what happens under the hood
The difference between direct field access vs map access is minimal afair (I did some tests in the past). However if you define function inside defrecord or deftype fields are already in the scope. I don't see why you shouldn't use them directly.
the former is definitely the preferred use
4 days late, but I thought I'd add something here... I sort of do this in some places, where I want to share a function between records.
(defprotocol Thing
  (read [this arg] "read the thing")
  (write [this arg1 arg2] "write the thing"))
(defn read*
  [{:keys [a b c] :as this} arg]
  ;; read some data with a, b and c
  )
(defrecord RThing [a b c]
  Thing
  (read [this arg] (read* this arg))
  (write [this arg1 arg2]
    (throw (UnsupportedOperationException. "read-only"))))
(defrecord RWThing [a b c]
  Thing
  (read [this arg] (read* this arg))
  (write [this arg1 arg2]
    ;; update the thing
    ))
It may be faster/better to pass the fields along to the common function, rather than the entire record, but I like this approach
Hi folks. Is there any way to prevent the pretty printer from printing commas between keys in maps? When I copy some map from the repl, I always remove the commas afterwards, so I’d prefer not getting them in the first place.
There's no easy way to do it, but you can create a new dispatch function that wraps clojure.pprint/simple-dispatch and handles maps differently, and then pass that function to clojure.pprint/set-pprint-dispatch. Alternatively, you can override the simple-dispatch multimethod on clojure.lang.IPersistentMap to do your own thing.
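A sketch of the wrapping approach for JVM Clojure, under some assumptions: pprint-map-no-commas and no-comma-dispatch are hypothetical names, the map printer is modelled on the public clojure.pprint building blocks (pprint-logical-block, print-length-loop, write-out, pprint-newline), and it skips the namespace-map shorthand that the built-in printer may apply. Treat it as a starting point rather than a drop-in replacement.

```clojure
(require '[clojure.pprint :as pp])

(defn pprint-map-no-commas
  "Pretty-prints map m with a space instead of a comma between entries."
  [m]
  (pp/pprint-logical-block :prefix "{" :suffix "}"
    (pp/print-length-loop [aseq (seq m)]
      (when aseq
        ;; one logical block per entry: key, space, value
        (pp/pprint-logical-block
          (pp/write-out (ffirst aseq))
          (.write ^java.io.Writer *out* " ")
          (pp/pprint-newline :linear)
          (pp/write-out (fnext (first aseq))))
        (when (next aseq)
          ;; the built-in printer writes a comma and a space here
          (.write ^java.io.Writer *out* " ")
          (pp/pprint-newline :linear)
          (recur (next aseq)))))))

(defn no-comma-dispatch
  "Delegates to simple-dispatch for everything except maps."
  [x]
  (if (map? x)
    (pprint-map-no-commas x)
    (pp/simple-dispatch x)))

;; Scoped use; (pp/set-pprint-dispatch no-comma-dispatch) would install it globally.
(pp/with-pprint-dispatch no-comma-dispatch
  (pp/pprint {:a 1 :b [{:c 2 :d 3}]}))
```

Because nested values are written through write-out, maps inside vectors or other maps go through the same dispatch function.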
(loop [[x & ys] (range 10), !m (transient {})]
  (if x
    (recur ys (assoc! !m x x))
    (persistent! !m)))
=> {0 0, 7 7, 1 1, 4 4, 6 6, 3 3, 2 2, 9 9, 5 5, 8 8} ;; clojure.lang.PersistentHashMap
(let [!m (transient {})]
  (loop [[x & ys] (range 10)]
    (when x
      (assoc! !m x x)
      (recur ys)))
  (persistent! !m))
=> {0 0, 1 1, 2 2, 3 3, 4 4, 5 5, 6 6, 7 7} ;;<-- this is funny, does not grow past clojure.lang.PersistentArrayMap size
(->> (range 10)
     (reduce #(assoc! %1 %2 %2)
             (transient {}))
     (persistent!))
=> {0 0, 7 7, 1 1, 4 4, 6 6, 3 3, 2 2, 9 9, 5 5, 8 8} ;; clojure.lang.PersistentHashMap
from cljs:
...
(if (<= (+ len 2) (* 2 (.-HASHMAP-THRESHOLD PersistentArrayMap)))
(do (set! len (+ len 2))
(.push arr key)
(.push arr val)
tcoll)
(assoc! (array->transient-hash-map len arr) key val)) ;; this is where it returns different obj
...
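The snippets above come down to one rule for transients: always continue with the value returned by assoc!, because the call may hand back a different object once the array map is promoted to a hash map. A minimal sketch of the safe pattern, with index-by-self as a hypothetical helper name:

```clojure
(defn index-by-self
  "Builds a map of x -> x, threading the return value of assoc!
  through reduce instead of mutating one transient in place."
  [xs]
  (persistent!
    (reduce (fn [m x] (assoc! m x x))
            (transient {})
            xs)))

(count (index-by-self (range 10)))        ;; => 10
(= (index-by-self (range 10))
   (zipmap (range 10) (range 10)))        ;; => true
```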
is there a randomness library for cljc, that takes seeds and generates consistent outputs on both platforms?
Probably https://github.com/clojure/test.check I did a very superficial check for longs and the results are the same for CLJ and CLJS.
But its usage might be unusual because the generator itself is immutable. This is the line of most interest: https://github.com/clojure/test.check/blob/master/src/main/clojure/clojure/test/check.cljc#L210
Gary did a talk about this area if you want to track it down
for future reference, it's actually not quite consistent out of the box because rand-long generates a goog.math.Long which has a typical javascript moment when you try to do stuff to it, so (mod (random/rand-long (random/make-random 1)) 2) gives different answers on clj/cljs