This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-10-03
How do you search "huge" (+100 keys) maps for terms? I think having some "intelligent" search fn available all the time would be pretty cool.
"intelligent" is in quotes because something like word distance and regex would probably go a long way. Obviously this kind of thing already exists and will be stronger than that, though.
Not sure what the actual question is. If a search function allows indexing, then create an index and use it for search. If it does not allow indexing, just linearly scan the whole map.
Yeah, those are both options. I usually just scan the map. But it would be cool to have something like: (search m :hi) ;; didn't find ":hi" but found ":hello" did you mean ":hello"?
That is incredibly domain-specific. I don't think it's possible to generalize it in a meaningful way. Would it be cool though? Yes. But then, there's pretty much an infinite amount of things that would be cool. :)
I'll have to google "trie", but I get topN and weight functions. Yeah, ok cool. Thanks for the feedback 🙂
A trie is like a tiered prefix map. A string is a sequence of characters. Imagine looking up one char at a time by prefix: H, then e, l, l, o finds a word. If it doesn't find anything, it gives the top N from the node where the lookup failed.
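The trie idea described here can be sketched with nested maps. This is a minimal illustration, not a library API: `trie-insert`, `words-below`, and `lookup` are hypothetical names, the trie stores plain strings (use `name` on keyword keys first), and "top N" is simply the first N words in sorted order rather than a weighted ranking.

```clojure
;; Hypothetical helpers for illustration; none of these come from a library.

(defn trie-insert
  "Adds word to a trie built from nested maps keyed by character.
  The full word is stored under ::end at its final node."
  [trie word]
  (assoc-in trie (concat (seq word) [::end]) word))

(defn words-below
  "Returns every word stored at or below node, sorted."
  [node]
  (sort (keep (fn [n] (when (map? n) (::end n)))
              (tree-seq map? vals node))))

(defn lookup
  "Walks the trie one character at a time. Returns {:found word} on an
  exact hit, otherwise {:suggestions [...]} with up to n words stored
  under the deepest node that was reached."
  [trie query n]
  (loop [node trie, chars (seq query)]
    (let [child (when chars (get node (first chars)))]
      (cond
        (and (nil? chars) (::end node)) {:found (::end node)}
        child (recur child (next chars))
        :else {:suggestions (vec (take n (words-below node)))}))))

(def example-trie
  (reduce trie-insert {} ["hello" "help" "hi" "world"]))

(lookup example-trie "hello" 3) ;; => {:found "hello"}
(lookup example-trie "helm" 3)  ;; => {:suggestions ["hello" "help"]}
```

As the next messages point out, this only helps when the query shares a prefix with the intended key.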
And here's where problems with domain specificity begin. :)
Tries are only suitable if you're searching by a prefix, right? But levenshtein(s, "hello") is 1 for "mello". And what if you need to find a string that has the lowest distance to the query? Or the shortest string under some distance threshold? Or what if you need strict matches on Perl super-regex with their look{ahead,behind}? And a thousand more "what if"s, which will all sound equally probable given the vagueness of the OP and follow-ups.
It could be a library I suppose
But definitely not a core language feature
Well "intelligence" is vague because it's hard to define. As an example of "not very intelligent", I usually just filter the keys using string includes?. But what if it's upper case? Well, maybe I make it case insensitive... and it becomes a bit smarter by being a bit dumber. I mean, I could search my hashmap with Elasticsearch. 🔥 😜
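The "filter the keys with includes?, but case insensitive" approach mentioned here might look roughly like this. `keys-matching` and `key-text` are hypothetical names; the sketch assumes keys are keywords, symbols, or strings.

```clojure
(require '[clojure.string :as str])

(defn- key-text
  "Lower-cased text of a key. Keywords lose their leading colon but keep
  their namespace; everything else goes through str."
  [k]
  (str/lower-case (if (keyword? k) (subs (str k) 1) (str k))))

(defn keys-matching
  "Returns the entries of m whose key contains term, ignoring case."
  [m term]
  (let [needle (key-text term)]
    (into {}
          (filter (fn [[k _]] (str/includes? (key-text k) needle)))
          m)))

(keys-matching {:Hello 1 :world 2 "HELP" 3} "hel")
;; => {:Hello 1, "HELP" 3}
```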
It's not hard to define - it's impossible to define without knowing the exact domain. Searching for user queries on movie names can be as lax as possible - ignoring cases, diacritic marks, punctuation, multiple typos, missing words, etc. (And maybe that's exactly what you mean, but you haven't specified so. And even then, multiple loss functions are possible, with domain-dependent differences in how effective they are.) But e.g. searching for passport IDs must not allow for any mistakes.
Yep. yep. i agree.
You can use Lucene if you want Levenshtein distance and other fuzzy things: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html But this is mostly for strings
Since you specified 'huge' as +100 keys, and this would basically only be useful at the repl, building indices doesn't seem sensible. Here's how I'd do it, with 'levenshtein from clj-fuzzy
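(require '[clj-fuzzy.metrics :refer [levenshtein]])

(defn search [m search-key]
  (if-let [v (m search-key)]
    v
    ;; No exact hit: scan every key and keep the one with the smallest distance.
    ;; `name` turns keyword, symbol, or string keys into text, since
    ;; levenshtein compares sequences of characters.
    (reduce-kv (fn [[closest-key best-dist] k v]
                 (let [dist (levenshtein (name k) (name search-key))]
                   (if (< dist best-dist)
                     [k dist]
                     [closest-key best-dist])))
               [:err Long/MAX_VALUE]
               m)))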
or something like that- ugly to return 'v' in the exactly-found case
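One way around the "ugly to return 'v'" problem is to tag both outcomes so the caller can tell them apart. The sketch below is a variation on the snippet, not the poster's code. To stay self-contained it ships its own small `levenshtein` instead of depending on clj-fuzzy; the result keys `:found`, `:did-you-mean`, and `:distance` are made up for illustration.

```clojure
(defn levenshtein
  "Edit distance between strings a and b, computed one row at a time."
  [a b]
  (let [b (vec b)]
    (peek
     (reduce
      (fn [prev ca]
        (reduce
         (fn [row [j cb]]
           (conj row (min (inc (peek row))                        ; insertion
                          (inc (nth prev (inc j)))                ; deletion
                          (+ (nth prev j) (if (= ca cb) 0 1)))))  ; substitution
         [(inc (first prev))]
         (map-indexed vector b)))
      (vec (range (inc (count b))))
      a))))

(defn search
  "Returns {:found v} on an exact hit, otherwise
  {:did-you-mean k :distance d} for the key closest to search-key.
  Keys are assumed to be keywords, symbols, or strings."
  [m search-key]
  (if (contains? m search-key)
    {:found (get m search-key)}
    (let [[k d] (reduce (fn [[_ best :as acc] k]
                          (let [d (levenshtein (name k) (name search-key))]
                            (if (< d best) [k d] acc)))
                        [nil Long/MAX_VALUE]
                        (keys m))]
      {:did-you-mean k :distance d})))

(search {:hello 1 :world 2} :hello) ;; => {:found 1}
(search {:hello 1 :world 2} :hallo) ;; => {:did-you-mean :hello, :distance 1}
```

Using `contains?` rather than `if-let` also means keys whose value is nil or false still count as found.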
Thanks @U043HTJ9RC2, I'll look into clj-fuzzy and some other solutions. I'll report back when I can 💪 .
Hello fellow clojurians, may your parens come in pairs and have a great new week!
> Context: I'm writing & running this code as Babashka scripts (https://github.com/babashka/babashka)
I hope you can help me with this small confusion that I'm having: I'm trying to pretty print a Clojure map into an EDN file, using
(spit "/tmp/question.edn" (with-out-str (clojure.pprint/pprint {:deps {'is.mad/some-lib {:mvn/version "2022.10.03"}}})))
And it works, producing the following EDN file
{:deps #:is.mad{some-lib #:mvn{:version "2022.10.03"}}}
BUT when I do it from a test, the version string "2022.10.03" gets written to a file without surrounding double quotes (`"`), like the following:
{:deps #:is.mad{some-lib #:mvn{:version 2022.10.03}}}
> This causes an issue that afterwards the EDN can't be de-serialized (at least not that easily)
I haven't been able to isolate the issue. It happens deterministically (i.e. always). When I take the code out (from the required CLJ file) and run it in isolation, it works as expected (outputting the version string with surrounding double quotes)
Do you have ideas what could cause this behaviour? Is there some configuration I might have unknowingly set that can affect it? Any other approaches I could try to output formatted EDN to a file?
Many thanks in advance! Madis
@U6P6QJTUZ The difference is probably:
$ bb -e '(binding [*print-readably* true] (with-out-str (clojure.pprint/pprint {:deps {(quote is.mad/some-lib) {:mvn/version "2022.10.03"}}})))'
"{:deps {is.mad/some-lib {:mvn/version \"2022.10.03\"}}}\n"
vs
$ bb -e '(binding [*print-readably* false] (with-out-str (clojure.pprint/pprint {:deps {(quote is.mad/some-lib) {:mvn/version "2022.10.03"}}})))'
"{:deps {is.mad/some-lib {:mvn/version 2022.10.03}}}\n"
Hey thanks for the response Borkdude, I just followed you today on Github. What a history of contributions you have to Clojureland! :star-struck:
I tried your recommendation and indeed - binding *print-readably* to true produces the double quotes as I was expecting. Problem solved.
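To make the fix independent of whatever the calling code (a test runner, for example) has bound, the binding can live inside a small helper. `pprint-edn-str` is a hypothetical name; this is a sketch of the idea, not a Babashka API.

```clojure
(require '[clojure.pprint :as pp]
         '[clojure.edn :as edn])

(defn pprint-edn-str
  "Pretty-prints data to a string with strings quoted and escaped,
  regardless of how the caller has bound *print-readably*."
  [data]
  (binding [*print-readably* true]
    (with-out-str (pp/pprint data))))

;; Even when the surrounding code has switched readable printing off,
;; the output keeps its double quotes and reads back as the same data.
(def printed
  (binding [*print-readably* false]
    (pprint-edn-str {:deps {'is.mad/some-lib {:mvn/version "2022.10.03"}}})))

(edn/read-string printed)
```

Writing the file is then `(spit "/tmp/question.edn" (pprint-edn-str data))`.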
Hey.
Is there a way to make clojure.pprint/pprint care about *print-namespace-maps*? prn seems to print it correctly but I also want a formatted EDN file.
I want to produce an EDN file that I'll share with other people and I would like to minimize the syntax where maps are prefixed. They'll need to edit it and it's best to keep it basic.
This works for pretty-printing but it doesn't care about *print-namespace-maps*:
(println (binding [*print-namespace-maps* true]
           (with-out-str (pp/pprint {:my/item :hi
                                     :my/item2 {:nested1 :item
                                                :nested2 :item
                                                :nested3 :item
                                                :nested4 :item
                                                :nested5 :item
                                                :nested6 :item}}))))
This cares about *print-namespace-maps* but it's a blob:
(println (binding [*print-namespace-maps* true]
           (with-out-str (prn {:my/item :hi
                               :my/item2 {:nested1 :item
                                          :nested2 :item
                                          :nested3 :item
                                          :nested4 :item
                                          :nested5 :item
                                          :nested6 :item}}))))
It seems to me that it works as intended:
Clojure 1.11.1
user=> (require '[clojure.pprint :as pp])
nil
user=> (pp/pprint {:a/x 1 :a/y 2})
#:a{:x 1, :y 2}
nil
user=> (binding [*print-namespace-maps* false] (pp/pprint {:a/x 1 :a/y 2}))
{:a/x 1, :a/y 2}
nil
Is this an issue with bb perhaps? I just checked and there appears to be an edge case here:
user=> (binding [*print-namespace-maps* false] (pp/pprint {:a/x 1 :a/y 2}))
#:a{:x 1, :y 2}
I want to print the namespaces in the maps. This is what I don't want:
#:a{:x 1, :y 2}
This is what I want:
{:a/x 1, :a/y 2}
This is on JVM Clojure :thinking_face:
I found my problem. It's that I didn't understand the doc of *print-namespace-maps* and set it to the opposite value. What a stupid one 😄
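For reference, the value that produces the fully spelled-out keys is `false`: `*print-namespace-maps*` set to true enables the `#:ns{...}` shorthand, and the REPL binds it to true by default. A minimal sketch, with `pprint-plain-str` as a hypothetical helper name:

```clojure
(require '[clojure.pprint :as pp])

(defn pprint-plain-str
  "Pretty-prints data with the namespace written out on every key,
  i.e. without the #:ns{...} prefix shorthand."
  [data]
  (binding [*print-namespace-maps* false]
    (with-out-str (pp/pprint data))))

(print (pprint-plain-str {:a/x 1 :a/y 2}))
```

This matches the JVM Clojure 1.11.1 session shown earlier; the Babashka edge case mentioned in the thread may behave differently.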
Wow 😄 Well... It's also possible that he fell for the same variable name as I did 😄
Does anyone happen to have an example of a complex edn or aero configuration? I'm trying to get an idea of how people write real world configurations, and what challenges there are.
Not sure what qualifies as complex 😅 but here is where I use Aero+Integrant in a monorepo setting, not only to share code but to share configs as well:
Main repo: https://github.com/bob-cd/bob
component 1: https://github.com/bob-cd/bob/tree/main/apiserver
component 2: https://github.com/bob-cd/bob/tree/main/entities
component 3: https://github.com/bob-cd/bob/tree/main/runner
common/shared: https://github.com/bob-cd/bob/tree/main/common
the configs are in the resources and the initialisation is in src/component_name/system.clj
hope this is something useful? 😄
Thanks! It is
http://grep.app lets you search github for regexes. You can find some in the wild examples with https://grep.app/search?q=%5Baero.core%20%3Aas
That's also a good suggestion!
FWIW, when I moved to aero from DIY solutions my configuration complexity went down by a lot; #profile especially helped with that.
Aero's custom tag literals are pure awesome. reduced a lot of my complexity too!
@U7ERLH6JX What do you use them for?
Mostly use the #env, conditional ones like #or for defaults and coercion ones like #long. Aero does all the heavy lifting of making sure the config is right and coerced as expected with just declarative code.
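A small config using the tags mentioned in this thread (#profile, #env, #or, #long) might look like the following. The keys, URLs, and environment variable names are invented for illustration; only the tags themselves are Aero's.

```clojure
;; resources/config.edn (hypothetical keys and variable names)
{;; #profile picks a branch based on the :profile passed to read-config
 :db-url #profile {:default "jdbc:postgresql://localhost/app_dev"
                   :prod    #env DATABASE_URL}

 ;; #env reads PORT, #or falls back to 8080 when it is unset,
 ;; and #long coerces the result to a number
 :port   #long #or [#env PORT 8080]}

;; Read from Clojure with something like:
;; (aero.core/read-config (clojure.java.io/resource "config.edn")
;;                        {:profile :prod})
```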
Ah, you mean Aero's readers, rather than your own you defined for Aero.
declarative data in fact. so it's easy to merge too
yeah the ones that come packaged. just called them that because they seem to call it like that? https://github.com/juxt/aero#tag-literals
havent gotten to a place to use my own, default ones seem quite good for my needs
I usually define just one custom tag - #dotenv, almost the same as #env but slightly different format and reads its value from the .env file first.
we use aero a lot, we generate kubernetes manifests from edn/aero files, application configs, api descriptions & more
not terribly complex at all, but cljdoc uses aero, https://github.com/cljdoc/cljdoc/blob/master/resources/config.edn.
The best part of Aero, for me, is that it disagrees with the 12 factor app. Instead of favouring environment configs, it favours defining the config for each environment in edn alongside your code stored in git, using only an environment variable to choose which profile to use when reading from your aero config. And for secrets, it correctly disagrees with the 12 factor app as well, by advising against environment variables again and recommending secret files instead.
@U0K064KQV That sounds like a misunderstanding of the 12factor environment thing though, it even starts with
> An app’s config is everything that is likely to vary between https://12factor.net/codebase
That is, you’re not supposed to create a weird serialization format on top of environment variables with a bazillion keys and values, but more like pointers on where the important branching happens; usually this is stuff like where the config file is located, what the current environment is (which is convenient to pipe to the #profile selector with a sensible default indicating localhost/dev) and maybe some other similarly high-level config variants.
Secrets handling has come further since the time 12factor was created though, e.g. Kubernetes secrets are read-only mounted files in a specific directory within a container. This gap in progress makes sense since 12factor is from an era before orchestration, Docker, or even services such as RDS.
So it’s not really that it “correctly disagrees”; 12factor is an artifact of its time and it’s curious how well it’s kept its premise even with all the progress the industry has made in the meantime 🙂
I was there in 2012 and disagreed then as well 😝, so I'm not sure it's a product of its time
But I don't really want to debate the real 12 factor app, in a No True Scotsman way. I might be wrong and you right, maybe they meant something different, but I've seen many people take that advice to mean putting every key/value config that differs per deploy environment into its own environment variable. And I don't think that's a good practice. I'm glad k8s is pushing people to rely on files instead, but that was always possible even without k8s.
How would one polymorphically extend the get function in Clojure at compile time?
(defn get
  "Returns the value mapped to key, not-found or nil if key not present
  in associative collection, set, string, array, or ILookup instance."
  {:inline (fn [m k & nf] `(. clojure.lang.RT (get ~m ~k ~@nf)))
   :inline-arities #{2 3}
   :added "1.0"}
  ([map key]
   (. clojure.lang.RT (get map key)))
  ([map key not-found]
   (. clojure.lang.RT (get map key not-found))))
get delegates to clojure.lang.RT/get but it's not obvious to me how to extend this behavior. My temptation would be:
(extend-type Tensor
  clojure.lang.RT
  (get [map key]
    "blah"))
but I get class clojure.lang.RT is not a protocol
it depends what you mean by all those things, but generally the way to support the get function is to implement the clojure.lang.ILookup interface
Glanced at: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/ILookup.java Tried
(extend-type Tensor
  clojure.lang.ILookup
  (valAt [obj key]
    "blah"))
and got interface clojure.lang.ILookup is not a protocol
am I getting warmer?
ILookup is a Java interface, and you can't extend Java interfaces to classes (Tensor) not under your control
it's just something I made up.
ah, well you're in luck. deftype & reify offer you a way to implement interfaces like ILookup
I was hoping to achieve a user experience like
(get my-tensor my-coords)
defrecord allows you to implement interfaces, too, but not some fundamental ones like ILookup, IPersistentMap, etc., since it already provides implementations for those interfaces
Is there some general documentation on this somewhere? I will have the same questions for extending most of the polymorphic functions.
@U050ECB92 wow that's some powerful/simple repl-fu there thank you!!
now in cljs extending something like ILookup is as simple as
(defrecord Tensor [data])
(extend-type Tensor
  ILookup
  (-lookup [o k]
    "blah"))
but it would seem it's not that simple in Clojure?
ClojureScript was implemented after Clojure, and relies heavily on protocols in its data structure implementations. Clojure/Java data structures were designed and written before Clojure protocols existed.
Ok no problem. I'm fine with the constraint that extending things like get may not be fully possible in Clojure, just want to make sure I'm understanding what is possible/not
probably best thing to do might be to provide a namespace with similar functions to what clojure devs are used to using instead of trying to extend core functions, which could be confusing anyway. Thank you @U0NCTKEV8 @U050ECB92 @U0CMVHBL2!! ☮️ ❤️ ☀️
Sorry, I may be confusing the issue. It IS possible to implement those kinds of things within Clojure/Java, but there you do it by implementing the appropriate Java interfaces. It is possible in ClojureScript by extending the appropriate protocols. The mechanism one must use to implement these things are different in Clojure vs. ClojureScript. My message above was just a little bit of historical context as to why this difference exists.
@U0CMVHBL2 so I suppose this means in practice if I want a Tensor datatype that will be polymorphically compatible with the get function in Clojure, that would involve writing a Java class that implements the methods of, for instance, clojure.lang.ILookup?
ah ok, I'm seeing a few things in open source projects now, starting to click
I can point you at one or maybe two open source projects that implement these interfaces in Clojure using deftype
The second one implements a custom map-like data structure, so probably closer to what interests you. The first implements a vector-like data structure, which needs a different set of Java interfaces to be implemented.
ahhhh very interesting.
;; works
(deftype Tensor [data]
  clojure.lang.ILookup
  (valAt [o k] "blah"))

;; crashes
(deftype Tensor [data])
(extend-type Tensor
  clojure.lang.ILookup
  (valAt [o k] "blah"))
extend-type is only able to work with protocols, because protocols are extensible at runtime. Interfaces cannot be "added" to a pre-existing type.
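Fleshing out the working deftype variant into something closer to the `(get my-tensor my-coords)` experience asked for earlier: the sketch below assumes the tensor's data is a nested vector and that coordinates are a vector of indices, which is an illustrative choice rather than anything from the thread. ILookup declares two `valAt` arities, and implementing both lets `get` work with and without a not-found value.

```clojure
(deftype Tensor [data]
  clojure.lang.ILookup
  ;; (get tensor coords)
  (valAt [this coords]
    (.valAt this coords nil))
  ;; (get tensor coords not-found)
  (valAt [_ coords not-found]
    (get-in data coords not-found)))

(def my-tensor (->Tensor [[1 2] [3 4]]))

(get my-tensor [1 0])       ;; => 3
(get my-tensor [9 9] :none) ;; => :none
```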
totally fine
You can also use reify as another approach. But deftype probably makes more sense in your case; it depends a bit on what your use case is like
Great point. Getting some great mileage out of reify and closures right now, we’ll see how it goes!
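The reify route looks much the same, except that the object is anonymous and closes over its data instead of declaring fields. `tensor` is a hypothetical constructor name, and the nested-vector layout is again just an assumption for the example.

```clojure
(defn tensor
  "Returns an anonymous object implementing ILookup that closes over data."
  [data]
  (reify clojure.lang.ILookup
    (valAt [_ coords] (get-in data coords))
    (valAt [_ coords not-found] (get-in data coords not-found))))

(get (tensor [[1 2] [3 4]]) [0 1])       ;; => 2
(get (tensor [[1 2] [3 4]]) [5 5] :none) ;; => :none
```

deftype gives a named class that can be type-hinted and extended to protocols later; reify is handy when a one-off object is enough.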
@U01KZDMJ411 Alex Miller posts a link to these in #news-and-articles when they come out weekly.