#clojure
2022-10-03
Drew Verlee04:10:16

How do you search "huge" (+100 keys) maps for terms? I think having some "intelligent" search fn available all the time would be pretty cool.

Drew Verlee04:10:44

"intelligent" is in quotes because something like word distance and regex would probably go a long way. Obviously this kind of thing already exists and will be stronger than that, though.

p-himik05:10:04

Not sure what the actual question is. If a search function allows indexing, then create an index and use it for search. If it does not allow indexing, just linearly scan the whole map.

Drew Verlee05:10:42

Yeah, those are both options. I usually just scan the map. But it would be cool to have something like: (search m :hi) ;; didn't find ":hi" but found ":hello" did you mean ":hello"?

p-himik05:10:56

That is incredibly domain-specific. I don't think it's possible to generalize it in a meaningful way. Would it be cool though? Yes. But then, there's pretty much an infinite amount of things that would be cool. :)

Ben Sless05:10:06

You need a trie, not a map

Ben Sless05:10:32

A trie that also keeps topN at each node

Ben Sless05:10:51

By weight function preferably

Drew Verlee05:10:27

I'll have to google "trie", but I get topN and weight functions. Yeah, ok cool. Thanks for the feedback 🙂

Ben Sless05:10:58

A trie is like a tiered prefix map. A string is a sequence of characters. Imagine looking up one char at a time by prefix: H, then e, l, l, o finds a word. If it doesn't find anything, it gives the top N from the node which failed.
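
The trie idea described above can be sketched with plain nested maps. This is only an illustration under stated assumptions: the names trie-insert, words-under and suggest are made up here, alphabetical order stands in for a real weight function, and the top N are computed on demand rather than cached at each node.

```clojure
(defn trie-insert
  "Adds word to trie, a nested map keyed by character.
  A ::word entry marks the end of a complete word."
  [trie word]
  (assoc-in trie (conj (vec word) ::word) word))

(defn words-under
  "Returns every complete word stored at or below node."
  [node]
  (mapcat (fn [[k v]] (if (= k ::word) [v] (words-under v))) node))

(defn suggest
  "Walks the trie one character at a time. When the walk ends (query
  exhausted or no matching child), returns up to n words below the last
  node reached, sorted alphabetically as a stand-in for a weight function."
  [trie query n]
  (loop [node trie, chars (seq query)]
    (let [child (when chars (get node (first chars)))]
      (if child
        (recur child (next chars))
        (take n (sort (words-under node)))))))

;; Building a trie from map keys: keyword keys are indexed by name.
(def key-trie
  (reduce trie-insert {} (map name (keys {:hello 1 :help 2 :hi 3}))))

(suggest key-trie "hex" 5) ; walk fails after "he", so it suggests from that node
```

As p-himik notes right after, this only helps when the query shares a prefix with the key being sought.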

p-himik05:10:59

And here's where problems with domain specificity begin. :) Tries are only suitable if you're searching by a prefix, right? But levenshtein(s, "hello") is 1 for "mello". And what if you need to find a string that has the lowest distance to the query? Or a shortest string under some distance threshold? Or what if you need strict matches on Perl super-regex with their look{ahead,behind}? And a thousand more of "what if"s, which will all sound equally probable given the vagueness of the OP and follow-ups.

👍 2
Ben Sless05:10:31

Tldr - it depends 🙂

craftybones06:10:21

It could be a library I suppose

craftybones06:10:29

But definitely not a core language feature

Drew Verlee06:10:18

Well, "intelligence" is vague because it's hard to define. As an example of "not very intelligent", I usually just filter the keys using string includes?. But what if it's upper case? Well, maybe I make it case insensitive... and it becomes a bit smarter by being a bit dumber. I mean, I could search my hashmap with Elasticsearch. 🔥 😜
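
A minimal sketch of the "filter the keys, case insensitive" approach described above. The function name find-keys is hypothetical, and it assumes the search term is a string, keyword, or symbol.

```clojure
(require '[clojure.string :as str])

(defn find-keys
  "Returns the keys of m whose printed form contains term, ignoring case."
  [m term]
  (let [needle (str/lower-case (name term))]
    (filter #(str/includes? (str/lower-case (str %)) needle)
            (keys m))))

;; Matches :userName and :USER_ID, but not :email.
(find-keys {:userName 1 :USER_ID 2 :email 3} "user")
```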

p-himik06:10:06

It's not hard to define - it's impossible to define without knowing the exact domain. Searching for user queries on movie names can be as lax as possible - ignoring cases, diacritic marks, punctuation, multiple typos, missing words, etc. (And maybe that's exactly what you mean, but you haven't specified so. And even then, multiple loss functions are possible, with domain-dependent differences in how effective they are.) But e.g. searching for passport IDs must not allow for any mistakes.

Drew Verlee06:10:43

Yep. yep. i agree.

Martynas Maciulevičius09:10:02

You can use Lucene if you want Levenshtein distance and other fuzzy things: https://lucene.apache.org/core/2_9_4/queryparsersyntax.html But this is mostly for strings

m.q.warnock10:10:43

Since you specified 'huge' as +100 keys, and this would basically only be useful at the REPL, building indices doesn't seem sensible. Here's how I'd do it, with levenshtein from clj-fuzzy:

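;; assumes clj-fuzzy is on the classpath
(require '[clj-fuzzy.metrics :refer [levenshtein]])

(defn search [m search-key]
  (if-let [v (m search-key)]
    v
    ;; no exact hit: track the [key distance] pair closest to search-key;
    ;; name turns keyword keys into strings that levenshtein can compare
    (reduce-kv (fn [[closest-key best-dist] k v]
                 (let [dist (levenshtein (name k) (name search-key))]
                   (if (< dist best-dist)
                     [k dist]
                     [closest-key best-dist])))
               [:err Long/MAX_VALUE]
               m)))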

m.q.warnock10:10:44

or something like that- ugly to return 'v' in the exactly-found case
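
One possible way around the "ugly to return v" concern is to tag the result, so callers can tell an exact hit from a suggestion. The sketch below is dependency-free and only illustrative: it uses a hand-written levenshtein instead of clj-fuzzy, the :found and :did-you-mean keys are made up, and it assumes keys are keywords, strings, or symbols.

```clojure
(defn levenshtein
  "Edit distance between strings a and b, computed one row at a time."
  [a b]
  (let [b (vec b)]
    (peek
     (reduce
      (fn [prev [i ca]]
        (reduce
         (fn [row [j cb]]
           (conj row (min (inc (peek row))          ; insert cb
                          (inc (nth prev (inc j)))  ; delete ca
                          (+ (nth prev j)           ; substitute, free if equal
                             (if (= ca cb) 0 1)))))
         [(inc i)]
         (map-indexed vector b)))
      (vec (range (inc (count b))))
      (map-indexed vector a)))))

(defn search
  "Returns {:found v} on an exact hit, otherwise {:did-you-mean k} where k
  is the key with the smallest edit distance (nil for an empty map)."
  [m k]
  (if (contains? m k)
    {:found (get m k)}
    {:did-you-mean
     (when (seq m)
       (apply min-key #(levenshtein (name %) (name k)) (keys m)))}))

(search {:hello 1 :goodbye 2} :hi) ; suggests :hello
```

Using contains? rather than if-let also keeps exact hits whose value is nil or false.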

Drew Verlee16:10:33

Thanks @U043HTJ9RC2, I'll look into clj-fuzzy and some other solutions. I'll report back when I can 💪 .

madis08:10:13

Hello fellow clojurians, may your parens come in pairs and have a great new week!
> Context: I'm writing & running this code as Babashka scripts (https://github.com/babashka/babashka)
I hope you can help me with this small confusion that I'm having: I'm trying to pretty print a Clojure map into an EDN file, using

(spit "/tmp/question.edn" (with-out-str (clojure.pprint/pprint {:deps {'is.mad/some-lib {:mvn/version "2022.10.03"}}})))
And it works, producing the following EDN file
{:deps #:is.mad{some-lib #:mvn{:version "2022.10.03"}}}
BUT when I do it from a test, the version string "2022.10.03" gets written to a file without surrounding double quotes (`"`), like the following:
{:deps #:is.mad{some-lib #:mvn{:version 2022.10.03}}}
> This causes an issue that afterwards the EDN can't be de-serialized (at least not that easily)
I haven't been able to isolate the issue. It happens deterministically (i.e. always). When I take the code out (from the required CLJ file) and run it in isolation, it works as expected (outputting the version string with surrounding double quotes).
Do you have ideas what could cause this behaviour? Is there some configuration I might have unknowingly set that can affect it? Any other approaches I could try to output formatted EDN to a file?
Many thanks in advance! Madis

borkdude08:10:33

@U6P6QJTUZ The difference is probably:

$ bb -e '(binding [*print-readably* true] (with-out-str (clojure.pprint/pprint {:deps {(quote is.mad/some-lib) {:mvn/version "2022.10.03"}}})))'
"{:deps {is.mad/some-lib {:mvn/version \"2022.10.03\"}}}\n"
vs
$ bb -e '(binding [*print-readably* false] (with-out-str (clojure.pprint/pprint {:deps {(quote is.mad/some-lib) {:mvn/version "2022.10.03"}}})))'
"{:deps {is.mad/some-lib {:mvn/version 2022.10.03}}}\n"

borkdude08:10:56

So if you can ensure that *print-readably* is bound to true it should work

🙌 2
madis09:10:52

Hey thanks for the response Borkdude, I just followed you today on Github. What a history of contributions you have to Clojureland! :star-struck: I tried your recommendation and indeed - binding *print-readably* to true produces the double quotes as I was expecting. Problem solved.
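
A small sketch of how the fix above could be wrapped so the output no longer depends on the caller's bindings. The helper name spit-edn is hypothetical; the path and data come from the question.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.pprint :as pp])

(defn spit-edn
  "Pretty-prints data to path as EDN. Forces *print-readably* to true so
  strings keep their quotes whatever the caller has bound."
  [path data]
  (spit path (binding [*print-readably* true]
               (with-out-str (pp/pprint data)))))

(def deps {:deps {'is.mad/some-lib {:mvn/version "2022.10.03"}}})

(spit-edn "/tmp/question.edn" deps)

;; Reading the file back should give the original map.
(edn/read-string (slurp "/tmp/question.edn"))
```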

👍 2
Martynas Maciulevičius12:10:01

Hey. Is there a way to make clojure.pprint/pprint care about *print-namespace-maps*? prn seems to print it correctly but I also want a formatted EDN file. I want to produce an EDN file that I'll share with other people and I would like to minimize the syntax where maps are prefixed. They'll need to edit it and it's best to keep it basic. This works for pretty-printing but it doesn't care about *print-namespace-maps*:

(println (binding [*print-namespace-maps* true]
           (with-out-str (pp/pprint {:my/item :hi
                                     :my/item2 {:nested1 :item
                                                :nested2 :item
                                                :nested3 :item
                                                :nested4 :item
                                                :nested5 :item
                                                :nested6 :item}}))))
This cares about *print-namespace-maps* but it's a blob:
(println (binding [*print-namespace-maps* true]
           (with-out-str (prn {:my/item :hi
                               :my/item2 {:nested1 :item
                                          :nested2 :item
                                          :nested3 :item
                                          :nested4 :item
                                          :nested5 :item
                                          :nested6 :item}}))))

p-himik12:10:50

It seems to me that it works as intended:

Clojure 1.11.1
user=> (require '[clojure.pprint :as pp])
nil
user=> (pp/pprint {:a/x 1 :a/y 2})
#:a{:x 1, :y 2}
nil
user=> (binding [*print-namespace-maps* false] (pp/pprint {:a/x 1 :a/y 2}))
{:a/x 1, :a/y 2}
nil

p-himik12:10:56

What results do you see?

borkdude12:10:07

Is this an issue with bb perhaps? I just checked and there appears to be an edge case here:

user=> (binding [*print-namespace-maps* false] (pp/pprint {:a/x 1 :a/y 2}))
#:a{:x 1, :y 2}

borkdude12:10:18

I'll make sure it'll be fixed in the next release

Martynas Maciulevičius12:10:49

I want to print the namespaces in the maps. This is what I don't want:

#:a{:x 1, :y 2}
This is what I want:
{:a/x 1, :a/y 2}
This is on JVM Clojure :thinking_face:

Martynas Maciulevičius12:10:20

I found my problem. It's that I didn't understand the doc of *print-namespace-maps* and set it into the opposite value. What a stupid one 😄

p-himik12:10:44

Happens. And, well, you helped borkdude find a bug. :D

Martynas Maciulevičius12:10:15

Wow 😄 Well... It's also possible that he fell for the same variable name as I did 😄
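
For reference, a minimal sketch of the combination asked about in this thread: pretty-printing to a string while keeping full keyword namespaces on every key. The helper name pprint-str is hypothetical.

```clojure
(require '[clojure.pprint :as pp])

(defn pprint-str
  "Pretty-prints data to a string. Binding *print-namespace-maps* to false
  keeps keys written as :a/x instead of the #:a{...} shorthand."
  [data]
  (binding [*print-namespace-maps* false]
    (with-out-str (pp/pprint data))))

(println (pprint-str {:a/x 1 :a/y 2}))
```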

weavejester13:10:48

Does anyone happen to have an example of a complex edn or aero configuration? I'm trying to get an idea of how people write real world configurations, and what challenges there are.

lispyclouds14:10:33

Not sure what qualifies as complex 😅 but here is where I use Aero+Integrant in a monorepo setting, not only to share code but to share configs as well:
Main repo: https://github.com/bob-cd/bob
component 1: https://github.com/bob-cd/bob/tree/main/apiserver
component 2: https://github.com/bob-cd/bob/tree/main/entities
component 3: https://github.com/bob-cd/bob/tree/main/runner
common/shared: https://github.com/bob-cd/bob/tree/main/common
The configs are in the resources and the initialisation is in src/component_name/system.clj. Hope this is something useful? 😄

weavejester14:10:25

Thanks! It is

dpsutton14:10:31

http://grep.app lets you search GitHub with regexes. You can find some in-the-wild examples with https://grep.app/search?q=%5Baero.core%20%3Aas

👍 1
weavejester14:10:04

That's also a good suggestion!

eskos14:10:14

FWIW, when I moved to aero from DIY solutions my configuration complexity went down by a lot; #profile especially helped with that.

lispyclouds14:10:55

Aero's custom tag literals are pure awesome. reduced a lot of my complexity too!

weavejester14:10:55

@U7ERLH6JX What do you use them for?

lispyclouds14:10:40

Mostly use the #env, conditional ones like #or for defaults and coercion ones like #long. Aero does all the heavy lifting of making sure the config is right and coerced as expected with just declarative code.
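
As a rough illustration of the built-in tags mentioned here (#env, #or, #long, plus #profile from earlier in the thread), a config and its loader might look like the sketch below. It assumes juxt/aero is on the classpath and that a config.edn file sits on the resource path. The keys, variable names (DB_HOST, DB_PORT, APP_ENV) and defaults are all made up for the example.

```clojure
;; Contents of resources/config.edn (hypothetical keys and variable names):
;;
;; {:db        {:host #or [#env DB_HOST "localhost"]
;;              :port #long #or [#env DB_PORT 5432]}
;;  :log-level #profile {:default :info
;;                       :dev     :debug}}

(require '[aero.core :as aero]
         '[clojure.java.io :as io])

(defn load-config
  "Reads config.edn from the classpath, resolving #profile for profile."
  [profile]
  (aero/read-config (io/resource "config.edn") {:profile profile}))

;; APP_ENV is a hypothetical variable; fall back to :dev when it is unset.
(load-config (keyword (or (System/getenv "APP_ENV") "dev")))
```

The environment only selects the profile and supplies a couple of overrides; everything else stays as declarative data in the EDN file.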

weavejester14:10:17

Ah, you mean Aero's readers, rather than your own you defined for Aero.

lispyclouds14:10:23

declarative data in fact. so it's easy to merge too

lispyclouds14:10:07

yeah the ones that come packaged. just called them that because they seem to call it like that? https://github.com/juxt/aero#tag-literals

lispyclouds14:10:49

haven't gotten to a place to use my own, default ones seem quite good for my needs

p-himik14:10:04

I usually define just one custom tag - #dotenv, almost the same as #env but slightly different format and reads its value from the .env file first.

😍 1
mpenet15:10:44

we use aero a lot, we generate kubernetes manifests from edn/aero files, application configs, api descriptions & more

mpenet15:10:45

there are very few drawbacks vs raw edn really

lread16:10:09

not terribly complex at all, but cljdoc uses aero, https://github.com/cljdoc/cljdoc/blob/master/resources/config.edn.

didibus04:10:05

The best part of Aero, for me, is that it disagrees with the 12 factor app. Instead of favouring environment configs, it favours defining the config for each environment in edn alongside your code stored in git, using only an environment variable to choose which profile to use when reading from your aero config. And for secrets, it correctly disagrees with the 12 factor app as well, by advising against environment variables again, and for using secret files instead.

eskos08:10:18

@U0K064KQV That sounds like a misunderstanding of the 12factor environment thing though; it even starts with
> An app’s config is everything that is likely to vary between deploys (https://12factor.net/codebase)
That is, you’re not supposed to create a weird serialization format on top of environment variables with a bazillion keys and values; the variables are more like pointers to where the important branching happens. Usually this is stuff like where the config file is located, what the current environment is (which is convenient to pipe to the #profile selector, with a sensible default indicating localhost/dev), and maybe some other similarly high-level config variants.
Secrets handling has come further since 12factor was created, though; e.g. Kubernetes secrets are read-only files mounted in a specific directory within a container. This gap in progress makes sense since 12factor is from an era before orchestration, Docker, or even services such as RDS.

eskos08:10:51

So it’s not really that it “correctly disagrees”; 12factor is an artifact of its time, and it’s curious how well it’s kept its premise even with all the progress the industry has made in the meantime 🙂

didibus16:10:19

I was there in 2012 and disagreed then as well 😝, so I'm not sure it's a product of its time

didibus16:10:12

But I don't really want to debate the real 12 factor app, in a No True Scotsman way. I might be wrong and you right, maybe they meant something different, but I've seen many people take that advice to mean putting each key/value config that differs per deploy environment into its own environment variable. And I don't think that's a good practice. I'm glad k8s is pushing people to rely on files instead, but that was always possible even without k8s.

Lone Ranger18:10:38

How would one polymorphically extend the get function in Clojure at compile time?

Lone Ranger18:10:31

(defn get
  "Returns the value mapped to key, not-found or nil if key not present
  in associative collection, set, string, array, or ILookup instance."
  {:inline (fn  [m k & nf] `(. clojure.lang.RT (get ~m ~k ~@nf)))
   :inline-arities #{2 3}
   :added "1.0"}
  ([map key]
   (. clojure.lang.RT (get map key)))
  ([map key not-found]
   (. clojure.lang.RT (get map key not-found))))
get delegates to clojure.lang.RT/get but it's not obvious to me how to extend this behavior. My temptation would be:
(extend-type Tensor
  clojure.lang.RT
  (get [map key]
    "blah"))
but I get class clojure.lang.RT is not a protocol

hiredman18:10:46

it depends what you mean by all those things, but generally the way to support the get function is to implement the clojure.lang.ILookup interface

Lone Ranger18:10:53

Glanced at: https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/ILookup.java Tried

(extend-type Tensor
  clojure.lang.ILookup
  (valAt [obj key]
    "blah"))
and got "interface clojure.lang.ILookup is not a protocol". Am I getting warmer?

ghadi18:10:29

ILookup is a java interface, and you can't extend java interfaces to classes (Tensor) not under your control

ghadi18:10:43

(assuming Tensor is from a 3rd party library)

Lone Ranger18:10:12

it's just something I made up.

ghadi18:10:28

ah, well you're in luck. deftype & reify offer you a way to implement interfaces like ILookup

partyparrot 1
Lone Ranger18:10:41

I was hoping to achieve a user experience like (get my-tensor my-coords)

ghadi18:10:44

defrecord allows you to implement interfaces, too, but not some fundamental ones like ILookup, IPersistentMap, etc., it already provides implementations for those intfs

Lone Ranger18:10:13

Is there some general documentation on this somewhere? I will have the same questions for extending most of the polymorphic functions.

Lone Ranger18:10:48

@U050ECB92 wow that's some powerful/simple repl-fu there thank you!!

Lone Ranger18:10:47

now in cljs extending something like ILookup is as simple as

(defrecord Tensor [data])

(extend-type Tensor
  ILookup
  (-lookup [o k]
    "blah"))
but it would seem it's not that simple in Clojure?

andy.fingerhut18:10:36

ClojureScript was implemented after Clojure, and relies heavily in its data structure implementations on protocols. Clojure/Java data structures were designed and written before Clojure protocols existed.

Lone Ranger18:10:57

Ok no problem. I'm fine with the constraint that extending things like get may not be fully possible in Clojure, just want to make sure I'm understanding what is possible/not

Lone Ranger18:10:07

probably best thing to do might be to provide a namespace with similar functions to what clojure devs are used to using instead of trying to extend core functions, which could be confusing anyway. Thank you @U0NCTKEV8 @U050ECB92 @U0CMVHBL2!! ☮️ ❤️ ☀️

andy.fingerhut18:10:01

Sorry, I may be confusing the issue. It IS possible to implement those kinds of things within Clojure/Java, but there you do it by implementing the appropriate Java interfaces. It is possible in ClojureScript by extending the appropriate protocols. The mechanisms one must use to implement these things are different in Clojure vs. ClojureScript. My message above was just a little bit of historical context as to why this difference exists.

Lone Ranger18:10:51

@U0CMVHBL2 so I suppose this means in practice if I want a Tensor datatype that will be polymorphically compatible with the get function in Clojure, that would involve writing a Java class that implements the methods of, for instance, clojure.lang.ILookup?

Lone Ranger18:10:40

ah ok, I'm seeing a few things in open source projects now, starting to click

andy.fingerhut18:10:30

I can point you at one or maybe two open source projects that implement these interfaces in Clojure using deftype

andy.fingerhut18:10:58

The second one implements a custom map-like data structure, so probably closer to what interests you. The first implements a vector-like data structure, which needs a different set of Java interfaces to be implemented.

Lone Ranger18:10:21

ahhhh very interesting.

;; works
(deftype Tensor [data]
  clojure.lang.ILookup
  (valAt [o k] "blah"))
;; crashes
(deftype Tensor [data])

(extend-type Tensor
  clojure.lang.ILookup
  (valAt [o k] "blah"))

Joshua Suskalo19:10:13

extend-type is only able to work with protocols, because protocols are extensible at runtime. Interfaces cannot be "added" to a pre-existing type.

💡 1
didibus04:10:42

You can also use reify as another approach. But deftype probably makes more sense in your case; it depends a bit on what your use case is like.

Lone Ranger12:10:58

Great point. Getting some great mileage out of reify and closures right now, we’ll see how it goes!
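
To make the deftype and reify options from this thread concrete, here is a minimal sketch that supports (get my-tensor my-coords). The Tensor layout is an assumption made for illustration: data is a nested vector and the lookup key is a vector of indices. Both arities of valAt are implemented so that get with a not-found argument works too.

```clojure
;; Hypothetical Tensor: nested-vector data, looked up by a vector of indices.
(deftype Tensor [data]
  clojure.lang.ILookup
  (valAt [_ coords] (get-in data coords))
  (valAt [_ coords not-found] (get-in data coords not-found)))

(def t (Tensor. [[1 2] [3 4]]))

(get t [1 0])          ; RT/get dispatches to valAt with one key
(get t [5 5] :missing) ; uses the not-found arity

;; The same behavior from reify, closing over data instead of a field.
(defn tensor [data]
  (reify clojure.lang.ILookup
    (valAt [_ coords] (get-in data coords))
    (valAt [_ coords not-found] (get-in data coords not-found))))

(get (tensor [[1 2] [3 4]]) [1 1])
```

Because the interface is declared inside deftype or reify, no separate Java class is needed; extend-type fails here only because ILookup is an interface rather than a protocol.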

pyr18:10:10

@alexmiller thanks a ton for the regular Clojure Derefs, it helps a lot!

gratitude 5
❤️ 2
skylize20:10:46

@U01KZDMJ411 Alex Miller posts a link to these in #news-and-articles when they come out weekly.

👍 1
jrychter10:10:34

Oh, these are great! Subscribing to the RSS now. Thank you!