This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-07-09
Channels
- # announcements (17)
- # babashka (8)
- # beginners (68)
- # calva (28)
- # clj-kondo (36)
- # cljsrn (1)
- # clojure (232)
- # clojure-dev (3)
- # clojure-europe (13)
- # clojure-nl (14)
- # clojure-spec (9)
- # clojure-uk (11)
- # clojuredesign-podcast (3)
- # clojurescript (38)
- # core-async (3)
- # cursive (1)
- # datahike (4)
- # datomic (4)
- # fulcro (56)
- # graphql (1)
- # helix (3)
- # honeysql (5)
- # introduce-yourself (1)
- # kaocha (2)
- # lsp (67)
- # malli (7)
- # meander (2)
- # off-topic (1)
- # pathom (9)
- # re-frame (55)
- # reitit (3)
- # releases (8)
- # remote-jobs (12)
- # shadow-cljs (12)
- # sql (3)
- # tools-deps (55)
- # vim (5)
- # xtdb (3)
For context, I'm trying to employ it to do some caching inside a Docker image, and it seems to stop the build.
It does. When the command line arguments are incorrect or when there was an exception building the classpath.
I have a question about a statement in the clojure documentation. Can someone help me understand a sentence at https://clojure.org/reference/metadata ? The 2nd sentence in the 3rd paragraph says: One consequence of this is that applying metadata to a lazy sequence will realize the head of the sequence so that both objects can share the same sequence. I don't understand "both" here. What are the two objects? Is it saying that the metadata and the sequence share the same metadata?
or is it saying the sequence and its head share the same metadata?
The first sentence in that paragraph:
> That said, metadata and its relationship to an object is immutable - an object with different metadata is a different object
The two objects are:
• A lazy seq that you pass to with-meta
• The result of with-meta
ahhh, that makes sense. I think this is a case of ambiguous antecedent.
That said, metadata and its relationship to an object, A, is immutable - an object, B, with different metadata is a different object. One consequence of this is that applying metadata to a lazy sequence, A, will realize the head of the sequence so that objects A and B can share the same meta-data.
perhaps it also should be "can share the same metadata" NOT "can share the same sequence" ?
what two things share the same sequence? Sorry that I am confused.
(def s some-lazy-seq)
;; `s` is an instance of `LazySeq` that has a private member `s` of type `ISeq`.
(def s+m (with-meta some-lazy-seq {:hello :there}))
;; `s+m` is an instance of `LazySeq` that has the very same member that points to the very same data. But `s+m` has a different metadata.
Here's the implementation of LazySeq/withMeta:
public Obj withMeta(IPersistentMap meta){
    if(meta() == meta)
        return this;
    return new LazySeq(meta, seq());
}
And of LazySeq/seq:
final synchronized public ISeq seq(){
    sval();
    if(sv != null)
        {
        Object ls = sv;
        sv = null;
        while(ls instanceof LazySeq)
            {
            ls = ((LazySeq)ls).sval();
            }
        s = RT.seq(ls);
        }
    return s;
}
Sorry, that bit of java is a bit beyond my skill level. I'm not a java programmer. In my opinion, I would like to be able to understand the documentation (at least the simple documentation) without having to understand the java implementation. I think I'll just figure out how to file a bug/issue report about a confusing grammar error in the doc which leads to an ambiguity.
A lazy seq by default doesn't realize anything - it's a wrapper around a function that does that. Upon iterating over it, it stops wrapping the function and starts wrapping a seq that results from that function. Adding metadata to a lazy seq has to create another lazy seq that points to the same data. But you can't share that function that exists in unrealized lazy seqs because the function wraps a mutable Java iterator - calling the function again will yield different results. That's why Clojure has to call the function once, but use its result twice - first in the original lazy seq, and then in the lazy seq with new metadata.
I've added this issue: https://ask.clojure.org/index.php/10756/which-two-objects-share-the-same-sequence
feel free to comment
"what two things share the same sequence?" the answer is the original lazy-seq and the new lazy-seq returned by with-meta
They share the same underlying sequence, but have different meta-data. @U2FRKM4TW’s answer above explains this well. I see the point about confusing antecedent, however it doesn't make sense for the head of a sequence to have the same underlying sequence as the sequence of which the head is a part. Hence "both" must be the original lazy-seq and the one returned by with-meta
I could really be off on this one... but when I read "immutable"... I immediately think of how Clojure uses a "trie" to copy the data to a new instance... but will share a head in the trie.
since sequence is a higher level abstraction over collections... like a vector... and vectors use tries (trees)... which share nodes (heads) when a copy is made in memory (persistent)... I think this is what the docs are getting at?
I am reading articles all over the web explaining sequences as "abstractions for collections"
TBH I care very little about high-level API digests. :) I just look at the implementation most of the time. The sentence in OP talks specifically about lazy sequences. A lazy sequence is a specific thing with a specific implementation. That statement draws its content from that implementation. Vectors and abstract "collections" have nothing to do with this - they only complicate things.
Because a lazy sequence is a concrete thing?.. When I say "a signed 32 bit integer 2147483647 becomes -2147483648 when you add 1", you can't say "that's not how numbers work". A "signed 32 bit integer" is a very specific implementation, with its specific traits. Just how a lazy seq is a very specific implementation of the collections abstraction, with its own specific traits - one of which is that peculiarity about metadata.
ahhhh... so sequence is actually the implementation of the abstract concept of a collection... I was thinking just the opposite... makes more sense.
Yes. A lazy sequence is a concrete sequence. A sequence is a more concrete collection.
People often confuse what an "abstraction" is. In programming, an abstraction is any partially defined construct.
A function is an abstraction for example, because it partially defines a computation, where the function parameters are missing their values. You can't run it until you call it and pass it some arguments. You would say the function is an abstraction, but once provided arguments it is concrete and can now be executed.
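To make that idea concrete, here's a small sketch using plain clojure.core: partial fills in some parameters and returns a new function that is still "waiting" for the rest.

```clojure
;; `partial` fixes some arguments of a function, leaving a partially
;; defined computation; supplying the remaining argument makes it
;; concrete and runnable.
(def add5 (partial + 5))

(add5 3)   ;; => 8
```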
Basically, everything that is a template of some sort is an abstraction.
Another such abstraction is the Interface. An interface is an abstraction, because it defines a set of function definitions which are missing their implementation body.
In Clojure, sequence is such an abstraction, it is defined by the ISeq interface. Thus the Clojure sequence is an abstraction defined by the interface ISeq.
It defines four function definitions: first, next, more and cons, but their implementation body is undefined, so it's a partial definition for a type which supports those four operations.
In order to make it concrete (that is fully defined, no longer partial), you need to provide an implementation for their bodies. This is done by the seq function.
Each collection has its own implementation of ISeq, and seq will return the appropriate one for the given collection.
But this doesn't need to be an inheritance, so collections aren't necessarily sequences; there is not an inherent IS-A relationship. You could use composition as well, so seq could return a type which wraps a collection and implements first, next, more and cons in a way that uses the collection, but the collection itself wouldn't implement those functions.
This is kind of what LazySeq does. A LazySeq is a sequence, it implements first, next, more and cons. LazySeq is not an abstraction, it is a concrete type, not an Interface. (Side note, a class is an abstraction as well, since it's a template for creating concrete instances of Objects, so LazySeq is a class abstraction for concrete Objects, but it's not an Interface abstraction which is what we mean here).
What LazySeq does is that it wraps another sequence in a way that makes it lazy. So LazySeq is a concrete sequence implementation that takes any abstract sequence and can make them lazy.
So sequence is an abstraction. LazySeq is a concrete implementation of that abstraction. And each Clojure collection has a corresponding implementation of the sequence abstraction which is returned by seq.
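A quick REPL illustration of that last point (plain clojure.core, nothing assumed): seq hands back a different concrete ISeq implementation per collection, but callers only rely on the abstract operations.

```clojure
;; `seq` returns whatever concrete ISeq implementation suits the
;; collection; the caller only uses the abstract operations.
(class (seq [1 2 3]))   ;; a chunked seq over a vector
(class (seq {:a 1}))    ;; a seq over map entries
(class (seq "abc"))     ;; a seq over a string

;; Uniform access through the abstraction:
(first (seq [1 2 3]))   ;; => 1
(first (seq "abc"))     ;; => \a
```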
As for the original question. The paragraph is saying that two objects of equal value but different metadata are considered to be equal, but changing the metadata of an object will result in an equal copy of that object. Which is just to say that metadata on an object is immutable, so changing the metadata of an object actually creates a copy of it with different metadata. One consequence of this is that for LazySeq, where sequences are supposed to cache the realized value, the original LazySeq object and its copy with altered metadata are meant to have the same value. To do so, they need to share the same sequence, otherwise they'd be double realizing things. Which means that before creating the copy with altered metadata, the LazySeq has to create the sequence by realizing the head so it can pass it to the copy.
It's definitely a caveat of the implementation of LazySeq, but also, if Clojure allowed to mutate metadata this would not be an issue, so it's also a consequence of metadata being immutable.
An example to demonstrate:
(def ls (lazy-seq (println 1)))
(def ls2 (with-meta ls {:foo :bar}))
;;=> 1
(first ls)
;> nil
(first ls2)
;> nil
If with-meta didn't realize before creating ls2, then when you would call first on ls and ls2, both of them would print 1. But conceptually with-meta should return the same sequence, one where things don't get realized twice, because that's not what sequences are supposed to do. Unfortunately because of the way lazy-seq works, you can't guarantee that if you copy the LazySeq without having realized the head. So it needs to first realize the head as soon as with-meta is called so that both ls and ls2 behave the same afterwards.
An even better example:
(def ls (lazy-seq (println "realized head") [(rand-int 10)]))
(def ls2 (with-meta ls {:foo :bar}))
;;=> realized head
(first ls)
;> 3
(first ls2)
;> 3
If with-meta didn't realize the head, then when you called first on ls and ls2 you could get a different value back, but this would be wrong, because ls2 should be equal to ls and have the same value, differing only by the meta.
And in order to achieve this, lazy-seq has to realize the head prior to returning an equal copy with different meta.
I have yet another question about https://clojure.org/reference/metadata . Paragraph 3 says: metadata and its relationship to an object is immutable
Yet there is a function alter-meta! which is documented to: Modify or reset the metadata respectively for a namespace/var/ref/agent/atom.
So my question is: is the metadata mutable or immutable?
@jimka.issy alter-meta! is for stateful containers, like swap! is.
so should I understand that metadata on a sequence is immutable but metadata on stateful containers is mutable?
The metadata itself is an immutable hashmap, but irefs (the name for a stateful container apparently) allow that immutable value to atomically change to another.
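A small illustration of that split (plain clojure.core): with-meta returns a new collection value, while alter-meta! changes the metadata slot of a reference type in place.

```clojure
;; Metadata on a value: with-meta returns a NEW object.
(def v  [1 2 3])
(def v2 (with-meta v {:source "user"}))
(meta v)    ;; => nil -- `v` itself is untouched
(meta v2)   ;; => {:source "user"}

;; Metadata on an iref: alter-meta! changes it in place.
(def a (atom 0))
(alter-meta! a assoc :doc "a counter")
(meta a)    ;; => {:doc "a counter"}
```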
I think/hope I understand. What I have is a program implemented in Python, Scala, and Clojure. The clojure program seems to be much slower than the other two. I think this is due to memoization. In Scala and Python I have objects (of application specific classes) which memoize (per object) the return values of certain functions.
The clojure version doesn't use application defined classes, but rather just s-expressions (sequences, sometimes lazy). I'd like to similarly memoize, and I was hoping to put meta data on the sequences.
I don't know what data to memoize at object creation time, but only when certain multi-methods are called.
I considered, but rejected, the idea of simply using the memoize function because it memoizes forever. I'd like the information to be GC'd when the object dies.
so while Clojure can be faster... it can use a lot of memory... so I would monitor your memory (space complexity)
comparing Python to Clojure... you must consider at least a few factors.... yes, speed... but also memory space complexity, work load (concurrent or not)
Python has a GIL... it cannot offer the concurrency like Clojure can... and Clojure by default is for vertical scaling (large memory) and multiple CPUs
but you can also speed Python up with Cython or Jython in many cases (compatible foreign function interface)... I hope that helps.
Sorry, my answer was confusing as it's a subtlety. If you're trying to do memoization against object keys which will be GC'd at some point note that: • Hashmaps won't do the job, you'll need to use one of the weak kinds of reference things • You probably want a java-y thing which supports the weak stuff such as https://github.com/ben-manes/caffeine • https://www.baeldung.com/java-weakhashmap is OK if your values won't be held onto after the key object is GC'd. tl;dr, use caffeine 🙂
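For illustration, plain interop with java.util.WeakHashMap looks like this. Note the caveat above: keys are weak, but if a cached value strongly references its key, the entry can never be collected.

```clojure
(import 'java.util.WeakHashMap)

;; Keys are held weakly: once a key is otherwise unreachable, its
;; entry becomes eligible for removal by the GC. While we hold a
;; strong reference to `k`, the entry stays put.
(def cache (WeakHashMap.))

(def hit
  (let [k (Object.)]
    (.put cache k :expensive-result)
    (.get cache k)))
;; hit => :expensive-result
```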
I've had bad experiences using these weak-hash-maps in the past. once bitten twice shy.
@dominicm is there some reason you are warning me against using metadata on s-expressions to mimic mutable fields in the corresponding class instances from Python or Scala?
it has been a while since I used them, but I seem to recall that they tripled the memory usage of my previous application (at least a Scala wrapper around the java weak-values-hash-map) I suspect a reason for 3xing the memory size is to avoid what would otherwise be an additional quadratic complexity to the garbage collector.
There's also https://github.com/clojure/core.memoize which provides mechanisms for cleaning out the cached calls, if that's helpful
I mentioned memoize in my original post, including the justification why I can't use it.
for the most part, the problem with memoize is that it memoizes until the end of time. If you use this function in a recursive function of large complexity it can demand huge resources, and there's no API for the programmer to free the memory.
that's not the clojure.core/memoize fn, it's a library that implements memoization whilst solving the "memoizes forever" problem
It would be great if there were optional arguments to memoize which told the implementation to use a weak hash table, which would allow things to be un-memoized if memory fills up.
ahhhh!!!!
nice. I didn't know about that.
I've used it in the past to get past the forever part of memoize ... no idea if it'll be helpful in your situation, but I'm kinda with dominicm in that I'm not sure I get how your metadata strategy thing will work
one way my meta data idea would work is as follows.
Imagine that you have a Scala/Python class with a method named foo; for simplicity assume that foo has no arguments. If foo is expensive to compute, then I sometimes make a private field called _foo whose default value is None. When foo is called, it first checks _foo and returns its value if not None; otherwise it computes the value and sets _foo.
Therefore when the object is GC'ed, then the value in object._foo goes away with it.
I don't see a comment about the stored-forever problem in the readme of https://github.com/clojure/core.memoize
Did I overlook it?
There are a few strategies for clearing out the cache, for example: https://github.com/clojure/core.memoize/blob/ff20137720a36e0e1ded75ebcdd53d4d76fbe6eb/src/main/clojure/clojure/core/memoize.clj#L354 ... First in, first out
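As a pure-clojure.core sketch of what such a strategy does (illustration only, not the core.memoize implementation or API), a FIFO-bounded memoize might look like:

```clojure
;; Minimal FIFO-bounded memoization sketch. Real code should prefer
;; clojure.core.memoize, which provides this (and LRU, TTL, ...)
;; with proper concurrency handling.
(defn memo-fifo [f limit]
  (let [state (atom {:m {} :q clojure.lang.PersistentQueue/EMPTY})]
    (fn [& args]
      (if-let [e (find (:m @state) args)]
        (val e)                                        ; cache hit
        (let [v (apply f args)]                        ; cache miss
          (swap! state
                 (fn [{:keys [m q]}]
                   (let [m (assoc m args v)
                         q (conj q args)]
                     (if (> (count m) limit)
                       {:m (dissoc m (peek q)) :q (pop q)}  ; evict oldest
                       {:m m :q q}))))
          v)))))
```

The library variants additionally avoid subtleties this sketch ignores, such as racing threads computing the same entry twice.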
I'm not sure I understand the relationship between the python object and the lazy-seq/metadata. Are you saying that you want to cache the generation of each element in a seq by attaching metadata to it? If so, the seq will do that for you; once a seq has got its first element, all calls to first on that seq will be a cache lookup and it won't recalculate the first element ...
no that's not what I mean.
imagine in python I create an object as follows: x = CreateMyObject(...), and later call y = x.canonicalize(), and then call z = y.canonicalize(). It's a silly example, but pedagogical.
my code for canonicalize takes self and allocates a new object, and also sets self.saveCanonicalize to the new object, and also sets newObject.saveCanonicalize = newObject
the result is that even though I've never called canonicalize on y, when I do call it, it is already memoized. This is possible in Scala and Python because I have an object where I can put fields, and I can make sure the eq function ignores that field. But in clojure I'm using a sequence, perhaps lazy, perhaps not, to store the information. I was thinking about putting metadata on the sequence using the clojure metadata facility -- to me that seems like the logical place to put it.
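A hedged sketch of that metadata idea (the names make-expr and cached-call are hypothetical, not from any library): stash an atom in the object's metadata at creation time and cache per-object results in it, so the cache dies with the object.

```clojure
;; Hypothetical sketch of per-object memoization via metadata.
;; The cache atom lives in the object's metadata, so it becomes
;; garbage together with the object.
(defn make-expr [sexpr]
  (with-meta sexpr {::cache (atom {})}))

(defn cached-call
  "Look up `k` in obj's metadata cache, computing and storing on a miss."
  [obj k compute]
  (if-let [cache (::cache (meta obj))]
    (if-let [e (find @cache k)]
      (val e)
      (let [v (compute)]
        (swap! cache assoc k v)
        v))
    (compute)))
```

One caveat raised in the discussion above: with-meta returns a new object, so expressions have to be built through a factory like make-expr for the cache to be there at all.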
With regard to least-recently-used, is there a way to tell it to use a weak hash table, and clear automatically the way the weak hash would? i.e., don't bother unless getting short of memory, then remove unreferenced items. I didn't see that option.
core.memoize is built on core.cache which has a soft reference cache, so you could probably try something like
(require '[clojure.core.memoize :as m]
         '[clojure.core.cache :as c])
(let [f (m/memoizer identity (c/soft-cache-factory {}))]
  (f {:test 123}))
which should allow the cached memoized values to be gc'd
that certainly looks easy.
I'm still a bit confused by your python example. Does y.canonicalize() return x.canonicalize()?
yes it returns the same (eq) value which has already been memoized
since canonicalize is an idempotent function
no, x and y are sequences
expressions such as (and (or (and a b) (not b)) (or c (not (and a b))))
you can imagine what canonicalize does to such a sequence
so in python it's not just a sequence but an object of type SAnd or SOr or SNot etc.
yes, it applies a long list of recursive searches to see if it can reduce to a canonical form. the leaf elements are types, so it also checks for disjointness and subtype-ness
well, I guess I'd say I want to experiment and see whether a bit of memoization makes the clojure version roughly the same speed as the python and scala versions
the unit tests in python take about 30 seconds to run and the unit tests in clojure haven't finished since I started them this morning
I guess I'm confused by the metadata thing, because to add metadata to a seq, you're going to have to create a new object
(I once worked on a project where the tests took over 8 hours to run ... we just ran the tests every night and emailed everyone the results ... it wasn't the best 😉 )
about creating a new object, yes, I see that as well. that is indeed an issue which I'd have to figure out, such as always going through a factory function.
thus my question about whether metadata is immutable. ideally I'd like to attach meta data to an existing sequence. I guess there's no api for that.
my factory function could always create a metadata as an atom when it allocates a new sequence
I think that the python structure is using mutable objects to combine the data with the canonicalize function
if, as you suggest, the core.memoizer cooperates with the GC, that's probably even easier
yes my python and scala code are written in a very functional style, but do use mutation for this memoization process. just as clojure, under the hood, does the same thing.
BTW, before this project I didn't know much about python. I still don't know much about the destructive functions, but I do quite like the python object model and writing functional style python via map, flat_map, list-comprehensions etc.
One thing I really like about python as well as common-lisp which I miss in both clojure and scala is the ability to immediately return from a named block.
👍 ... just with clojure's data structures you don't really have the option to add that type of cacheing 😉 ...
def conversion8(self):
from genus.s_not import notp
# (or A (not B)) --> STop if B is subtype of A, zero = STop
# (and A (not B)) --> SEmpty if B is supertype of A, zero = SEmpty
for a in self.tds:
for n in self.tds:
if notp(n) and self.annihilator(a, n.s):
return self.zero()
return self
vs
(defn conversion-C8
"(or A (not B)) --> STop if B is subtype of A, zero = STop
(and A (not B)) --> SEmpty if B is supertype of A, zero = SEmpty"
[self]
(if (exists [a (operands self)]
(exists [n (operands self)]
(and (gns/not? n)
(= true (annihilator self a (operand n))))))
(zero self)
self))
I've written my own exists macro
I've been using it for 2 weeks, so I'm not an expert either
my python code looks like lisp code.
oops, I'm glad I posted that, because the python code has a bug. I need self.annihilator is True
like in clojure
yeah ... I worked on a 4 month project in python a year or two ago, and i've written the odd ansible script ... my python isn't very pythonic either 😉
what's rubber duck?
https://en.wikipedia.org/wiki/Rubber_duck_debugging ... I was just making a bad joke ... feel free to ignore me 😉
about returning early. I don't use it often, but I find that using it conservatively obviates lots of boilerplate.
think of it as a kinder-and-gentler exception
Clojure's reduce has the very useful reduced which is sort of the same thing.
I just wish it had named-reduced because reduced can get confused in reentrant code.
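For readers following along, here is reduced in action (plain clojure.core); it's the closest thing to that early-return flavor inside a reduce:

```clojure
;; `reduced` wraps a value to tell `reduce` to stop immediately,
;; even over an infinite input.
(def stopped
  (reduce (fn [acc x]
            (if (> acc 10)
              (reduced acc)   ; stop: we have enough
              (+ acc x)))
          0
          (range)))           ; infinite seq, safely short-circuited
;; stopped => 15
```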
OK, back to work. I'll try out your memoized suggestion. Thanks for that. strange that that important feature isn't mentioned in the readme.
OH by the way. when using
(require '[clojure.core.memoize :as m]
         '[clojure.core.cache :as c])
(let [f (m/memoizer identity (c/soft-cache-factory {}))]
  (f {:test 123}))
how can it put something into the cache without calling f?
otherwise, you can keep a reference to the cache, which implements this protocol:
(defprotocol CacheProtocol
"This is the protocol describing the basic cache capability."
(lookup [cache e]
[cache e not-found]
"Retrieve the value associated with `e` if it exists, else `nil` in
the 2-arg case. Retrieve the value associated with `e` if it exists,
else `not-found` in the 3-arg case.")
(has? [cache e]
"Checks if the cache contains a value associated with `e`")
(hit [cache e]
"Is meant to be called if the cache is determined to contain a value
associated with `e`")
(miss [cache e ret]
"Is meant to be called if the cache is determined to **not** contain a
value associated with `e`")
(evict [cache e]
"Removes an entry from the cache")
(seed [cache base]
"Is used to signal that the cache should be created with a seed.
The contract is that said cache should return an instance of its
own type."))
why? because I want to always put the return value of canonicalize into the cache, associating the value with itself. So that (canonicalize (canonicalize something)) only does work at most once.
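A sketch of that seeding idea, with a plain map in an atom standing in for the core.cache cache, and canonicalize* as a hypothetical stand-in for the real simplifier:

```clojure
;; After computing a canonical form, seed BOTH entries:
;; input -> result and result -> result, so canonicalizing the
;; result again is an immediate cache hit.
(def canon-cache (atom {}))

(defn canonicalize* [x]
  ;; stand-in for the real (expensive) simplification rules
  x)

(defn canonicalize [x]
  (if-let [e (find @canon-cache x)]
    (val e)
    (let [r (canonicalize* x)]
      (swap! canon-cache assoc x r r r)  ; two key/value pairs seeded
      r)))
```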
for the special case of canonicalize, yes. of course I have other memoized functions which are not idempotent.
does this mean I need to call hit from within canonicalize?
or maybe miss; that's the only one with the necessary 3 arguments: cache, input-value, output-value.
off the top of my head, I don't remember ... probably ... there's a good blog post about core.cache (by dpsutton I think) cos it's a bit tricksy
https://dev.to/dpsutton/exploring-the-core-cache-api-57al ... that I think??
I don't know anything about protocols. How can I, the application programmer, call one of the protocol functions? Aren't those functions only used internally, and are not part of the API?
anyway, enforcing idempotency is a problem I can attack later. it's not 100% necessary for the first pass solution.
np ... get some real work done 😉 ... protocols just create functions in the namespace they're defined in, so you can call c/miss or whatever ... but they're usually considered an internal thing in a lib, so you'll often find them wrapped by a normal clojure fn
With gc-friendly-memoization, my tests finish in a few seconds rather than hours
hi l0st3d, I'm having what looks like a problem in core.memoize. I'm investigating. It looks really similar to a problem I had when using the java weak hash tables from Scala a year or so ago. The problem is that when the underlying java function removes the value from the hash table, it fails to immediately remove the key. I'm not sure if this is a bug or a feature. but the consequence is that accessing the key returns the java null as a value.
does this sound in any way familiar? perhaps this is a bug in my program and only coincidentally looking familiar to me.
the consequence for clojure is that when a function is memoized and you call it with some arguments, it SHOULD thereafter return the same thing given the same arguments. However, if the corresponding value gets gc-ed then calling with the same arguments will result in the hash value being java-null and the clojure function consequently returns nil rather than re-computing the function in question the hard way
that doesn't sound immediately familiar, but I've generally not relied on gc to control eviction from these sorts of caches ... I can see how that might happen with a weak reference
it does look like the memoize lib puts a derefable thing in the cache though, so it should know the difference between your function returning nil and the gc process having cleared out the soft reference ...
@jimka.issy can you write a minimal test case of the soft-cache that proves it's a problem?
(require '[clojure.core.memoize :as m]
'[clojure.core.cache :as c])
(import '(java.lang.ref ReferenceQueue SoftReference)
'(java.util.concurrent ConcurrentHashMap)
'(clojure.core.cache SoftCache))
(let [c (SoftCache. (doto (ConcurrentHashMap.)
                      (.put (list 1) (SoftReference. nil)))
                    (ConcurrentHashMap.)
                    (ReferenceQueue.))
      f (m/memoizer #(do (prn '>> %) %) c)]
  [(f 1) (f 1)]
  (mapv #(vector (key %) '-> (.get (val %))) (seq c)))
I think that ^ forces the cache into the state where it contains the equivalent of a gc'd reference and it seems to run the function only once and return the correct val ... right?
hmmm ... no ignore me... I think I've misread the code ... but I need to go now ... can try and have another look in a bit
I debugged it. I think it was a problem in my own code. it just looked curiously like a problem I had in Scala some time ago.
was the problem that you were using scala? 😜 .... but seriously, glad you got it sorted ... I think it looks like the soft cache does the right thing
@jimka.issy I don't fully understand your suggestion with metadata and sexprs for solving this problem tbh 🙂. The tooling around cleaning stuff up on GC is pretty complicated, so it's better to lean into something on the JVM than try to roll your own as a v1 imo. But your use-case might have easier reference tracking so you can easily know when to clear the cache keys out.
one trick I do in the Python and Scala versions is the following. I have a function called canonicalize which I call on an arbitrarily large expression tree. The function applies a large suite of simplification rules, and returns a canonicalized expression tree. So in this case I need to memoize TWO things, not ONE. I need to mark the original tree as having a certain canonicalization, but I also need to mark the new tree as having itself as canonicalization. I.e., I want to avoid someone trying to canonicalize the new object, having that compute a long time and return an isomorphic object.
I don't know how to do this using the memoize function. I don't know yet whether the core.memoize replacement allows the programmer to manipulate the cache through the API.
my idea was just to put the metadata on the objects they concern. then when the objects are cleaned up, the metadata will be cleaned up as well.
Are there any examples of clojurescript working with aero and clip
Storing the system config in an edn file that gets loaded and passed to clip. Clojurescript targeting node
@mail024 There're some examples at the bottom of the clip readme of aero/clip, none targeting node afaik but maybe you can adjust them easily enough?
@mail024 Just under that 🙂 https://github.com/juxt/clip#example-application
Have hit a case where clip doesn't seem to resolve if the form passed in uses a symbol to refer to a function. Aero then doesn't resolve this to the actual function, so clip then doesn't seamlessly start a system if the config comes from aero in cljs. Will investigate later and pick this up in an issue on clip.
@mail024 Thanks, I'll look out for your issue later. symbols definitely resolve in all the tests, so it must be something specific to your context.
you can use clojure -P to download all the deps it needs. This would fetch all jars that are specified
@U11BV7MTK I think you've missed that @U0AHTPQBG is after the "sources" dependency. As you can download a pom or javadoc for a dependency. Afaik @U0AHTPQBG there's no solution to this atm.
Every published artifact also has a source jar published which has the source code bundled, which helps with debugging
deps.edn uses $ to mark it out, e.g. foo.bar/baz$pom would be for the "pom" <thing> of foo.bar/baz
Maven lets you say "hey, get me the sources for my deps", so you can get the java code and jump to source
Feel free to keep track of https://github.com/clojure-emacs/enrich-classpath/issues/2 (I plan to implement it this weekend) it will be a solution used by cider and anything else that wants to
I'm trying to refine the way I write clojure code to be closer to the way the community and experienced clojure devs would write their code. Can someone take a look at this function I wrote and refactor it into a form more typical of clojure code or suggest some improvements:
(defn get-numbers-from-words
"Creates a vector of number values from number strings in the argument
string."
[s]
(let [;; Creating a map of numeric words to numbers.
number-map
{"one" 1
"two" 2
"three" 3
"four" 4
"five" 5
"six" 6
"seven" 7
"eight" 8
"nine" 9}
;; Lowercasing the entire string.
lowercase-str
(.toLowerCase s)
;; Splitting the string into separate words.
words
(.split lowercase-str " ")
;; Replacing all of the numeric words with numbers, and filtering out
;; all of the unique numbers into a vector.
number-values
(->> words
(map #(get number-map %))
(filter (comp not nil?))
(set)
(vec))]
number-values))
(def numbers-by-word
{"one" 1
"two" 2
"three" 3
"four" 4
"five" 5
"six" 6
"seven" 7
"eight" 8
"nine" 9})
(require '[clojure.string :as str])
(sequence
(comp
(map str/lower-case)
(map numbers-by-word)
(remove nil?)
(distinct))
(str/split "THree point one four one five nine" #"\s"))
would this be a more clojure-like way of writing this code, or is the above code better:
(def number-map
"A map of word numbers to numeric numbers."
{"one" 1
"two" 2
"three" 3
"four" 4
"five" 5
"six" 6
"seven" 7
"eight" 8
"nine" 9})
(defn get-numbers-from-words
"Creates a vector of number values from number strings in the argument
string."
[s]
(as-> s $
(.toLowerCase $)
(.split $ " ")
(map #(get number-map %) $)
(filter some? $)
(set $)
(vec $)
(sort $)))
I like transducers, but I don’t always use them. For simple things like this, I still use threading macros
(ns clj8394.example
(:require [clojure.string :as string]))
(def numbers ["zero" "one" "two" "three" "four" "five" "six" "seven" "eight" "nine"])
(def number-map (zipmap numbers (range)))
(defn get-numbers-from-words
"Creates a vector of number values from number strings in the argument string."
[s]
(let [lowercase-str (string/lower-case s)]
(->> (string/split lowercase-str #" ")
(keep number-map)
distinct
vec)))
@jetrepilto maps are functions, so it's unnecessary to wrap number-map in #(get number-map %)
for static structures (like your map of words to numbers) I typically put them into a def. I know that not everyone would
If I need it hidden/inaccessible, then the function definition can be in a let block. Though it makes me feel icky to say:
(let [number-map ....]
(defn get-numbers-from-words [s] ...))
So then I would usually use:
(def get-numbers-from-words
(let [number-map ....]
(fn [s] ...)))
But for this sort of thing? I define the map with def
I usually put these static structures in let to keep them within the same scope as the function, but it looks like typically in clojure these are at the top level of the files. Is there a reason for this? more performant?
map, but yes you’re right. But it looks like it’s run every time. And it’s a bad habit that can occasionally lead to things that DO get run each time (ask me how I know)
Besides, I have found that things that never change will often be useful in more than one context. Not always, but :woman-shrugging:
One of the reasons we can move them out to a top level def is that the map is immutable
break your function into three parts: a function that turns a single word into a number (a literal map will do nicely as a function) a function that splits a string into words (returned as a seq or vector) then something that uses the other two functions
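One possible sketch of that three-part split (the names here are my own choices, not from the thread):

```clojure
(require '[clojure.string :as str])

;; 1. a single word -> number (the literal map IS the function)
(def word->number
  {"one" 1 "two" 2 "three" 3 "four" 4 "five" 5
   "six" 6 "seven" 7 "eight" 8 "nine" 9})

;; 2. a string -> seq of lowercase words
(defn words [s]
  (str/split (str/lower-case s) #"\s+"))

;; 3. glue the two together (distinct keeps the original's
;;    unique-numbers behavior)
(defn numbers-from-words [s]
  (into [] (comp (keep word->number) (distinct)) (words s)))
```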