This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-04-17
Channels
- # bangalore-clj (2)
- # beginners (202)
- # boot (18)
- # cljs-dev (8)
- # cljsjs (7)
- # cljsrn (4)
- # clojars (2)
- # clojure (401)
- # clojure-boston (2)
- # clojure-dusseldorf (1)
- # clojure-gamedev (36)
- # clojure-greece (2)
- # clojure-italy (1)
- # clojure-russia (16)
- # clojure-spec (27)
- # clojure-uk (7)
- # clojurescript (68)
- # core-async (16)
- # cursive (25)
- # datascript (1)
- # datomic (34)
- # funcool (1)
- # hoplon (1)
- # interop (1)
- # klipse (1)
- # leiningen (2)
- # lumo (75)
- # off-topic (17)
- # om-next (2)
- # onyx (66)
- # re-frame (18)
- # reagent (2)
- # ring-swagger (11)
- # spacemacs (1)
- # specter (1)
- # timbre (3)
- # untangled (48)
- # yada (7)
Hello, when I type the following in the REPL, it just hangs. Why? And what is the REPL expecting at that time? Thank you.
Try something like (take 5 (repeat "abc")) See the docs: https://clojuredocs.org/clojure.core/repeat
Or (set! *print-length* 5) to tell your repl not to print everything
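A minimal sketch of both suggestions, runnable at a standard Clojure REPL:

```clojure
;; Realize only a finite prefix of the infinite lazy seq:
(take 5 (repeat "abc"))
;; => ("abc" "abc" "abc" "abc" "abc")

;; Or cap how many items the printer will realize; note the
;; earmuffs -- the dynamic var is *print-length*, not print-length:
(set! *print-length* 5)
(repeat "abc")
;; => ("abc" "abc" "abc" "abc" "abc" ...)
```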
but in the earlier case, is the following a correct understanding? "The repl called the repeat function, which returned a lazy seq. The repl is extracting items out of the sequence, which never ends, so the repl has not had a chance to return with the output."
prasadcljs: maybe a wizard can correct me here, but i would think the repl does not start extracting values from (repeat "abc"), because it has no reason to. so if it hangs, it's not because it's too busy. but i don't know why it hangs rather than returning the lazy seq.
I think that the repl does start extracting values from (repeat "foo")
, because it wants to print them.
For the details you'd want to look at the implementation, but conceptually that's what is happening.
@wistb fun fact, if you use clojure.jar instead of wrapping it with nrepl (as e.g. lein and boot do) it will fill your screen with repeated prints instead of hanging. But with nrepl it will sit and hang and look as if it were doing nothing.
it's because nrepl is a network abstraction, and instead of being stuck on all the printing (which consumes the items) it gets stuck on the part where it generates all the values to be sent to the client side to be printed
@tagore Alright ok, ya, I've seen concatenation slow things down quite a bit, in log statements that happens frequently. And yes, using strings as generic data containers is a real thing. I've heard people recommend hashing IDs for example, so that programmers are not tempted to parse out things from them.
@didibus Yeah, I'm not talking about very subtle stuff here... I'm almost embarrassed to bring it up, but when you've seen a few important production systems just crawl because of things like this... well, I'm inclined to think that they actually represent the most common form of performance fault.
I think a few too many programmers of my generation took the idea that "programmer time is more expensive than machine time" a little too seriously- that's all well and good until you do something dumb, for no very good reason, in a tight loop. It turns out that making a habit of that sort of thing can be very expensive indeed.
The first would be "don't do obviously dumb stuff." That level of optimization ought to be generally expected.
Interestingly, systems at the top and the bottom of that hierarchy often have something in common- they lack a single bottleneck.
Ya, I think a competent and experienced programmer should be able to write smartly optimal code from the get go. Choosing the right data structures, the good enough algorithms, and building on top of the correct frameworks.
I wrote an app for a client a year or so ago where I had to write custom search and...
Premature optimization to me is a form of OCD: programmers spend too much time on it for no business value. That said, choosing a data structure for your main storage is critical; pick the wrong one, and if it becomes a business blocker later, changing it will require a massive refactoring
I knew I'd never have more than a few thousand items to search, and it made the code very simple, and obviously bug-free.
Ya, I think that's fine. Also, that search could probably be easily swapped out for a faster one if the need came up.
I recently ran into someone who was concerned about the performance of volatiles in their code
I've written more than one binary search, and they turn out to be surprisingly hard to get right.
I feel like that's the difference between writing performant code from the get-go and premature optimization
Yeah- well that's the second level of optimization I'm talking about... do smart things, like choose the right algorithm.
I've interviewed candidates who pick an array over a hash-map, when the question will be doing tons of key lookups. You're literally changing one statement from new Array to new HashMap, and using the hash-map will actually make the lookup a simpler implementation, as well as faster. So something like that too I would say is just knowing good performance tricks upfront, and not premature optimization
@tagore I'd say that case was somewhere between the first and second level, they were somewhat junior :)
@didibus Yeah- though I think it all does depend on the context of the system as a whole.
Like the linear search I mentioned- it was appropriate for that system, but would have been wholly inappropriate for others.
But that seems like it was almost you consciously choosing to prematurely de-optimize as a trade off for implementation of the algorithm itself.
if I have to hear another variant of "only the DOM is slow, javascript itself is blazing fast!" ...
@didibus You could put it that way- I think I was choosing to write a one-line function that I knew was fast enough for the use-case, and that was very unlikely to wind up being incorrect. Linear search is simple, and where it's adequate that simplicity has a value of its own.
Right, that's really what don't do premature optimization is trying to say. The advice should really just be have good judgment, and know why you are doing things the way you are, understanding the trade offs
Things that are dumb in one context are smart in another. Knowing the difference is important though.
A lot of people optimize not knowing why, and how, but just because they're kind of obsessed or are having fun doing it
Yeah, well also- you don't know what your bottlenecks are until you measure them, and it makes no sense to measure them until your system is close to complete.
so you can hypothesize where the bottlenecks might be (along with proper profiling of course)
I spent a few years working on some algorithms that solve a number of long-standing problems in the deformation of computer-animated characters. They're actually fairly revolutionary, in their way.
My first prototype was in Python, and likely between 10,000 and 100,000 times as slow as my current implementation.
I do not think I would have been able to devise these algorithms had I been at all concerned with performance when devising them.
To be useful they had to be fast, and I had to have faith that they could be made fast enough. That turned out to be the case, but just barely.
The thing is, the first order of magnitude speedup was easy: I had written the Python prototype in a purely functional style, and it turns out that Python is dog-slow when you create a lot of garbage.
For instance, I was unable to find an algorithm in the literature for finding the closest point to a given point on many Bézier curves.
But that was very much my bottleneck at one point. So I came up with my own little space-partitioning algorithm for that, that while unlikely to be optimal, was good enough to move the bottleneck elsewhere.
And there were lots of cases like that- cases where I had to experiment with algorithms, but didn't have to think very hard about the machine.
But then.. I eventually got to a point where all the algorithms seemed pretty solid, and my profiling told me that 1) I was calculating too many square roots and 2) I was missing cache too much.
To get the final, say, 3x speedup that I needed took something like 5x as much work as all the previous optimizations.
That's a very good reason to not prematurely optimize- micro-optimizations are very expensive for what they buy you, so you had better be sure you need them before you spend that much on them.
Also- be very careful about trying to beat the machine when it comes to square roots...
Anyway, sorry if that's a rather long diatribe... it's something of a distillation of things I learned trying to optimize something to the bone.
Maybe OO thought residues are failing me: I want a (generator) function that returns a function. The returned function should have access to a mutable cache of an index from a database, the index name being provided by the generating function. In an object they'd be encapsulated together as a class: the function and the mutable index cache. In a functional thinking paradigm, I can only hang the index cache off a Var in the namespace in which my generated functions live, hoping that I am not cluttering the namespace with too much (so much for sterile clean encapsulation), making that Var private. For complete isolation I'd need the namespace to include only my generated functions and each generated function's Var, and even then, different generated functions could access one another's accompanying Var. So for full isolation I'd need to create a namespace per generated function, to host only it and its generated private var. Would that be the most idiomatic / correct? How should I think better about this?
@matan I am not sure if I understand your request correctly, still. 1. A mutable cache sounds like a typical use case for an atom, which can be passed into functions as a parameter, for instance (for a more DI style, look at component). 2. Encapsulation and isolation are things you don't think about that much in Clojure, at least not in the OO sense with respect to state. Clojure's data structures are immutable, so there is seldom a need for Java-style encapsulation. 3. Creating a namespace per generated function to host it and its private var sounds wrong to me and is not idiomatic, I would say. For starters, put an atom into a namespace and add the functions you want to use the atom with into the same namespace. Have the functions take the atom as a parameter, plus whatever else you need.
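A minimal sketch of point 1 (the names make-cache and cache-fetch! are hypothetical, chosen just for this example); the atom is simply passed in as a parameter:

```clojure
(defn make-cache
  "Create a fresh, empty mutable cache."
  []
  (atom {}))

(defn cache-fetch!
  "Look up k in the cache atom; on a miss, compute the value
  with f, store it in the atom, and return it."
  [cache k f]
  (if-let [v (get @cache k)]
    v
    (let [v (f k)]
      (swap! cache assoc k v)
      v)))
```

Each caller can create its own cache with make-cache, so nothing is shared unless you choose to share it.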
@sveri you precisely and exactly understand my scenario and question. Your response about isolation and encapsulation also touches on interesting points.
@sveri in one clean design, my generator would generate an atom per function it generates; this atom, in a clean design, needs to live somewhere only accessible to the generated function (?!). Otherwise I could use one Atom for all generated functions.
Encapsulation would prevent generated functions from using the wrong atom; not a concurrency concern but a code-correctness concern. I am still wondering what would be idiomatic here. A generated function interning its atom into its own exclusive namespace would have solved it, but I gather from your comment that this is counter-idiomatic for Clojure.
matan: I'm guessing you generate functions that do something, but you expect those functions to cache into a local atom? So that on subsequent call to them they use the atom as a cache to be faster?
Something else to consider is not to use a generator. Apart from the caching, are the generated functions identical?
What you do after is instead of keeping track of a bunch of generated functions, you keep track of a bunch of atoms. This would work better if you were wanting to model say a class where many functions in the class share the same atom.
In fact, that's what you do in FP: you keep track of the data by itself, instead of keeping track of the objects. Then method calls are the same, except you reverse the order of the call, (method data) instead of object.method.
If you want to make it easier for people to understand what functions are meant for which data, you can define protocols and group your data as a record
But most of the time you can use the pattern of having a (defn make-data [...]) fn in a namespace, and all of the functions operating on this data in the same namespace. Where make-data returns whatever is best for you, map, seq, list, record, etc.
This creates a sort of open class: you can have many instances by calling make-data (your atom cache would be inside the data structure it returns), then fns for it are easy to find in its namespace, but other namespaces and existing functions are free to extend your class and support your data if they want.
All fns are kept pure also in that way, since they take input and return output, making them easier to test.
I'd say, if the data in your case is only needed by one function, using a closure is fine, but once you start expanding this and require more than one fn on the same data, the pattern I described is best. It is tempting to use closures as a poor man's object, but don't do it
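A sketch of that make-data pattern for the index-cache scenario; here the fetch function is passed in as a parameter because the real database lookup is not shown in the conversation:

```clojure
(defn make-index
  "Constructor: returns plain data carrying its own cache atom.
  Every call produces an independent instance."
  [index-name]
  {:index-name index-name
   :cache      (atom {})})

(defn lookup
  "Operates on the data returned by make-index; fetches on a
  cache miss and memoizes the result in the instance's atom."
  [fetch-fn {:keys [index-name cache]} k]
  (if-let [v (get @cache k)]
    v
    (let [v (fetch-fn index-name k)]
      (swap! cache assoc k v)
      v)))
```

All the fns stay in one namespace next to make-index, each instance gets its own atom, and other namespaces are free to write additional fns over the same plain data.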
@didibus great comments, I will go through your comments again to better grasp the compositionality trait that you point at.
> This creates a sort of open class: you can have many instances by calling make-data (your atom cache would be inside the data structure it returns), then fns for it are easy to find in its namespace, but other namespaces and existing functions are free to extend your class and support your data if they want.
@didibus thanks for the tip about core.cache, I was not aware of it and it might be great to use it for other scenarios, especially as it's a Clojure contrib library.
Had I used a def for the atom inside the generated function, would that in fact bind it to be visible only to the generated function it was generated in? I guess not; defs apply to the entire namespace. A let would evaluate on every call to the function, so it too won't help. I don't believe ^:dynamic would fit the use case; it would only rebind a new atom on every call to an already generated function!
@timok Beware that map is lazy, so if you don't use the returned values from the methods, they might not get called (in the repl they will, because you use the values by printing them). If you want to do only side effects, use (run! #(.instanceMeth %) instances). mapv is also not lazy.
Do we have in def or other idioms something with the semantics of "define if not already defined, and otherwise get a reference to the already defined var"?
I found https://clojure.org/reference/vars utterly bewildering in this last regard
@matan Don't really know what you are trying to accomplish, but if you want only a generated function to have access to the atom, you could use a closure:
(defn id-fn []
(let [count (atom -1)]
(fn []
(swap! count inc))))
(def new-id! (id-fn))
(new-id!)
well this defeats the purpose, as the generated function needs to be called infinitely many times, using the same atom every time
do we have something like def that only returns a reference to the already defined atom, if one with the given name already exists?
Of course I could write one myself... but something tells me we already have something for that
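For what it's worth, defonce has exactly those semantics: it defs the var only if it doesn't already have a root binding, so re-evaluating the form leaves the existing atom in place:

```clojure
(defonce counter (atom 0))

;; Re-evaluating is a no-op; counter still refers to the
;; original atom, not a fresh (atom 100):
(defonce counter (atom 100))

@counter
;; => 0
```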
But from what I understand about your usecase, you should just declare the atom in the namespace, maybe make it private.
yes, but my use case implies every generated function gets its own atom to chew on, so that will not be enough
In my earlier example, calling (id-fn) will make a new atom, but calling new-id! will keep using the same atom...
how is that? wouldn't the let create a new atom on every use of the generated (new-id!)?
as for isolation such that no generated function could ever "know" about an atom of a sibling generated function, I gave up on that; the namespace will be like a small cute junkyard of atoms
@matan I still don't understand the need for the "generated" functions. For starters I would advise you to take a step back and do what I said: create one atom for the mutable state and then create several functions that take an atom as a function parameter and use those. Then, if you have that working, you can go on and experiment with generated functions. Maybe you can also show some code then.
The call to (id-fn) will "close over" the atom, and the returned function will keep a reference to that same atom. So every time I call id-fn, a new atom is created, but the let only runs once, at the time id-fn is called, not each time the returned function is invoked.
@sveri sharing an atom would degrade performance; this atom would be hit all the time, whereas there is no real need to synchronize access to it among different instances of generated functions. I hope this answers your question about why this is not my first choice.
@madstap Yep, but that's not an excuse for not having a correct design in mind. I try to learn, not only quick-and-dirty through stuff.
Yeah, try it out in the repl, closures are pretty fundamental. Try invoking (id-fn) twice: (def new-foo-id! (id-fn)) and (def new-bar-id! (id-fn)).
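Spelled out, each call to id-fn allocates its own atom, and the fn it returns closes over that atom alone:

```clojure
(defn id-fn []
  (let [count (atom -1)]     ; one fresh atom per call to id-fn
    (fn []
      (swap! count inc))))   ; the returned fn closes over it

(def new-foo-id! (id-fn))
(def new-bar-id! (id-fn))

(new-foo-id!) ;; => 0
(new-foo-id!) ;; => 1
(new-bar-id!) ;; => 0   -- its own atom, unaffected by new-foo-id!
```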
@madstap Yes I had it in the back of my mind that a closure should solve this, they usually solve scoping issues... thanks for reminding me of that!!
hope closure-bound variables are very safe in Clojure in all normal circumstances. I can vaguely recall nasty closure issues in some Akka or concurrency scenarios, in the world of Scala
@matan that's when you try to do silly stuff like ship a closure over a network connection. Close over 2GB of data and suddenly you have to transfer that with your closure.
Thankfully though Clojure recommends shipping data, not code, so works out fine.
@tbaldridge thanks for the extra perspective! that does ring a bell, actually
@tbaldridge what did you mean, however, by > Thankfully though Clojure recommends shipping data, not code, so works out fine. Data might be larger than code, I mean
Yes, however there's a few things that are different in Clojure. Firstly Clojure doesn't execute code lists (sexprs) directly. It compiles them to Java bytecodes and then executes classes. So you now have a problem of how to transfer java bytecode over a network connection.
@tbaldridge how is transferring bytecode worse than anything else actually? it is uniform on all platforms at least...
Secondly, there's the problem of code context. What does (foo 4 33) mean? Well, it depends on what foo is defined as in the current namespace. So that means you need to transfer foo as well, along with the name of the current namespace. But what if foo calls bar? Well, now you have to transfer that as well....
Yes, this concern of shipping code over a network is well known to me, and could be a pitfall working with closures, indeed.
Compare the complexity of all that with simply transfering {:name "bill" :client/type :online}
Yeah well, of course. Clojure is not a distributed execution thingy. Last time someone tried doing something like that, it didn't turn out so nicely in most cases
Right, and one could argue that any language that tries to ship code is going to be full of pitfalls.
Unless designed to do exactly that from the start; there's an interesting idea for a new programming paradigm for ya. It would be very mathy to design a language for that
I was recently talking to an Erlang programmer who said that in his first job working with the language, the team told him "yeah... that whole distributed-hot-code-reloading thing? No one actually does that in production."
@tbaldridge Laughs aside, I hope the same won't be said about redefining functions at runtime in Clojure. How well designed do you find it to be for concurrent scenarios, in the case of Clojure? It is by the way somewhat saddening that Clojure has little to offer for remote atoms and such (or does it?). One machine isn't the world..
@matan no I think here's fine. Clojure has always had a pragmatic view of concurrency and distributed programming. To that end I think Clojure does quite well. You use atoms, vars, and channels, locally where the semantics of the system match these primitives. And across machines you use RPC calls and Queues, because the distributed nature of the system demands it.
In short...why would I want distributed atoms, when I have Kafka? Or put another way: a local machine supports locks, sub nanosecond response-times, and the relative assurance that if one core is working, they're all working. In a distributed system, machines can come and go, networks can be slow, and locks are impractical (or impossible). In that case you need a different model. But there's no reason to say "everything must be a queue" any more than "I need STM across machines". Use the model that fits the system.
@matan Of course I don't know about your performance requirements, but I remember having had an atom that was updated every 10 ms and contained several tens of thousands of maps, without having noticed any performance issues.
@sveri I should later make some measurements. My scenario is similar to a logging scenario, and buffering can alleviate much of the performance concern. What did you use for profiling to find a 10 ms pattern in your code?
I was connected to a service via websockets that sent updates at that interval IIRC, and I put the updates into one atom. Of course, I also read from that.
In general the JVM is very fast, so when I write code where I don't have specific performance requirements I just write ahead and take care of the readability of my code. Later on I can still profile, if there are problems.
*my initial concerns were more about isolation than performance, just to be defensive for a moment. Not very different though
in that vein this pattern is very common:
(let [a (atom 0)]
(defn next-id []
(swap! a inc)))
it's fast, thread safe, and just works
@tbaldridge you are right, each distributed application needs a model that suits its type of requirements and trade-off its designers wish to make between the conflicting concerns of distributed computing (resilience, performance, dynamic topologies, devops & elasticity and api simplicity, to name the key ones I can think of).
A programming language may only be a facilitator for any implementation of that sort, whereas the asynchronous idioms it has or lacks will affect how well it can accommodate a certain distributed design. I hope core.async gives a suitable foundation for future developments in that area
@timok I sympathize with this quote from https://github.com/ptaoussanis/timbre > Java logging is a Kafkaesque mess of complexity that buys you nothing. It can be comically hard to get even the simplest logging working, and it just gets worse at scale. But I have not used it yet. One annoying fact is (I think) you'd still need a log4j outlet for the transitive set of libraries you use, to dump into.
Not sure if this falls under #beginners, but I was trying to do some testing with stuartsierra/component to make sure I understood what I was doing, and came across some odd behavior that I'm not sure of.
I effectively have the above component. The "test" and "?" strings print to console but the counter from 1-10 do not? Am I missing something obvious?
@mbcev for creates a lazy sequence that will only be realised when consumed, which it isn't in your example (https://clojuredocs.org/clojure.core/for). Try doseq instead (you can skip the explicit do in that case)
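A sketch of the difference, using println in place of the component's side effects (the 1-10 counter from the example above):

```clojure
;; for builds a lazy seq; when the result is discarded, the body
;; never runs, so nothing prints:
(defn start-lazy []
  (for [i (range 1 11)]
    (println i))             ; lazy, never realized
  :started)

;; doseq is eager, returns nil, and exists for side effects:
(defn start-eager []
  (doseq [i (range 1 11)]
    (println i))             ; prints 1..10
  :started)
```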
> prasadcljs: maybe a wizard can correct me here, but i would think the repl does not start extracting values from (repeat "abc"), because it has no reason to. so if it hangs, it's not because it's too busy. but i don't know why it hangs rather than returning the lazy seq.
@mobileink it does because printing consumes the values
it hangs because it is being realized server side for nrepl, if you use vanilla clojure without nrepl, it will just spew endlessly to the screen
@mobileink yes, but first it needs the data, and nrepl fucks with that in this case
using clojure.jar it fills the screen with printouts, with nrepl (that is, what leiningen, boot, etc. all use by default) it stalls server side waiting to generate the data that the client needs
(arguably that's bad and they should stream partial results; I don't think that fits nicely with their protocol model)
unless you asked the app to print the value