This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-08-06
Typing this after a loooong-overdue OS update to the little thinkbox. I went up this path: Drive backup -> Ubuntu 18 LTS -> 20 LTS -> 22 LTS via the OS's own software upgrade tool, and things are ... "just works" working. In fact, stuff that wasn't working is working now (e.g. bluetooth devices that the box wouldn't connect to). And I have a renewed appreciation for software maintainers. 🙏
Haskell monads are "computation", or so this post that's hot on HN suggests: https://www.micahcantor.com/blog/monad-confusion/ I'm curious what people think. I have been using this way of viewing the relationship between the languages for a while, as I tried to bridge the gap between more complex type systems (in Haskell) and what Clojure was giving me. I never had the confidence in Haskell to make the argument, though. Usually I was doing this in reaction to people telling me they couldn't imagine coding without types. I think we (the software community) need to abandon, or at least rethink, the terms dynamic and static typing. They really don't serve our purpose, which is delivering programs that behave correctly. Clojure suffers in some ways because I haven't been able to adequately explain how it compares to, and improves on, other things I have tried.
The author argues that Haskell helps you avoid runtime nil pointer exceptions because the equivalent Nothing return values/types have to be properly wrangled. They graciously admit they're unclear on what this trade-off means:
> That's not to say Haskell's approach is strictly better however, since the overhead of managing the types of complicated monadic actions may not be worth the runtime safety it provides.
And neither am I, really. I think that's sort of the problem. When there is ambiguity, people will err on the side of the rhetoric that seems less risky. You see this in games of chance all the time, where people make irrationally risky bets over a distribution of events that seems safe to them in the short term. Typed languages use words like "safety" as if you would get hurt more by a program that throws a nil pointer exception and messes up your bank account than by some other type of exception or behavior that also messes up your bank account.
P.S. I don't think Clojure's `when` or `if` function being a macro matters to this argument at all. That's just a performance improvement, right?
How is this not a Java macro? The only difference is that you can't add it to the compiler yourself.
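// Enhanced for loop syntax: for (type variableName : arrayName) { ... }
String[] cars = {"Volvo", "BMW", "Ford"};
for (String car : cars) {
  System.out.println(car); // code block to be executed
}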
They say that Clojure's macros are incomprehensible, while instead you're given these magic forms that you memorize. And then you're told that this is surely not a macro because you can't write a macro in Java yourself.
And if the programmer is not aware that this is a macro, then "Clojure/LISP has macros but Java doesn't". But once they see that they use macros in both cases, the argument about macros stops being useful.
I don't really follow the question, Martynas. What programming language is that code from? What argument is useless? Did you mean to reply to me? I'm not sure I see the connection between what I said and your comment 🙂.
I only wanted to address your last sentence where you mentioned macros.
This example is from Java: https://www.w3schools.com/java/java_foreach_loop.asp
Also, that `for` loop is actually a series of `when` calls, where it checks whether the next item exists in Java's Iterator.
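To make the "you can't write it yourself in Java" point concrete: in Clojure a user can write an enhanced-for-style loop as an ordinary macro. The `for-each` macro below is a hypothetical sketch, not a library API. It expands into roughly the iterator loop that javac generates for an `Iterable`: check `hasNext`, bind `next`, run the body, repeat. Arrays are compiled differently in Java, and this sketch ignores them.

```clojure
;; Hypothetical `for-each` macro: a user-defined analogue of Java's
;; enhanced for loop over an Iterable (check hasNext, bind next, run body).
(defmacro for-each
  [[binding coll] & body]
  `(let [it# (.iterator ~coll)]
     (loop []
       (when (.hasNext it#)
         (let [~binding (.next it#)]
           ~@body)
         (recur)))))

;; Usage: collect each element times ten into an atom.
(def seen (atom []))
(for-each [x [1 2 3]]
  (swap! seen conj (* 10 x)))
@seen ;=> [10 20 30]
```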
One strange difference between Haskell and Clojure is that Clojure can run new code at runtime (because it has an evaluator), while Haskell is compiled away (I think).
So Haskell's Monad-based `when` remains pure at runtime, but Clojure's may happen to get a new type of unexpected argument: a gremlin could connect via REPL to prod and send `(throw nil)` to execute, or even redefine the `when` macro to send him all the new code that is executed.
Also, it's most likely the case that in Haskell the `if` inside the `Monad` will be compiled away completely, and straight code paths (no ifs) would be extracted.
So then having a `Monad` at all becomes only a syntax thing, because it won't be reevaluated at runtime.
In Clojure you can also have that by using values in top-level macros:
https://inflambda.tech/post/2022-05-01-clojure-compile-time-demo.html
But in Haskell you have to use a "metalanguage":
https://stackoverflow.com/questions/2475828/haskell-compile-time-function-calculation
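A minimal sketch of the Clojure side of this point. It is not taken from the linked post, and `squares-table` is a hypothetical name. A macro body runs at macroexpansion (compile) time, so the table below is computed once, and the caller's compiled code only contains the resulting vector literal. This assumes `n` is a literal known at compile time.

```clojure
;; Hypothetical macro: the (map ...) runs while the macro expands,
;; so the caller's compiled code just contains a vector literal.
(defmacro squares-table [n]
  (vec (map #(* % %) (range n))))

(macroexpand '(squares-table 5)) ;=> [0 1 4 9 16]
(def table (squares-table 5))
```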
So why not compare Haskell collection Monads to Clojure sequence monads, instead of comparing complicated statements?
For me it's similar but something is not right.
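One way to make that comparison concrete is to write the sequence (list) monad as two plain functions. `seq-unit` and `seq-bind` are illustrative names, not from any library. Nesting binds then gives the same result as Clojure's `for` comprehension, which is roughly what Haskell's list `do` notation desugars to.

```clojure
;; Sequence monad sketch: unit wraps one value, bind applies f to each
;; element and concatenates the resulting sequences (i.e. mapcat).
(defn seq-unit [x] [x])
(defn seq-bind [xs f] (mapcat f xs))

(def via-bind
  (seq-bind [1 2]
            (fn [x] (seq-bind [:a :b]
                              (fn [y] (seq-unit [x y]))))))

;; Clojure's built-in `for` comprehension produces the same sequence.
(def via-for (for [x [1 2] y [:a :b]] [x y]))

via-bind ;=> ([1 :a] [1 :b] [2 :a] [2 :b])
```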
I didn't read the linked article, but Monads are orthogonal to static/dynamic typing. Clojure has Monads as well, quite a few libs that provide them. Is the question: is it safe to use Monads without static type checks? Or is it that without them the risk for human error in their use that make it to prod is too costly?
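To illustrate that monads don't need a type checker, here is a minimal untyped Maybe. All names (`just`, `nothing`, `maybe-bind`, `safe-div`) are hypothetical and do not come from any particular library. It works at runtime, but nothing stops a caller from passing a bare number where a Maybe is expected. That gap is exactly the human-error risk the question is about.

```clojure
;; Untyped Maybe sketch: a Just is a tagged map, Nothing is a keyword.
(defn just [x] {:maybe/just x})
(def nothing :maybe/nothing)

(defn maybe-bind
  "Run f on the wrapped value, or short-circuit when mv is nothing."
  [mv f]
  (if (= mv nothing)
    nothing
    (f (:maybe/just mv))))

(defn safe-div [n d]
  (if (zero? d) nothing (just (/ n d))))

(maybe-bind (just 10) #(safe-div % 2)) ; returns {:maybe/just 5}
(maybe-bind (just 10) #(safe-div % 0)) ; returns :maybe/nothing
(maybe-bind nothing #(safe-div % 2))   ; returns :maybe/nothing
```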
That article mostly talks about Monads in Haskell, and LISP is somewhat of a bonus. And then the author found that he can use Clojure's `when` macro as a Monad with `if` inside. It works for this simple example, but then a function would work too.
And also I now see that he doesn't consider that he returns code from a macro and a result from a Monad.
Also Clojure code uses macro invocations that return other code and those use ye olde Java Objects: https://github.com/clojure/clojure/blob/b1b88dd25373a86e41310a525a21b497799dbbf2/src/jvm/clojure/lang/LispReader.java#L285
So well... if it's stateless and returns something then... it's a Monad...
I started reading the article. I think I would disagree with their distinction between data and computation. A first class function is not data, an object is not data. At the point where a function is passed as input, the only thing one can do with it is run it, or pass it around further. There is no data to manipulate or inspect. Maybe there is metadata attached to it you can read, like a doc-string, but the function itself is not data anymore.
> But with first-class functions, we can instead use functions as data — as objects that may be passed around like a string or an integer

So this premise is false. When we say in Lisp that code is data, we don't mean this at all. A string or an int (lower case) is a data type that represents data: simple bits encoded using a specific convention, which the type attached alongside the bits tells us about. It tells us the rules about this data so we can understand and manipulate it. A String and an Integer (upper case) are data structures. They're computations wrapping data, and they let us manipulate the underlying data indirectly through their methods. A function is not data, nor data with a type, nor a data structure. It's not even data that represents a function. It's a concrete construct of your runtime that can be used to run an already defined computation, but its definition is no longer there for you to manipulate. That definition is the data representing the computation, and it has been stripped away at this point. No data remains.
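A small sketch of the distinction being drawn, using only clojure.core and clojure.walk. The names `add-one`, `add-one-form`, `sub-one-form` and `sub-one` are illustrative. The function value can only be called or passed along. The quoted definition is a plain list, so you can rewrite it (here, swapping `+` for `-`) and then turn it back into a function with `eval`.

```clojure
(require '[clojure.walk :as walk])

;; A function value: you can call it or pass it along, but not look inside.
(def add-one (fn [x] (+ x 1)))

;; The same definition as data: a quoted list you can inspect and rewrite.
(def add-one-form '(fn [x] (+ x 1)))

;; Swap + for - by rewriting the data, then turn it back into a function.
(def sub-one-form (walk/postwalk-replace {'+ '-} add-one-form))
(def sub-one (eval sub-one-form))

(add-one 5) ;=> 6
(sub-one 5) ;=> 4
```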
You can use functions to make data-structures. So a particular function could encapsulate some real data and let you access it for example. But that's a whole different thing.
In Haskell, you can attach types to functions as well. I think this confuses people, the word type does not mean data. The type is data about the thing, a form of metadata actually. That thing doesn't have to be data. It could be a data-structure, or a function, or an object.
Interesting ideas, didibus. I guess I have two goals whenever I get close to Haskell. One is seeing if there are any useful abstractions I should be more aware of. The other is understanding how to communicate with people in the typed community. I'm not sure the author was too concerned with being specific in his terminology. And on that point, who gets to decide what things mean? That's exactly why it's useful to have discussions like these, where people try to bridge the gap. Clojure does allow functions, strings and integers to be passed to other functions. What is the general word for things that can be used that way? I would have said data too. Having fun with this a bit, I would say the difference between metadata and data is very meta.
I tend to call that "first-class". Something is first class if it is available itself to be manipulated dynamically at runtime.
Right, but isn't that name coming from OO languages that have classes?
Classes are more in the driver's seat there, I suppose.
I can't say what the etymology of it is. But I don't think so. I feel I've mostly heard it in the context of functional programming. I don't think the "class" bit refers to an OOP Class either. I think it's used more generically to mean that this is a reified concept: the thing exists at runtime, basically. It's not stripped away when compiled.
The original term (https://en.wikipedia.org/wiki/First-class_function#:~:text=The%20term%20was%20coined%20by,functions%20is%20a%20standard%20practice.) was "first class citizens", and in full it was "first class functions". So if we drop the "function" bit to include strings, we just get "first class", but now we (anyone using the term) lack the context. E.g. what's second class, exactly?
Anyways, I agree with you. The terms don't matter, but the concepts do. So it's not about arguing over what we call what. Think of unsupervised models: you can still find groups of things that share common properties. Understanding those properties and where the similarities lie is objective, and it doesn't matter what name you give those categories or how fine-grained you make them.
So take a function, at runtime, in most languages, that's not something you can serialize to disk, that's not something you can send over the wire, that's not something you can have a human look at and understand what it does, etc. What I'm describing here are key properties of something else, something that a function isn't, because it lacks those properties. I will choose to make those key properties of something I will call "data".
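The properties listed here (serializable to text, sendable over the wire, readable back) can be checked directly with `clojure.edn`. In this sketch, `order` and the other names are illustrative. A plain map survives a print/read round trip as an equal value. A function prints as an opaque `#object[...]` tag that the EDN reader refuses to turn back into anything.

```clojure
(require '[clojure.edn :as edn])

;; Plain data: printing gives readable text, and reading it back
;; yields an equal value, so it can go to disk or over the wire.
(def order {:id 42 :items ["apple" "pear"] :paid? false})
(def order-text (pr-str order))
(= order (edn/read-string order-text)) ;=> true

;; A function prints as an opaque #object[...] tag, which the EDN
;; reader cannot turn back into a function.
(def fn-text (pr-str inc))
(try (edn/read-string fn-text)
     :read-back
     (catch Exception _ :unreadable)) ;=> :unreadable
```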
Now fair enough, someone might say, ok but in functional languages, it is something you can pass around as input to other functions or return as output. Ya that's true. And maybe you choose to call things with that property data. Is that the only property required for you to call it data, well that's up to you to decide, you call it what you want, but it doesn't magically gain other properties because you chose to call it data.
Fair enough. I'm overly invested in this kind of word play because it's social in nature, so it's fun to talk about 😆. In particular, I think the terms strong and safe are misused often enough that I have had to develop ideas about what people really mean. Say I arbitrarily declared that half my teammates were "second class devs" and I'm a "first class dev", and all that meant was that I'm the first person to take the devops shift. It wouldn't be a very good name, and in fact they would have every right to be upset by it, because of the obvious confusion it causes in communication.
So I'm always tempted to just raise the stakes. Something like, "Clojure has immutable brawny types." Oh, your types are just strong? Ugh, my types can bench like 500 lbs.
This probably says more about me than it does about the wider community ... 😅
Haha. Ya, I find it fun too. And I don't think it's unimportant. It matters at a layer above: when you bring the human into the loop, your choice of named concepts matters a lot. It affects communication and understanding each other, and the names also bias your reasoning. It's hard to reason about concepts you haven't named, or that you don't discuss often enough to remember or build instinct around.
Anyways, if you want to understand Monads, I recommend you play with them in Clojure first. Once you got that understood, you can go see how Haskell manages to statically type them.
Haskell works with two big constraints:
1. Every side effect must be defined at the program entry point.
2. All code should be statically type checked.
Because of #1, it requires some relatively clever code styles/patterns, which is where Monads came into play. Because of #2, it requires a relatively sophisticated type system. It also means that if some code style/pattern is too hard to type check with today's type systems, then it will simply be disallowed.
Do those constraints help with program correctness? Well, this is an interesting one. On average, from the data I found, yes, they help a little, but no more than the alternative set of constraints imposed by Clojure: functional first, immutable data, developing with a REPL, etc. The data appeared to show a measurable (though quite small) reduction in average software defects for functional over procedural/OO languages. It also showed another small but measurable reduction in average defects for static over dynamic typing, except that Clojure did not follow that trend and had a reduction in defects similar to statically typed functional languages. It wasn't clear what that could be attributed to, but we could hypothesize the immutable default, the use of a REPL, the combination of those, or something else unique to Clojure.
Now, the data didn't discuss other attributes, like what the trade-offs were. Did Clojure programs require more time, say for writing tests and manually testing things, to achieve that same reduction in defects? So even if it's equally safe, maybe it takes more effort to get there? Or, vice versa, did Haskell cause issues in productivity? Also missing is the trade-off made in performance, etc.
Ok, I read the article till the end haha. Ya the last bit is more interesting. They're right that the Monad allows the Haskell type checker to assert certain things at compile time that the macro doesn't. That's why Haskell chose to use Monads. Maybe there's ways to type check over a macro as well, that I'm not sure. But the Monad doesn't need the type checker, and can be implemented in Clojure as well.
One interesting way to approach Haskell is to view the entire language as data, not code, with rewrite rules. If you think of it as a graph reduction algorithm, is Haskell data or code? Since it's fully lazy, you don't need macros to create new control flow structures. Everything is evaluated on demand. Thinking about monads in particular, I find it useful to think of them as the inverse of generics. Generics are a concrete container parametrized on the content. Monads are generic containers parametrized on the container. Then slap on an interface that all these containers satisfy, and you can "safely" compose computations. Without graph rewriting and optimization, you're better off using macros than monads in Clojure.
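The "slap an interface all these containers satisfy" idea can be sketched in Clojure with a protocol. `Monad`, `bind`, `Just`, `Nothing` and `chain` are all illustrative names. This is a sketch of the shape of the idea, not how Haskell's type classes work internally. The generic `chain` only relies on the protocol, so the same code threads a computation through a Maybe-like record or through a vector treated as the list monad.

```clojure
;; One interface that all the containers satisfy.
(defprotocol Monad
  (bind [mv f] "Feed the value(s) in mv to f, which returns a new monadic value."))

;; Maybe as two records.
(defrecord Just [v]
  Monad
  (bind [_ f] (f v)))

(defrecord Nothing []
  Monad
  (bind [this _] this))

;; Vectors as the list monad: bind is mapcat.
(extend-protocol Monad
  clojure.lang.IPersistentVector
  (bind [mv f] (vec (mapcat f mv))))

;; Generic code that only relies on the protocol.
(defn chain [mv & fs]
  (reduce bind mv fs))

(chain (->Just 3) #(->Just (inc %)) #(->Just (* 2 %))) ; equals (->Just 8)
(chain (->Nothing) #(->Just (inc %)))                  ; equals (->Nothing)
(chain [1 2] (fn [x] [x x]))                           ; returns [1 1 2 2]
```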
For me, Haskell constructs don't meet my definition of data, so it's definitely all code. Monads are just higher-order functions with some syntactic sugar. That syntactic sugar cannot be implemented in Haskell without direct compiler support, because in Haskell code cannot be manipulated like data, so you can't do anything with it that the compiler doesn't give you. What you can manipulate in Haskell are higher-order functions, which meet neither my definition of code nor my definition of data.
For me, code is the token representation of a program. That's not something that in Haskell you get to manipulate as a data-structure. And data for me is an arrangement of primitive types, it's passive, doesn't do anything, it can be serialized to disk, to a human readable textual form, sent over the wire, and can be fully modified within the rules of the information it represents, you can change the values in it, etc.
A lot of people seem to want to say Monads are similar to macros. They're not.
Some Monads can be used to achieve results similar to macros. But macros can be used for a lot of things Monads can't do.
If Haskell's `do` were also a Monad, that would be different.
If you look into Haskell's execution model, you see that code is evaluated as if it were data. You don't need macros for new control flow, because a definition like `or x y = if x then x else y` doesn't diverge: `or True undefined` still returns `True`, since `y` is only evaluated if it's needed. Monads are not just higher-order functions. They are an implicit context to the computation, which gets threaded along. You don't need to manipulate code like data, because you can manipulate code like code, and part of the compilation process involves rewriting (e.g. partial evaluation), which is exactly what macros do.
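For contrast, here is the same `or` in strict Clojure. This is a sketch and every name is illustrative. As a plain function, both arguments are evaluated before the body runs. To get the lazy behavior Haskell gives for free, you either reach for a macro (as `clojure.core/or` does) or pass an explicit `delay` and `force` it.

```clojure
;; Counts how many times the "expensive" argument actually runs.
(def calls (atom 0))
(defn expensive [] (swap! calls inc) :computed)

;; As a function, both arguments are evaluated before the body runs.
(defn or-fn [x y] (if x x y))

;; As a macro, y is only evaluated when x is falsey (like clojure.core/or).
(defmacro or-macro [x y]
  `(let [x# ~x] (if x# x# ~y)))

;; Or pass a delayed value and force it explicitly: laziness by hand.
(defn or-delay [x y] (if x x (force y)))

(or-fn true (expensive))            ; calls is now 1
(or-macro true (expensive))         ; still 1: (expensive) never ran
(or-delay true (delay (expensive))) ; still 1: the delay was never forced
```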
A partially realized lazy seq: is it data or computation? The author's mistake is saying it boils down to "functions being first class", when I think it's the combination of "value-oriented programming" and laziness.
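The "partially realized" state is observable in Clojure. A lazy seq is an `IPending`, so `realized?` reports whether its computation has run yet. Asking for an element forces it, and the result is cached. The var names below are illustrative.

```clojure
;; A lazy seq starts out as an unevaluated computation...
(def xs (map inc (range 5)))
(def realized-before (realized? xs)) ; false

;; ...and asking for an element forces (and caches) the work.
(first xs)                           ;=> 1
(def realized-after (realized? xs))  ; true
```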
I'm not sure what you mean. Bind dispatches on the type, and just uses higher order functions. I guess yes it combines to create a behavior that isn't the same as randomly using higher order functions, but it's kind of just a structured use of higher order functions and combining them.
> a partially resolved lazy seq: is it data or computation ?

In my taxonomy, this is computation.
So, sure, you have a data-structure of computations. But I don't consider all data-structures to be data. A data-structure needs to contain data, and let you manipulate it, to be treated as data itself. In this case there's no data inside it, just opaque executable functions, and all you can do is invoke them. You could change the order you invoke them in, if that order depends on their position in your data-structure. But with Monads even that isn't true. You actually don't have a data-structure of computations at all. You've got computations returning computations. After a first bind, you can't go back and change the order of what was already bound. So it's even more opaque than a list of functions.
Though, maybe it works differently in Haskell. The compiler does substitute certain things for IO, and I admit I'm not sure how it goes about doing that.
> And data for me is an arrangement of primitive types ... @U0K064KQV
"Primitive" could be defined many ways. A brief perusal of Wikipedia shows a reasonable definition of "language primitive" (https://en.wikipedia.org/wiki/Language_primitive), which directly conflicts with your assertion that data "doesn't do anything".
If you specifically mean primitive data types (https://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html) as defined by Oracle in the Java docs, then what happens when you cross language boundaries? Does something switch to or from being data based on the types the language offers?
And what about your human-readability requirement, below? A `byte[]` is a simple linear arrangement of the "primitive" `byte`. Do you know anyone who can "read" an arbitrary list of byte arrays?
> ... it's passive, doesn't do anything ...
A function must be called to make it "do anything:" Until then, it is every bit as inert as anything classically recognized as data. More specifically, functions don't do anything until you give them values. Values don't do anything until you pass them to functions. Behavior is emergent from the combination of the two.
> ... it can be serialized to disk ...
(instance? java.io.Serializable (fn arbitrary-fn [] nil))
;; --> true
Thanks to Rich graciously providing `Serializable` implementations, most Clojure functions make perfectly good candidates for `ObjectInputStream` and `ObjectOutputStream`.
> ... to a human readable textual form, ...
So I take it you do not consider image blobs to be data? Certainly no readability there. But where would we be without databases to the brim with what is commonly called "image data"?
Password hashes are specifically and intentionally designed to be human-unreadable! Can you keep a straight face while arguing that a db table full of hashed customer passwords is not data?
If passwords get an exclusion from this readability rule, then why should a table of serialized Clojure functions be any different? At least a fn serialized down to a string offers some meaningful clues to its contents.
;; the `arbitrary-fn` example from above
"��srclojure.core$inca\nc֕�xrclojure.lang.AFunction>p��F��L__methodImplCachetLclojure/lang/MethodImplCache;xpp"
> ... sent over the wire ...
We already determined we can serialize a function, so ....
> ... and can be fully modified within the rules of the information it represents, you can change the values in it, etc.
Well maps are immutable, along with lists, vectors, etc. Strings and characters are immutable. Numbers are immutable. So what exactly are you talking about when you say "fully modified" and "change the values in it"?
I will assume you are not trying to say that "transients are data, and mutable Java objects are data, but all else be damned."
So the only reasonable interpretation I can make is that you mean "you can derive new data from old data".
And sure, if you pass `inc` a value of 1, you can derive a new number, 2. But you can also derive `inc` itself from the function `+`. If you provide `-` instead of `+`, you derive a very different new function.
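(def two (inc 1))             ; derive a new number from data: 2
(def my-inc (partial + 1))    ; derive an inc-like function from +
(def one-minus (partial - 1)) ; swap + for -: computes (- 1 x), not a decrement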
So, what about derivation from a collection of numbers do you claim makes it distinct from derivation from a collection of functions?
I think we're missing the difference between actual functions and function pointers.
I'm pretty sure that you can't serialize a `partial` function, because serialization serializes only the pointer:
((partial + 1 2 3)) ; => 6
(str (partial + 1 2 3)) ; => "clojure.core$partial$fn__5912@5e5585de"
(clojure.edn/read-string "clojure.core$partial$fn__5912@5e5585de")
; eval (current-form): (clojure.edn/read-string "clojure.core$partial$fn__5912@5e5585de")
; (err) Execution error at my-ns/eval81654 (form-init13929679240972553784.clj:431).
; (err) Invalid constituent character: @
Which basically means that if you curry a function, send it over the wire and deserialize it, you only send a pointer name, not the whole function with the whole "banana that is held by the gorilla in the whole jungle of OOP".
But if you make a symbol then you can sort-of do this if you include all your functions and define them at the place where you deserialize (let's not think about security for a bit here):
(clojure.edn/read-string (str '(partial + 1 2 3))) ; (partial + 1 2 3)
And now you can try to evaluate it. But it's not the function then that you send but the symbol representation.
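A sketch of that "send the symbolic form, evaluate on the other side" approach, with illustrative names. Reading with `clojure.edn` is safe, but `eval` runs arbitrary code, so this is only reasonable for trusted input.

```clojure
(require '[clojure.edn :as edn])

;; Sender: send the *form* as text, not the function object.
(def wire-text (pr-str '(partial + 1 2 3))) ;=> "(partial + 1 2 3)"

;; Receiver: read it back as data (safe), then eval it
;; (NOT safe for untrusted input: eval runs arbitrary code).
(def received-form (edn/read-string wire-text))
(def received-fn (eval received-form))

(received-fn 4) ;=> 10
```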
In Haskell and OOP languages you send function pointers around (in same VM) to not send "the whole jungle" (also there is this "whole jungle" in the RAM already so there is no need to duplicate it). So if you would try to serialize Haskell's function you would get the same pointer string (if it would compile).
And this is what people mean when they say "functions are not readable once they're instantiated": you can't easily take a pointer and figure out what code it points to. Maybe you would want to make a runtime that lets you inspect your memory and then decode the compiled bytecode from it 🤔 Maybe that means bytecode IS the weakness, and it should be infinitely inspectable?
Also, yes, the function pointer string is data... but what is the use of that data? It's a handle into memory that doesn't exist once you take it out of the VM's context.
@U028ART884X I am not talking about pointers.
(defn ->base64-str [x]
(.encodeToString (java.util.Base64/getEncoder) x))
(defn serialize->str [object]
(let [buff (java.io.ByteArrayOutputStream. 1024)]
(with-open [oos (java.io.ObjectOutputStream. buff)]
(.writeObject oos object))
(->base64-str (.toByteArray buff))))
(def str-encoded-partial
(serialize->str
(partial + 1 2 3)))
then on another system
(defn <-base64-str [s]
(.decode (java.util.Base64/getDecoder) s))
(defn deserialize<-str [string]
(with-open [ois (java.io.ObjectInputStream.
(java.io.ByteArrayInputStream.
(<-base64-str string)))]
(.readObject ois)))
((deserialize<-str str-encoded-partial) 4)
;; --> 10
So you used `java.io.Serializable`. But how does that let you edit or inspect the source code after you deserialize it?
It's the nature of the JVM to allow that, not the nature of Clojure. Maybe we should compare Java to Haskell instead?
Source code is already a string. If the data you want is source code then just send that.
If the data you want is the function itself, then serialize the function as demonstrated above. If you need to "inspect" it, try `clojure.reflect/reflect`. If you need to "edit" it, then, just like with anything else in Clojure, you pass it to a function that uses it to derive some new thing.
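A quick sketch of what `clojure.reflect/reflect` gives you for a partial application. It describes the compiled class behind the function (its fields and methods), not the source form. The exact member names depend on the Clojure version, so the check below only looks for invoke-style methods. `member-names` is an illustrative name.

```clojure
(require '[clojure.reflect :as reflect])

;; Reflect on the class behind a partial application. What comes back
;; describes the compiled class (fields, methods), not the source form.
(def member-names
  (->> (reflect/reflect (partial + 1 2 3))
       :members
       (map :name)
       set))

;; Expect invoke-style methods (and fields for the closed-over values).
(some #{'invoke 'doInvoke} member-names)
```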
> Because it's the nature of JVM to allow that and not nature of Clojure.
Heresy! 😄
One of the tenets of Clojure is to embrace the JVM. We don't need to reinvent object serialization, because the JVM already provides the facility. We just need the underlying classes in Clojure core to provide correct `Serializable` implementations, which they do.
If you try to analyze the sent binary, you're disassembling it, and that's probably one way to find out what it is. There are numerous libraries that let you do that, but the result is not a human-readable form. We have this in Java, and even when it works it tends to get unwieldy really fast. Some testing libraries do it (Mockito, PowerMock).
But is that the primary way your code should be executed? Even if you can do it, should you? What about knowing what code you execute? Java classes could be sent the same way, and you could execute any code in a constructor, so why is this different? On top of that, the `partial` function is implemented in a similar way: what it returns is an object of a class.
And if we send a string to be compiled we could also send Haskell's sources to compile and run. And we could also send Haskell's binaries too, if you want to compile some code and run it.
With Clojure we have this compiler at runtime which is great.
But then how does this make a function data? What about `.exe` or `.so` files? Are they also usable human data? There are people who analyze them and make no-CD cracks or the next version of Stuxnet. This is, well... data... Compiled, but data. But how usable is it?
I think that even though everything can be converted into ASCII it doesn't mean it's usable. Maybe on some context yes. I think that for instance for the JVM an Object is data. But for a human... no. I.e. if you don't have tools to know what it is then it's unknowable.
Yep. I totally agree, you might blow the stack if you try to figure out where to draw a hard line on what is or is not data. My main point is that the line @U0K064KQV has chosen to draw is arbitrary, and the requirements used to validate that choice don't really hold water. That matters because being willing to think of functions as data opens many opportunities for elegant solutions to problems, which aren't available if you insist that they cannot be data. The movement of "functional programming" is built primarily on the idea of "first-class functions", which is to say, in short, that you can pass around an unrealized (or partially realized) process just as you would any other data type. Let's embrace that. The foundational model from which FP was derived is the lambda calculus, where there are only functions, and data can only emerge from function application. So why would we then turn around and say functions are not data?
> The author makes an argument that Haskell helps you avoid run time nil pointer exceptions because the equivalent Nothing return values/types have to be properly wrangled. - @U0DJ4T5U1
Finally got around to reading the article. The author conflates `nil` in Clojure with the Java `null` underneath it. In Java, `null` is really nothing, while in Clojure it represents logical falsity. This gives us built-in support for nil-punning, obviating the need to "handle" `nil` at all in numerous situations. In Haskell, you must always deal with the possibility of a `Nothing` inside a `Maybe`.
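A few examples of the nil-punning referred to above, using only clojure.core and clojure.string. `person` is an illustrative value. `nil` is falsey, most core sequence and collection functions accept it, and `some->` stops threading as soon as a step returns `nil`.

```clojure
(require '[clojure.string :as str])

;; nil is falsey, and most core sequence/collection functions accept it.
(if nil :truthy :falsey) ;=> :falsey
(first nil)              ;=> nil
(count nil)              ;=> 0
(seq [])                 ;=> nil, so (when (seq coll) ...) checks for emptiness
(conj nil 1)             ;=> (1)
(get nil :k)             ;=> nil

;; A missing key simply flows through instead of throwing.
(def person {:name "Ada"})
(:city (:address person))                     ;=> nil
(some-> person :address :city str/upper-case) ;=> nil, upper-case never runs
```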
> being a macro matters to this argument at all
I agree, the macro is just sugar. He's really comparing `when t $ do a b` to `(if t (do a b) nil)`.
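This is easy to confirm at the REPL. `when` expands into the `if` and `do` special forms. The missing else branch of `if` evaluates to `nil`, which is the explicit `nil` in the comparison above.

```clojure
;; `when` is a macro over the `if` and `do` special forms.
(macroexpand-1 '(when t a b)) ;=> (if t (do a b))
```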
> That's not to say Haskell's approach is strictly better however
Is this what you are asking about, rather than the rest of the article contents? Seems like your points aren't much related to anything except this statement the author made somewhat in passing.
@U90R0EPHA I'll have to disagree. Data is not mystical. Look at the assembly: instructions are executed by the CPU, there's a limited set of instructions the CPU can perform, and those are "computations". Things built by composing those into sequences to be executed are therefore also "computations". And then you have data, which is what those instructions operate over. It goes in the registers and gets stored in memory. Computations can also be described using data, at which point they can be operated on by those same CPU instructions, because they're now allowed in the operand. A Monad builds up a computation, and once built, you have a computation, not data. You cannot unbuild it or change it. You can only wrap it some more into a bigger computation, or execute it.
Absolutely agree that understanding that something can be first class, like a function, opens you up to new possibilities, like using functional programming patterns such as Monads. What I'm saying is that there's something else you should also recognize and understand, which opens you up to possibilities beyond those of first-class functions: data that describes computations, such as quoted Lisp forms. I don't like calling both these things by the same name, because it won't give you the aha lightbulb moment. If you limit yourself to seeing data as first-class functions, you will fail to learn about all the possibilities that code as data opens up beyond what first-class functions can do. And I don't mean one is a superset of the other in terms of possibilities. Each opens up different new design possibilities, with some overlap.
I didn't know Clojure functions could be serialized like that... I'm kind of surprised, but I actually think it doesn't work. Did you try deserializing this in a separate app where the function doesn't exist? Maybe it does, but I'd be really curious how it did that.
Ok, so if you followed my thoughts, when you say:
(def inc (partial + 1))
What `(partial + 1)` returns is not data, unless Clojure also attaches some metadata to it that contains the code itself (even if tokenized).
Sure, you can compose this inside other computations. But you can't modify it, and data should be modifiable.
For example, you cannot change `+` for `-`.
You cannot change `1` for 300.
You cannot add more things to the call to `+`.
You cannot change `partial` for `reduce`.
So you can't change anything about this computation. If the computation was represented as actual data, you could change all that, for example:
(def inc (quote (partial + 1)))
Now you've got data 😜
((eval (concat inc [300])) 100)
;;=> 401
And, like data, you can fully manipulate it. If you think your data represents a computation, you can lift it into one using, for example, `eval`, which turns it into a function. Now it's a computation and no longer data, but you can execute it, as you do with computations... Or, yes, because functions are also first class, you can wrap it in another function or pass it to another function to call. But that's not the same thing as data because, as I've just shown, you've lost the ability to modify it.
I tried to deserialize some things between different JVM versions.
When I serialize a `java.util.Date` and its `hashCode` on JDK11, then deserialize and run it on JDK11 again, I get this:
[#inst "2022-08-12T05:49:41.615-00:00" 1729904998]
When I serialize on JDK11 and deserialize+run on JDK18 then I get this:
[#inst "2022-08-12T05:51:52.266-00:00" 1456208737]
As the `hashCode` of the `java.util.Date` is different, it means that the class has changed between executions, i.e. the class changed between JDKs. So it serializes links to classes, but not the whole "jungle".
Right, but even then, it just serializes the data in the object, and the code for the class exists in both JDKs. To serialize a function, as in, the function itself, and not just an instance of one (like the data in its closure along with its qualified name), you'd have to somehow serialize the code for it. I really don't think Clojure does that. I think maybe it just serializes the fields and the class name of the function, and then when you deserialize, it just recreates an instance of the class of the same name and populates the fields. Which is why the "class", aka the code, could be different when you deserialize. But it also means the function was not truly serialized, only enough info to create a new instance of it if the function exists.
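To make that hypothesis concrete, here is a rough sketch using plain Java serialization (`java.io.ObjectOutputStream`) from Clojure, round-tripping within a single JVM only. Java serialization writes a class descriptor (class name plus field values), not bytecode, so the stream should name the function's generated class, and reading it back relies on that class being loadable. The helpers `serialize` and `deserialize` and the name `add-six` are hypothetical.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream
                  ByteArrayInputStream ObjectInputStream))

;; Hypothetical helpers around plain Java serialization.
(defn serialize [x]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. baos)]
      (.writeObject oos x))
    (.toByteArray baos)))

(defn deserialize [^bytes bs]
  (with-open [ois (ObjectInputStream. (ByteArrayInputStream. bs))]
    (.readObject ois)))

(def add-six (partial + 1 2 3))
(def bs (serialize add-six))

;; The stream names the fn's generated class; it carries no bytecode.
(.contains (String. bs "ISO-8859-1") "clojure.core$partial$fn") ;;=> true

;; Reading it back works here only because that class is loadable in this JVM.
((deserialize bs) 4) ;;=> 10
```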
Right, so I think this conversation shows it's very hard to define these terms in a way that is precise and meaningful. Now add "types" into the mix and you see the problem. By talking in terms of data and code, and not types and code, we have isolated ourselves on a different semantic island. Do you really think our island is that different from those dealing with static types? Do they not have data? I suggest we (the wider Clojure community) change the narrative around data so that it's connected with the conversation about types; that way we can bridge the gap for people coming from that mindset. A first pass on this idea could be saying we have "data types" / don't call us dynamically typed. This is hopefully a complementary idea to the great discussion going on here. :)
I like your point. And I guess there are two conversations happening here at once.
1. Language as a means to communicate and reach out to others.
Here I can agree with you. Clojure maybe has a different use of data, though Rich Hickey claims it's actually the correct use of data, and it's other programmers who have changed its original meaning in weird ways: https://news.ycombinator.com/item?id=11945722
I don't have an opinion here, really. If there are other names for these concepts and ideas that resonate more with other people, and it helps for me to learn those so I can reach these people and discuss those very ideas with them, I'm willing to call them by other names.
2. Constructs and concepts: their properties, and understanding their distinctions, relations and implications.
This is where I think I disagree with some in this thread. There are real concepts here, and those aren't fuzzy. What's fuzzy is that some concrete constructs could be designed in ways that combine all these concepts together, but the concepts are absolutely legitimate, real concepts that I think are worth understanding. The first step to learning is to be able to recognize and isolate the concept in the first place.
You'd realize this strongly if you had to implement a construct around one or more of these concepts.
Here I'll still call them: code, data, computation, types.
Maybe you want a different taxonomy, that's fine, but we can recognize four different concepts.
In my opinion, if you think some of these things are the same thing, you've not understood these things well enough yet. Is it slightly confusing and they seem to have similarities and are often all used together? Absolutely, and that's why I can understand mixing them up, it similarly took me a ton of time to stop mixing them up and really start to understand each one and then be able to distinguish one from the other.
Am I surprised that someone coming from a language that doesn't have a strong concept and usage of data, and of code as data, would be confused about it and think that first-class functions are the same thing? Not at all. Unless you have a chance to experience it, it's totally reasonable to assume they're the same. After all, you do have part of the logic as data: the types are used as data to branch on and to build the composition of functions based on the Monad definitions.
You might not even understand how you could have a Monad without types, but in Clojure you see that you can pass the "Monad to use" as an argument; you don't have to attach the "type" onto the functions. That's why the `domonad` macro (from `clojure.algo.monads`) takes the Monad to apply, whereas in Haskell that's inferred from the type.
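A rough illustration of passing the monad in as an argument, using a hand-rolled representation (a map holding `:return` and `:bind` functions) rather than the actual `clojure.algo.monads` API; `identity-m`, `maybe-m`, `run-steps` and `steps` are hypothetical names. The same steps behave differently depending only on which monad value is passed.

```clojure
;; Hypothetical monads as plain maps of functions (NOT the clojure.algo.monads API).
(def identity-m
  {:return identity
   :bind   (fn [mv f] (f mv))})

(def maybe-m
  {:return identity
   ;; Once a step yields nil, skip all remaining steps.
   :bind   (fn [mv f] (when (some? mv) (f mv)))})

(defn run-steps
  "Threads init through steps (one-arg fns) using whichever monad m is
  passed in, instead of one inferred from a type."
  [m init steps]
  (reduce (fn [mv step] ((:bind m) mv step))
          ((:return m) init)
          steps))

(def steps [#(+ % 1) #(when (even? %) %) #(* 10 %)])

(run-steps maybe-m 1 steps)    ;;=> 20
(run-steps maybe-m 2 steps)    ;;=> nil
(run-steps identity-m 1 steps) ;;=> 20
;; (run-steps identity-m 2 steps) would throw a NullPointerException,
;; because identity-m never short-circuits and (* 10 nil) gets evaluated.
```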
And in saying all this, I'm aware I might sound like I'm implying I'm smarter or anything like that, and I don't mean that at all. There's much I don't know that I'm sure I can learn from others. And even here I'd be happy to be shown wrong, and I'm sure it's possible; there's still much nuance and detail I'm missing, I'm sure. But I can't find another way to say it: no, it's not just semantics, there are actual concepts here that differ. It's not my opinion, it's a practical truth: these are not the same things. Are they isomorphic? Maybe, I won't claim to have a mathematical proof that they aren't, but they're definitely different in lots of important and impactful ways.
It's like when someone says an Object and a Function are the same, they're not, they can be used in some cases to similar effects, but are also very different in many qualities that are impactful.
If someone kept insisting that a function is just an object, maybe even point out how a Clojure function simply compiles down to an Object, and that really I should simply start to call functions Objects and think of them as Objects, well I would not agree, and similarly here 😝
But maybe the article author does understand all this. That's possible as well, and they simply call these concepts something else. Which is where it gets tough, haha. You could say anything that can be input to a function is "data". And sure, you can say that. But then I wonder what you would call what I call "data": what do you call bits arranged with some implicit encoding that is meant to represent a fact or information? And how do you distinguish between the kind of input you can only execute, compose, or pass on to something else, and the kind of input you can also inspect, reify, modify, and all that, but cannot execute?
> there's a limited set of instructions the CPU can perform, those are "computations". Things that build by composing those together to be executed in a sequence of them are also therefore "computations" ... And then you have data, which is what those instructions operate over, it goes in the registers and gets stored in memory. - @U0K064KQV
Anything more complex than individual "instructions the CPU can perform" must also be stored in the registers. There is no "composing those [instructions] together" without, in effect, serializing those instructions to store for later.
> A Monad builds up a computation ... You cannot unbuild it, or change it, just wrap it some more into a bigger computation or execute it.
You can store it, or pass it along, or drop it, or wrap it, or execute it. If you execute it, you can then store the result, or drop the result, or wrap the result, or execute the result (if the result is executable). Or instead, you could use the result to decide whether to store, drop, wrap, or re-execute the original monad. If you have a collection of these monads, you can execute some and not others; you can drop or replace or pass along some, but not others. A lot more choices here than just wrap or execute; and all those choices cross with the infinite pool of other data you could choose to combine with them.
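A minimal sketch of those choices, assuming computations are represented as zero-argument Clojure functions held in a map (the keys and the names `computations`, `plan`, `results` and `rerun` are made up for illustration): one is dropped, one is wrapped, the rest are executed, and a result is used to decide whether to run one again.

```clojure
;; Computations represented as zero-argument functions, held as plain values.
(def computations
  {:a (fn [] 1)
   :b (fn [] 2)
   :c (fn [] (throw (ex-info "never executed" {})))})

(def plan
  (-> computations
      (dissoc :c)                                    ; drop one
      (update :b (fn [job] (fn [] (* 10 (job)))))))  ; wrap another

;; Execute the remaining ones, keeping each result under its key.
(def results (into {} (map (fn [[k job]] [k (job)])) plan))
results ;;=> {:a 1, :b 20}

;; Use a result to decide whether to execute a computation again.
(def rerun (when (< (:a results) 5) ((:a plan))))
rerun ;;=> 1
```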
> But you can't modify it, data should be modifiable. For example, you cannot change `+` for `-`. You cannot change `1` for `300`.
What exactly are we talking about when we say "modifiable"? You say yourself that `1` itself cannot be modified to be `300`. So I can only guess you mean something along the lines of: "you can replace individual values within a collection of units"? In which case, what is the difference between these two `assoc`s?
(def my-nums (assoc coll :my-num some-num))
(def my-fns (assoc coll :my-fn some-fn))
If that's not what you mean by "modifiable", then what? I am honestly thoroughly confused by your usage here.
> (def inc (quote (partial + 1)))
> Now you've got data 😜
> ((eval (concat inc [300])) 100)
> ;;=> 401
Or alternatively
(def inc (partial + 1))
((partial inc 300) 100)
;;=> 401
In either case you are "modifying" `inc` in essentially the same way, by combining it with another function and some numbers. In one case you "modify" it by combining symbols and then reifying the entirety of the modification. In the other case you "modify" it just-in-time by automatically evaluating a combination of reified values. Yes, there can be value offered by choosing to work at that symbolic level, but the "modification" is in essence the same, and can in reality be done either way.
> and that's data that describes computations, such as are quoted Lisp forms. ...
Your argument that functions (except in quoted form apparently?) cannot be data is exclusionary. My argument that functions can be data, if treated as such, is inclusionary. In no way does this exclude instructions-about-instructions from also being data.
I am also not saying that functions must always be treated as data, just that they cannot be empirically excluded. "Categories" are generally useful even if "categorical thinking" is limiting. As such, there is value in trying to categorize functions as a thing somehow different from, for example, numbers... until the point where that suddenly becomes a cost instead.
As an obvious and stark example of such a cost: what if someone learns all about Clojure's version of "code as data" and then finds themselves trapped in a job without a Lisp? Should they just throw out the whole concept because they don't have syntax quoting? ...maybe turn back to OOP because apparently that's the only option left to them?
Most programming languages don't provide syntax quoting, or anything that much resembles it. If you want to make a plan to call some functions, you must make those functions and put them in the plan. So where is the middle ground where we can meet with people who don't know Clojure?... or don't currently have Clojure as an option? How can we show them there is a better way, starting from where they are? ...without trying to convince them they must learn Clojure to even start down the path? ...without excluding people from "code as data" because their code maybe doesn't look obviously like data or their environment doesn't offer symbolic manipulation at runtime?
> As the `hashCode` of the `java.util.Date` is different it means that the class has changed between executions. @U028ART884X
Interesting point. But that seems like an implementation detail to me. What do you honestly expect when you say "the whole jungle"? Should we include all of `java.util.Date`? The whole JVM? Maybe copy over the system kernel and the registry numbers to recreate the exact memory blocks in exactly the same spot on the new machine (🤞 hope those blocks aren't in use already)?
Seems to me like a reasonable, practical choice to assume that things provided by the JVM will still exist in the next JVM. Therefore it seems "good enough" that you still get a valid `#inst` at the same instant when switching between major versions of Java.
> To serialize a function, as in, the function itself, and not just an instance of one (like the data in its closure along with its qualified name), you'd have to somehow serialize the code for it. @U0K064KQV
Not sure what you mean here. `(partial + 1 2 3)` evaluates to a function created by `partial`, closed over `+`, `1`, `2` and `3`. In the example above, that was anonymously serialized, passed to a new JVM instance and called with the value `4` to return the expected value of `10`.
If you want a name too, a var is its own entity. You would need to proactively include it. So, I guess, serialize a map using the var name as a key and the function as a value? Or maybe serialize the whole namespace?
> By talking in terms of data and code and not types and code we have isolated ourselves on a different semantic island. @U0DJ4T5U1 Some random thoughts linking types to data: Types are essentially enforced metadata? People want types because they don't trust their data. People want types because they don't trust their own code (which could be considered a type of data). > change the narrative around data so that it's connected with the conversation about types Is that really a great idea? I think part of the point is that types mostly solve the wrong problems. Shouldn't we instead focus on what problems really need the most attention, and how to solve them / how Clojure can help to solve them?
I think we're not really going anywhere, haha. I'm okay to agree to disagree, but let me see if there's a way to reconcile our views first. I feel you really want to say: Higher Order Functions are really cool and powerful, and you can do a lot of stuff with them: you can pass them as input and return them as output, you can compose them, etc. Yes, they have a lot of good properties and qualities, and are quite powerful. In fact, you can encode anything with just that, see the Lambda Calculus. Then I feel you imply that "data" is like the ultimate in power and expressivity. So since you feel HOF are very powerful and flexible, they're really close to "data" in that way, so why not include them in the category of "data"-like things. Is that correct? I agree with all this. I love HOF, they're immensely powerful and can be used to great effect, and you can see some things done with macros are similarly done with HOF, and given a bit of syntax sugar, they even have almost identical ergonomics in their usage. And if you want to make "data" the category of powerful and expressive constructs to express computation with, sure, they can be included. I don't really care, as long as we can have a common definition so we can talk about things and understand each other. But if you're trying to tell me that HOF have the same properties and qualities as a string of characters in UTF-8 encoding, that's where I disagree. So what would you call the category of things that have properties and qualities similar to a string of characters in UTF-8 encoding?
> And if you want to make "data" the category of powerful and expressive constructs to express computation with, sure, they can be included. No. That sounds like a pretty 😎 category. But not a particularly apt description of what I mean. But that's fine. I think everyone has had a chance to offer interesting commentary on the state of the universe. We don't need to run in circles. 🫠
Actually, maybe if I steal from the Rich Hickey convo I linked to, a Function is more like data encapsulated along with an interpreter. If you refer to the dictionary, this would actually now categorize it as an "idea". > datum/data: the object of knowledge as presented to the mind > ideatum/idea: the object of knowledge as known by the mind Thus I would say when code is as data, the code is as it is when presented to the interpreter. When code is as function, the code is as it is known by the interpreter. The form has changed, the latter has hidden it behind the interpreter, made the data opaque, but the idea as interpreted from it took form.
> As Joe Armstrong says, such code is easier to reuse. If you want to reuse (or test) a functional banana, you don’t have to set up a stateful gorilla to hold the banana first.
https://www.johndcook.com/blog/2011/07/19/you-wanted-banana/
So probably we shouldn't talk about data, but about pure data. Because yes, my date function is a serialized object and it is data, but it still needs an implementation of `java.util.Date`.
If you happen not to be inside a JVM, then you can't use your serialized version until you implement or mock the JVM, or bring along a full JVM. That isn't practical. This kind of data is not self-contained. It's still data, but it's not pure and it's not self-contained.
Hum, ok, maybe I'm the pedantic one. But a `java.util.Date` instance is not data either. That's an object that encapsulates data, and this would make it a data type! The combination of data and a set of operations on said data! But this distinction matters less, because data types are just convenient containers to help you work with data. The challenge, though, is that when you serialize the `java.util.Date`, you're not necessarily serializing the data in it, but the instance of that object itself, and that is brittle, because again, Objects are not data, and so they are very tricky to serialize in and of themselves.
And to make people's brains explode even more, you could challenge me and say that therefore quoted Lisp forms are not data either, but data types! Since they are List data types and other such things. And yes, you'd be correct. But what's cool about Lisps is that those specific data types are homoiconic with the data representation of themselves, meaning their representation as data is identical to their code! How cool is that!
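A quick REPL-style illustration of that homoiconicity, using only `clojure.core`: a list built with ordinary list functions is equal to the same code written as a quoted literal, prints back as its own source text, reads back to an equal list, and can be evaluated as code. The name `built` is illustrative.

```clojure
;; Code built with an ordinary list function...
(def built (list '+ 1 2))

;; ...is equal to the same code written as a quoted literal.
(= built '(+ 1 2))                     ;;=> true

;; Printing it gives its source text, and reading that text gives it back.
(pr-str built)                         ;;=> "(+ 1 2)"
(= built (read-string (pr-str built))) ;;=> true

;; The very same value can also be evaluated as code.
(eval built)                           ;;=> 3
```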
> This kind of data is not self-contained. It's still data, but it's not pure and it's not self-contained. Excellent point. But is any data ever really totally self-contained? What is a list of numbers without a way to also know it describes, say, the last 50 years' worth of temperature readings recorded once every 4 minutes at a specific weather station in northern Colorado?
> The defining aspect of data is that it reflects a recording of some facts/observations of the universe at some point in time (this is what 'data' means, and meant long before programmers existed and started applying it to any random updatable bits they put on disk). A second critical aspect of data is that it doesn't and can't do anything, i.e. have effects. A third aspect is that it does not change. > Nothing about the idea of 'data' implies a lack of formatting/labeling/use of common language to convey the facts/observations, in fact it requires it > But equating any such labeling with more general interpretation is a mistake. For instance, putting facts behind a dynamic interpreter (one that could answer the same question differently at different times, mix facts with opinions/derivations or have effects) certainly exceeds (and breaks) the idea of data. Which is precisely why we need the idea of data, so we can differentiate and talk about when that is and is not happening - am I dealing with facts, an immutable observation of the past ("the king is dead") or just a temporary (derived) opinions ("there may be a revolt") -- Rich Hickey
We've seen that before. Could you please distill your point in regards to that quote?
That's precisely why Monads are not data: literally, when executed at a later time, a Monad will run side effects and can answer the same question differently. In your test it returns only a chain of functions, but when run it actually causes side effects, IO and all kinds of things to happen.
Also, that means that, based on Rich's definition (and mine), we'd both say that data without some known encoding is not really data; that's just random squiggles.
And also that data is always pure 😛 So the distinction between pure and impure data doesn't really hold: data should imply purity, since it is just a fact; data represents a truth.
If it fails to do so, like in the example of `java.util.Date`, it's because you didn't have data; otherwise it would have had the properties you'd expect from data.
> I think everyone has had a chance to offer interesting commentary on the state of the universe. Just wanted to say big thumbs up on that. It's all very interesting to me, and enlightening as well to discuss these things.
Hmm... I don't see anything about an "encoding" requirement in that quote. But if you want to stick to that statement, "Reification of a function is just encoding the instructions for a computation into bytecode."
I do need to add one nuance to what I quoted from Rich Hickey. Personally, I don't mind if the human interpretation of the idea of data has changed from its historical one, so if what he's talking about is not what people think of as data anymore, that's fine. But the idea of the thing that Rich is talking about is still a thing, and maybe we don't have a good name for it, but that does not mean it's not a thing that might be of interest to discuss, use, leverage, etc.
> Nothing about the idea of 'data' implies a lack of formatting/labeling/use of common language to convey the facts/observations I consider an encoding a "use of common language".
I think that if I wanted to use a serialized function that needs some kind of JVM dependency, I would have a very bad time (Edit: outside of the JVM).
So based on this I have to agree with Rich on his point that data must be a fact and it must not change.
If we use the JVM's internals to represent part of our data and those internals change, then our data changes (who could've known that there would be a version of Java that changes how `Date` works, so that its `hashCode` has changed...). And if it changes, then we can't reliably reason about it. So if it "does something", then we can't reason about it.
But if it's a pure function serialized into JVM that always returns a deterministic output in the same order (same hashCode and maybe even same execution time) then on that point it could be regarded as data. MAYBE.
Edit: Also a serialized function could be regarded as data regardless of its contents if it doesn't get deserialized or otherwise executed. A.k.a. hash or a stored chunk of... randomness. Then it's data.
> But if it's a pure function serialized into JVM that always returns a deterministic output in the same order (same hashCode and maybe even same execution time) then on that point it could be regarded as data. MAYBE. Hum, I guess in Haskell this would be the case, as the functions would all be pure. I think you could reason as to what they do each time would be the same. But how do you know what it does in the first place? If all you have is the function instance? And how do you know the interpretation Haskell will apply to it when you invoke it? Such as what data it had closed over?
#1| (def a (partial + 1))
#2| a
#3| ;;=> #object[clojure.core$partial$fn__5922 0x17e66aed "clojure.core$partial$fn__5922@17e66aed"]
So in line 1 we see the data for the function: the function is represented as a string, as data, in a language we all know, which is Clojure. But on line 3, we see the function itself. Now, what if you only had line 3? If I didn't also show you the data that was used to generate the function? Can you reason about it?
I don't think my demonstration that you can serialize functions was ever meant to show that you should serialize functions. Just that serializability is an arbitrary restriction. Practicality says you should really offer more respect to the boundaries between environments.
The serialization has an important drawback that is very relevant here
It's not arbitrary though, it's because I find it useful to have a concept of something that is, like Rich Hickey says, a fact/observation that is just that, passive, not effect-full, static, which represents something of meaning that I can understand as long as I share the common language in which it was encoded.
In theory you can disassemble the function and retrieve some semblance of it, or even, if you are really smart, you might be able to look at the binary for it and understand it. That binary is data; it will later be presented to the CPU, which will interpret it as a sequence of instructions to execute. So, yeah, if we go there, I can admit it's still "data". But from a practical standpoint, I am too dumb to understand it; it's become really opaque. Sure, I could edit the binary, maybe use some memory disassembler (I've actually done this in the past to cheat in games :p), but it's so far removed that, in practice, it's no longer really data I can do anything with that I'd otherwise be able to do with data.
> a fact/observation that is just that, passive, not effect-full, static, which represents something of meaning that I can understand as long as I share the common language in which it was encoded
As an inverse tangent (bad pun about pointing us back in the general direction of the opening article):
If you remove a few choice monads from the language (mainly the `IO` monad and its `do` notation sugar), Haskell code is more like mathematical proofs than "code" in the traditional sense.
When you say `a = b` in Haskell, that is not an assignment statement. It is an equality declaration, which the compiler will "verify as true". If that equality doesn't hold, the compiler will throw (at compile time) like your math teacher writing a big red X on your homework. The compiled result of such code can hardly be described as active, effectful, dynamic, or failing to represent something meaningful if sharing a common language.
On its face, this is quite a different paradigm (although I get the impression most Haskell programmers lean heavily on `do`, thus mostly nullifying this point).
In that case though, are we talking about the code itself? Haskell's code is data, and like mathematical notation, it has enough metadata and invariants around its notation that you can reason about how it would behave if executed, without executing it. The Haskell compiler does reason about it in terms of data: it parses it, tokenizes it and goes on analyzing it and all that, never running it. I guess in some ways this shows the power of code as data, but Haskell doesn't directly expose that to the user, though it does let you use it to validate a lot of invariants by adding more or less metadata to the code as you write it. This is a good take though; I think it's worth talking about how Haskell uses the semantic data of code in very powerful ways. Haskell's analysis of code as data is much more powerful than Clojure's; Clojure, on the other hand, does in theory let you do your own analysis and gives you a chance to make edits before the code gets executed.
Well my knowledge of Haskell is pretty shallow. But what catches my eye from my example is that you cannot "run" an equality. You can run an equality check, or test, or verification. But the equality itself just is. It is a fact. And the declaration here is simply a recording of that fact.
Well, the equality is still assignment though; it just doesn't assign the return value, but the code block itself, and that only makes sense, as in doesn't cause possible errors, if the code block is pure. So like:
a = 2 + 2
print a
You're saying `a` will be equal to `2 + 2`, and not that it will be equal to `4`.
.
Which means that `print a`, if you substitute `a` for its assigned value, becomes `print (2 + 2)` and not `print 4`.
At least that's the semantics, pretty sure when it runs it actually assigns 4 to a, but since it validated that the code assigned to a was pure, it will be equivalent and will behave properly.
So with those semantics if you did:
a = rand-int()
print a
print a
This would not make sense, because the first and second print would print something different, because it substituted to:
a = rand-int()
print rand-int()
print rand-int()
And so the compiler will error here, and won't allow it, unless you switch to `<-`, which assigns the return value instead. But that also means, if print is typed to take an Int, this is a type error now, because rand-int has type `IO Int`, so what it returns is not an `Int`, but an `IO Int`.
Interestingly enough, Clojure macros have the same problem of "double evaluation".
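A small sketch of that double-evaluation problem, using a hypothetical side-effecting `next-id!` counter: a macro that splices its argument in twice evaluates it twice, much like the `a = rand-int()` substitution above, while the usual fix evaluates the argument once and binds it with an auto-gensym (`v#`). The macro names `pair-naive` and `pair-safe` are made up for illustration.

```clojure
(def calls (atom 0))

(defn next-id!
  "Hypothetical side-effecting fn: bumps a counter and returns the new value."
  []
  (swap! calls + 1))

;; Naive: splices the argument form in twice, so it is evaluated twice.
(defmacro pair-naive [x] `[~x ~x])

;; Safer: evaluates the argument once and binds it to an auto-gensym.
(defmacro pair-safe [x] `(let [v# ~x] [v# v#]))

(pair-naive (next-id!)) ;;=> [1 2]
(reset! calls 0)
(pair-safe (next-id!))  ;;=> [1 1]
```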
Ya, I'd say this is code as data in a sense: the equality lets you copy code as if it were data to other places, even if the compiler only lets you do it when the code and its return value can be interchanged for one another without changing the behavior.
If you need to waste some time, here is a fascinating rabbit hole of arguments about what the definition of "is" is ... I mean, what the definition of "data" is. https://wiki.c2.com/?DataAndCodeAreTheSameThing
I like the definition that data is something that can be interpreted to have meaning. This matches very well with how I categorize data. It's why code in textual form is data, but an instance of IFn isn't. The in-memory bits of the IFn instance can be interpreted by the CPU; if you could get the bytes for them, that would be data again, albeit quite difficult for a person to interpret the meaning of. And those bytes can be fully modified just as data can, changing the data. But through the IFn interface, you're not working with that data itself, you're working around it: you might tell the CPU to go and interpret the bytes that represent the function, but you're not yourself working with that data anymore.
Well, or to be more fair, you're working with it in that you've wrapped it in an envelope and are moving that envelope around, wrapping it inside of other envelopes alongside other pieces of data, etc. But you're not reaching into the data in a way that you can see the data or change it.
What's meta is that you're actually not doing that yourself, but also using data to tell the computer to do it on your behalf, and so the textual code you use to call the IFn and wrap it and all that is also data.
And this might be the best way to understand how I categorize data: the instructions that you write as code, to instruct the computer to do things for you. Some languages and runtimes won't always give you the same permissions over what you can instruct the computer to do. In a Lisp, you'll be able to tell the computer to modify the code itself; in Haskell you'll only be able to tell it to wrap it, move it and execute it multiple times, and you'll no longer be allowed to change or visualize the code in question, aka the data for the function that the computer would later execute. To me this is a big distinction. The data that describes the function to be executed, when in IFn form, is no longer something you are allowed to see or modify.
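A sketch of the three layers described above, using only `clojure.core`: source text, the form read from it (which the program can inspect), and the IFn produced by `eval` (which can be called, but offers no comparable way to look inside). The names `src`, `form` and `f` are illustrative.

```clojure
(def src "(partial + 1)")    ; text: data for a human reader
(def form (read-string src)) ; a list: data the program can inspect and edit
(def f (eval form))          ; an IFn: a computation

(first form) ;;=> partial
(last form)  ;;=> 1
(f 41)       ;;=> 42

;; The IFn can be called, but it is not a seq or a collection,
;; so there is nothing here to walk into the way we walked the form.
(seq? f)     ;;=> false
(coll? f)    ;;=> false
```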