off-topic 2020-12-16 | Slack Archive

cdpjenkins15:12:00

Someone on Hacker News has asked for advice about learning Lisp: https://news.ycombinator.com/item?id=25441664#25443184 … can you believe that the majority of respondents are not recommending Clojure? 😱

Ben Sless22:12:59

There is a certain snobbery among CLers which makes them look down their noses on their cousins. They also have a reputation for generally not being nice

☝️ 3

simongray12:12:54

Yup. Try saying something nice about Clojure on /r/lisp and you will either be told that Clojure is not a Lisp or that Common Lisp can do everything Clojure does and more.

cdpjenkins15:12:13

Heh I did just learn from the Hacker News comments about a whole load of stuff that Common Lisp does that Clojure doesn't (mainly around interactive programming). Maybe one day, I'll find the energy to learn Common Lisp just so I can experience those things :-)

euccastro23:12:49

depending on their intended focus, to someone newly learning Lisp I may recommend Racket rather than Clojure, because it's way more beginner-friendly and has more learning resources (e.g. https://htdp.org/2020-8-1/Book/index.html)

👍 3

Joe15:12:21

I'd say depending on what they want to learn Lisp for there's an argument for suggesting SICP/Scheme. But less so if their goal is to use a lisp to build practical production software.

borkdude16:12:49

SICP is pretty easy to read if you know Clojure, so there's an argument for learning Clojure if you want to read SICP too ;)

cdpjenkins21:12:45

I really do need to read SICP again, now that I know one Lisp. Going through the book and typing stuff out in Python was alright back in the day but I bet Clojure will be better.

bherrmann16:12:18

Is there a way to convert a let binding into a map ?

let [ name "joe"
      greeting (str "hi " name) ]
-- convert to 
    { :name "joe", :greeting "hi joe" }
in particular, I like how you can reference name from the definition of greet.

Renato Alencar17:12:04

What are you trying to accomplish by removing the let binding?

Dane Filipczak17:12:16

you can directly reference name in the expression you’re binding to greeting. Lets are evaluated sequentially and symbols defined previously in the let are available in later expressions

souenzzo18:12:55

This:

(defmacro named [& vs]
  (let [ks (map keyword vs)]
    `~(into {} (map vector
                    ks vs))))
(defmacro qnamed [& vs]
  (let [ks (map (fn [n]
                  (keyword (str *ns*) (str n)))
                vs)]
    `~(into {} (map vector
                    ks vs))))
(let [a 42]
  {:named  (named a)
   :qnamed (qnamed a)})
=> #'user/named
=> #'user/qnamed
=> {:named {:a 42}, :qnamed {:user/a 42}}

noisesmith18:12:53

(cmd)user=> (defmacro locals [] (into {} (map (juxt keyword identity) (keys &env))))
#'user/locals
(ins)user=> (let [name "joe" greeting (str "hi " name)] (locals))
{:name "joe", :greeting "hi joe"}

noisesmith18:12:41

one gotcha with locals is it captures all let bindings, loop bindings, and fn args recursively - exactly what I want for debugging, a no go in app logic

noisesmith18:12:08

(also captures for bindings etc. etc. - all locals)

noisesmith18:12:53

oh wait, this isn't what you wanted at all and let already lets you do what you want, my bad

noisesmith18:12:00

(also this isn't at all off topic :D)

bherrmann21:12:47

@U051SS2EU Thanks, the locals macro does exactly what I wanted.

Eric Ihli19:12:51

I've got a computer sciency question that's a bit off-topic from Clojure. It's related to "variable-length encodings" and data structures. I'm in need of a very memory efficient data structure that maps integers to strings. I'm encoding integers as bytes such that the most significant bit signifies whether or not the following byte is part of the current integer. The next 7 bits are part of the integer.

0 -> 10 00 00 00
1 -> 10 00 00 01
2 -> 10 00 00 10
...
127 -> 11 11 11 11
128 -> 00 00 00 01  10 00 00 00
129 -> 00 00 00 01  10 00 00 01

These encoded integers represent identifiers for strings. The identifiers are sorted. I'm imagining the data structure like:

<header-byte-length-of-data>
<variable-length-encoded-integer-identifier-0>
<byte-length-of-following-characters>
<utf-8-encoded-characters> 
<variable-length-encoded-integer-identifier-1> 
<byte-length-of-following-characters>
<utf-8-encoded-characters>
...

The problem I can't solve is efficient indexing. "Find the string with identifier 12345678". They are ordered, so I could binary search. But they aren't fixed width, so a binary search could drop me smack dab in the middle of anything. Is this impossible or am I just not clever enough?

val_waeselynck20:12:37

> I'm encoding integers as bytes such that the most significant bit signifies whether or not the following byte is part of the current integer @UJP37GW2K this problem seems interesting but I really don't understand that sentence

val_waeselynck20:12:49

It's also not clear to me what you're after in "memory-efficient": fast access or low footprint?

Eric Ihli21:12:16

Low footprint. What I mean by that sentence is: To decode an integer from an array of bytes: Look at the first byte. If the left-most bit is a 1, then the integer is the next 7 bits. If the left-most bit is a 0, then the integer is the next 7 bits left-shifted 7 places and bit-or with the same logic on the next byte.

val_waeselynck21:12:35

I don't see a direct solution to your problem, but this might help: https://nlp.stanford.edu/IR-book/html/htmledition/index-compression-1.html

🙏 3

andy.fingerhut21:12:16

So a common answer for efficient indexing is to define a decent hash function on the values that are your keys (in this case I believe those are your variable-length encoded integers), and then store elements in hash buckets indexed by the hash function result.

andy.fingerhut21:12:26

You can use binary search on any kinds of keys that you can define a total order on. The total order on variable-length encoded integers could simply be normal < between the integer values, but there are other total orders you could use.

andy.fingerhut21:12:46

e.g. a different total order on such an encoding would be "shorter byte sequences are smaller than longer byte sequences, and if two byte sequences are the same length, then use lexicographic ordering between those equal length byte sequences". I am not sure there is any practical advantage to using that total order over the one above, though.

andy.fingerhut21:12:10

Hmm, rereading your description, I guess when you say "I'm imagining the data structure like:" you mean that the data structure would be a contiguous sequence of bytes in memory that starts with <header-byte-length-of-data> and continues as in your example?

👍 3

andy.fingerhut21:12:41

If there is no separate 'index' to this sequence of bytes, because you are trying to avoid the memory required to store such an index, then a question for you is: can you encode that sequence of bytes such that by starting at an arbitrary byte position in that sequence, you can scan forwards and/or backwards and determine where the start of the next record is, without having to scan from the beginning?

andy.fingerhut21:12:38

e.g. the simplest way to do that would be to have a unique byte value that could not be used within each 'record', but that probably isn't possible with your data.

andy.fingerhut22:12:07

If there is a way to find the next record boundary starting anywhere in the middle and scanning forward, you could do binary search on the position of a byte in that byte sequence, e.g. start at the middle byte, scan forward for the next record start. Compare the search key to the key of that record, stopping if you have a match. If your key is less than that one, take the half-way position, in bytes, between the current [start,end] range you've narrowed it down to, and scan forward to find the next beginning-of-record.

andy.fingerhut22:12:36

Whether that lookup is efficient enough for your purposes or not is up to your application.

andy.fingerhut22:12:50

When people care more about lookup efficiency and are willing to trade it off for some amount of extra storage, they create a separate index with keys and pointers to starts of records.

andy.fingerhut22:12:47

UTF-8 encoding has some similarities to the variable length integer encoding you mention, and has a "synchronization" property where from an arbitrary position within a sequence of bytes, you can scan forward (and maybe also backwards?) to find the next start of "variable-length Unicode code point": https://en.wikipedia.org/wiki/UTF-8. (look for occurrences of the word "synchronization" on that page)

Eric Ihli22:12:15

I thought briefly about whether I could use a flag bit to detect the start of a record. My instinct was that since UTF-8 uses a flag bit where the MSB is either 0 or 1 and the rest could be anything, that it kind of killed that possibility (unless I padded every byte and used the extra space for my own personal flags).

Eric Ihli22:12:54

And yeah, I was imagining a contiguous sequence of bytes. Got the overall idea from https://www.aclweb.org/anthology/W09-1505.pdf and just struggling with implementation. Just came across what I think will be a valuable resource. https://hdevalence.ca/blog/2014-02-13-fun-with-ngrams-part-ii-tightly-packed-tries

Timur Latypoff08:12:19

@UJP37GW2K if you're free to choose the memory layout, could I suggest you use not one but two continuous regions of memory? One region is just all your fixed-size structs (no variable-length ints) without the actual string data which you can easily binary search. Instead of storing the string data inside your struct, you store an array index where the string starts in your second region (which consists of all concatenated variable-length strings). With some extra effort, you could even do variable-length ints in the first region (if it's a major space saving), since you can more easily identify where the structs begin and end without the string data — for example, you could separate them with a single zero byte while ensuring no other bytes would be zero. Actually, while thinking about it, I think even your original data structure does not have zero bytes in it, so you could use a zero byte as a record separator for a random binary search indexing.

💯 3

❤️ 3

dpsutton19:12:08

There was a really good blog post about rewriting grep in Haskell. They had to face this problem when facing code points I think. They had a clever way of dealing with it if you want to see.

🙏 3

dpsutton19:12:45

Sorry it was wc not grep. https://chrispenner.ca/posts/wc

Lennart Buit19:12:29

(Slightly related: GNU’s yes is ridiculously fast because people appear to have been code golfing it for speed)

nate20:12:57

https://twitter.com/djrrb/status/1339244410387750920

😆 3

nate20:12:10

totally worth it

2020-12-16

Channels