#clojure
2015-07-12
max04:07:02

This might be more of a Java than a Clojure issue, but here goes. I’m getting a different sequence of bytes when I call (.getBytes (slurp "/path/to/somefile")) on Sun Java 8 vs OpenJDK Java 7. How is this possible? I assume it’s a Unicode issue, but still...

max04:07:37

Is slurp’s default behavior going to change between Java versions (same version of Clojure on both JVMs)?

timothypratley05:07:19

@max: does (java.nio.charset.Charset/defaultCharset) return the same thing in both?

max05:07:11

@timothypratley: ha no. UTF-8 vs US-ASCII

max05:07:11

so in practical terms, I found this bug because I was getting the wrong crc32 checksum of a string got from a POST request in a Ring app. This particular crc library uses .getBytes.

timothypratley06:07:17

You can specify a character encoding in the .getBytes call, or set the default encoding as a system property
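
A minimal sketch of the first option, assuming the text is meant to be UTF-8. The helper name `crc32-of` is hypothetical; `java.util.zip.CRC32` and `StandardCharsets` are standard JDK classes. Passing the charset explicitly makes the bytes, and therefore the checksum, independent of whatever `Charset/defaultCharset` returns on a given JVM. The second option is typically the `file.encoding` system property set at JVM startup.

```clojure
(import '(java.nio.charset Charset StandardCharsets)
        '(java.util.zip CRC32))

(defn crc32-of
  "Hypothetical helper: CRC-32 of `s` after encoding it with an explicit charset."
  [^String s ^Charset charset]
  (let [data (.getBytes s charset)
        crc  (CRC32.)]
    (.update crc ^bytes data)
    (.getValue crc)))

;; The same string yields different byte sequences under different charsets.
;; "caf\u00e9" is "café" written with a Unicode escape.
(def sample "caf\u00e9")

(count (.getBytes ^String sample StandardCharsets/UTF_8))    ; 5 bytes
(count (.getBytes ^String sample StandardCharsets/US_ASCII)) ; 4 bytes, the accented char becomes ?

;; Explicit charset, so the result does not depend on the platform default.
(crc32-of sample StandardCharsets/UTF_8)
```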

jrychter08:07:59

@otfrom: I use CRFs for tagging parts of natural language phrases, as a module in our search engine, for matching products to cooking recipe ingredients. As I said, directly using the Java APIs works, but could be made nicer.

jrychter09:07:53

@max: whenever I see .getBytes in code, I think of a landmine — it's bound to explode someday. The only way to use it safely is with UTF-8 and then you have to be very sure that everything inside and around your app is UTF-8.

max09:07:40

@jrychter: yeah I ended up using a different crc32 library

max09:07:21

seemed safer :)

jrychter09:07:10

@max: you might want to report this as a bug to the library maintainers — calling .getBytes without an encoding parameter is basically undefined behavior.

max09:07:21

@jrychter: just did, but it hasn’t been touched since 2012 so c’est la vie

akiva15:07:01

What is the general consensus on shadowing with let assignments? Rather than doing stuff like myvar and then myvar'.

pesterhazy15:07:36

I don't know about the consensus, but I try to avoid it

akiva15:07:43

Shadowing?

pesterhazy15:07:05

yes; it's confusing because even when your eyes catch the outer myvar binding, you can't be sure there isn't another inner myvar binding

pesterhazy15:07:54

myvar* is not pretty, but a bit less confusing

akiva15:07:28

Okay. I’ve always shadowed but a co-worker pointed out that he prefers the other way unless it’s immediately obvious so you can avoid stuff like myvar' and myvar''.

andrewmcveigh15:07:46

It depends on the use. I find I don’t need to shadow too often. Sometimes it makes sense, sometimes not.

andrewmcveigh15:07:02

I don’t think there’s a consensus either way

pesterhazy15:07:20

a related question is whether to shadow clojure.core/type, clojure.core/time etc.

akiva15:07:32

I find I do it often, actually. Let’s say I have a function that processes a string. I tend to just shadow the string in the processing.

andrewmcveigh15:07:25

@pesterhazy: Personally I’d say beware of that. Can lead to tricky bugs

akiva15:07:07

Not shadowing doesn’t bother me. A little added noise is worth it if it makes things clearer.

pesterhazy15:07:53

@andrewmcveigh: I agree, though it's tempting with common nouns like "type"

andrewmcveigh15:07:02

Well, the worst thing I find is constructing maps with keys like {:name "something" :type "error" …} and then trying to destructure.

pesterhazy15:07:23

yes I was just about to mention this

andrewmcveigh15:07:31

Then you call name, and you get some error like cannot call string
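
A small sketch of the failure described here, using the map from the message above. The function names `label-broken` and `label-fixed` are made up for illustration. Destructuring `:name` creates a local called `name`, so a later call to `name` hits the string instead of `clojure.core/name`.

```clojure
(defn label-broken
  "Destructuring :name binds a local `name` that shadows clojure.core/name."
  [{:keys [name type]}]
  ;; Here `name` is the string from the map, so this tries to call a string
  ;; as a function and throws a ClassCastException at runtime.
  (name type))

(defn label-fixed
  "Same destructuring, but the core function is referenced by its full name."
  [{:keys [name type]}]
  (str name ": " (clojure.core/name type)))

(label-fixed {:name "something" :type "error"}) ; => "something: error"
```

The compiler accepts both definitions, which is why the mistake only shows up when the broken one is called.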

pesterhazy15:07:43

so it extends to keywords as well (if you want to use them in destructuring lets)

andrewmcveigh15:07:33

Sometimes the best thing to call something is 'reserved'

pesterhazy15:07:39

in particular, if you refactor the compiler won't catch a mistake (if you rename "type" or "name")

andrewmcveigh15:07:46

Though I guess, the main ones for me are: type, name, meta,

andrewmcveigh15:07:59

I never shadow something like map, list, etc.

pesterhazy15:07:19

map is pretty sad as well 😞

akiva15:07:23

Oh, I’d never shadow an actual function name.

akiva15:07:02

I just meant shadowing a parameter. (defn x [s] (let [s (…)] (…)))
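
For comparison, a sketch of the two styles under discussion. The function names and the trimming/lower-casing steps are only illustrative assumptions. Both versions return the same value; they differ in whether each intermediate result reuses the parameter name.

```clojure
(require '[clojure.string :as string])

(defn clean-shadowed
  "Rebinds `s` at each step, shadowing the parameter."
  [s]
  (let [s (string/trim s)
        s (string/lower-case s)]
    s))

(defn clean-named
  "Gives each intermediate value its own name instead of shadowing."
  [s]
  (let [trimmed (string/trim s)
        lowered (string/lower-case trimmed)]
    lowered))

(clean-shadowed "  Hello ") ; => "hello"
(clean-named "  Hello ")    ; => "hello"
```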

akiva15:07:23

For me, type almost always becomes class or kind.

pesterhazy15:07:38

you end up having to choose single-letter abbreviations (m), a misspelling (klass) or punctuation (map*)

akiva15:07:41

Er, not class.

andrewmcveigh15:07:07

@akiva: Sure, but there’s always a time where the “best” name would shadow something, e.g., (defn operation-on-a-list [list]…)

andrewmcveigh15:07:09

Though in that case I guess there are unofficial idioms. x, coll, etc.

akiva15:07:11

category was what I was thinking of.

andrewmcveigh15:07:30

Well, if it’s still a category...

akiva15:07:49

Just trying to use something generally synonymous.

pesterhazy15:07:57

worst of all, now we also have "update" and "fold", more things to be careful with :)

pesterhazy15:07:38

ah, not fold, sorry

spiralganglion16:07:40

Yeah, update really kills me - I write a lot of animation/game code where it's a super-common term.

spiralganglion16:07:41

Though I guess more as a function name than a binding name.

voxdolo16:07:42

For local assignment (and to a slightly lesser extent, destructuring), I find using a gensym fairly reasonable. Something like category# or type#.

voxdolo16:07:31

Visually distinct to my eye from the shadowed var and guaranteed not to collide.

pesterhazy17:07:51

voxdolo: is this better in any way than category* or category'?

pesterhazy17:07:14

err. type* or type'

andrewmcveigh17:07:00

@pesterhazy: what’s the difference between the binding category and category*?

voxdolo17:07:18

Pesterhazy: Since it's a reader literal for the gensym function, I'd say so: https://clojuredocs.org/clojure.core/gensym

pesterhazy17:07:28

@andrewmcveigh: I meant to say type* or type', because category is actually not in clojure.core

voxdolo17:07:05

I've only been working in Clojure professionally for the last year, but the # at the end is something I subconsciously scan for and read as "this thing is meant to alias something else and not to collide with another meaningful variable".

voxdolo17:07:34

It's meant to be used in macros, but I think judicious use outside of them is also reasonable.
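
A short sketch to make the distinction concrete. As far as I understand it, the `name#` auto-gensym is only expanded inside a syntax-quote form; in ordinary code `type#` is read as a plain symbol whose name happens to end in `#`, so it avoids `clojure.core/type` by being a different name rather than by being generated. The var names below are illustrative.

```clojure
;; Outside syntax-quote, type# is an ordinary local symbol.
(def plain-result
  (let [type# :local]
    type#))

;; Inside syntax-quote, each name# is replaced by a generated symbol.
(def expanded `(let [type# :local] type#))

;; Pull the binding symbol out of the expanded form's binding vector.
(def generated (first (second expanded)))

;; The gensym function itself can be called anywhere, macro or not.
(def fresh (gensym "type"))

[plain-result generated fresh]
```

Printing `generated` shows a name of the form `type__NNN__auto__`, while the local in the first form is still literally `type#`.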

voxdolo17:07:00

YMMV ¯\_(ツ)_/¯

andrewmcveigh17:07:06

So, if the :type is semantically a type, then name it type, or be more specific. I don’t think you need to get hung up on it.

andrewmcveigh17:07:10

:type# seems superfluous. Or don’t destructure keys like :type, :name, etc.

voxdolo17:07:33

gensym literals don't work for keywords :) thus why I said local assignment (ala let) and to a lesser extent destructuring.

andrewmcveigh17:07:48

But, part of the discussion was about keyword destructuring. Also applies to :type* and :type'.

andrewmcveigh17:07:06

Personally, I’d not bother with the type# gensym. You only find them rarely outside of macros.

andrewmcveigh17:07:07

If it’s still the same thing we’re talking about, no reason to rename it. If not, think of something else to call it.

andrewmcveigh17:07:44

Or use some combination of (-> …) and don’t bother naming it in the first place.
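
A sketch of that last suggestion, with a made-up `slugify` function as the example. Threading each step into the next means there are no intermediate bindings to name or shadow.

```clojure
(require '[clojure.string :as string])

(defn slugify
  "Hypothetical example: each step feeds the next, so nothing needs a name."
  [s]
  (-> s
      string/trim
      string/lower-case
      (string/replace #"\s+" "-")))

(slugify "  Hello Big World ") ; => "hello-big-world"
```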

rwtnorton17:07:09

@voxdolo: That is an interesting use of foo#. Adding that to my utility belt!

akiva18:07:57

Whenever I see gensym, I assume I’m looking at a macro. I’d probably not want to see it outside of that, really.

stuarthalloway18:07:57

gensym comes in handy in internal DSLs, even when macros are not involved