#off-topic
2020-07-22
Michael J Dorian14:07:11

My first non-trivial contribution to a Clojure tool that I use every day got merged this morning! https://github.com/weavejester/cljfmt/commit/edee90c060ab8ffd92f8ad04610d56d34aac4bb7

🎉 48
vemv21:07:44

Congratulations, very useful one! Personally I am convinced that 100% of hashmaps should be newline-delimited (and 80% of them should be vertically aligned... that's another story)

Michael J Dorian21:07:26

Thank you! I certainly like to have them newline delimited for my own code.

dominicm06:07:14

Newline delimited gets frustrating for things like arglists or let, imo.

vemv12:07:17

Funny, I think the same but for vertical alignment only

dominicm12:07:52

Vertical alignment is always frustrating.

noisesmith21:07:47

I wonder if this is a java bug

(ins)user=> (->> (io/file ".") (.listFiles) (filter #(= (str %) "./foo")))
(#object[java.io.File 0x1e6dad8 "./foo"])
(ins)user=> (->> (io/file ".") (.listFiles) (filter #(= (str %) "./foo")) (run! #(.delete %)))
nil
(ins)user=> (->> (io/file ".") (.listFiles) (filter #(= (str %) "./foo")))
()
(ins)user=> (->> (io/file ".") (.listFiles) (filter (comp #(contains? % (char 65533)) set seq str)))
(#object[java.io.File 0x6a9950f1 "./�Ϲ}^"] #object[java.io.File 0x7ad54c55 "./���JY"])
(ins)user=> (->> (io/file ".") (.listFiles) (filter (comp #(contains? % (char 65533)) set seq str)) (run! #(.delete %)))
nil
(ins)user=> (->> (io/file ".") (.listFiles) (filter (comp #(contains? % (char 65533)) set seq str)))
(#object[java.io.File 0xde18f63 "./�Ϲ}^"] #object[java.io.File 0x108bdbd8 "./���JY"])

noisesmith21:07:10

the sakura terminal emulator creates weird files in my home directory, containing weird characters (and the content is some config), I can't get a shell glob to expand to the file name

noisesmith21:07:40

java can find and delete other files, but there's something (at the fs layer?) that silently breaks on this input

noisesmith21:07:07

I was able to delete these files in the past via find with a clever selection predicate, but I'm more interested in the strange phenomenon that I can create a File object from the name / path of a file, but that can't be used to correctly do fs operations on that file

jsn21:07:57

Perhaps it converts the non-utf-valid parts of the filename to unicode "invalid character" on .listFiles?

jsn21:07:20

Does e.g. an existence check on that name work?

noisesmith21:07:09

the bytes as UTF-8

(ins)user=> (->> (io/file ".") (.listFiles) (filter (comp #(contains? % (char 65533)) set seq str)) (map #(seq (.getBytes (str %) "UTF-8"))))
((46 47 -17 -65 -67 -49 -71 125 94) (46 47 -17 -65 -67 -17 -65 -67 -17 -65 -67 74 89))

jsn21:07:34

Well, I can't say if it's a valid "invalid char" unicode or an invalid unicode just by looking at it, so 🙂
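
One way to check this at the REPL, as a minimal sketch: U+FFFD (the code point the filter above tests for with (char 65533)) encodes in UTF-8 to the bytes EF BF BD, which Java prints as the signed values -17 -65 -67. The names replacement-bytes and first-name-bytes are introduced here only for illustration.

```clojure
;; UTF-8 bytes of the Unicode replacement character U+FFFD.
(def replacement-bytes
  (seq (.getBytes (str (char 65533)) "UTF-8")))

;; Bytes of the first file name, copied from the REPL output above.
(def first-name-bytes [46 47 -17 -65 -67 -49 -71 125 94])

;; The three bytes right after "./" (46 47) are exactly the replacement character.
(= replacement-bytes (subvec first-name-bytes 2 5)) ; true
```

If that holds, the string Java is holding is well-formed Unicode that contains a literal U+FFFD, which would be consistent with the original bytes having been substituted before the name ever became a String.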

noisesmith21:07:53

you can recreate the exact string using those bytes

noisesmith21:07:32

user=> (map #(String. (byte-array %) "UTF-8") *1)
("./�Ϲ}^" "./���JY")

jsn21:07:05

Yeah, that's not the problem; I don't remember any specifics about how java represents unicode and about encodings for invalid char seqs

jsn21:07:09

So, does .exists work and return true on one of those java.io.File instances?

noisesmith21:07:52

@smith.adriane

(ins)user=> (Paths/get (first names) (into-array String []))
#object[sun.nio.fs.UnixPath 0x5136207f "./�Ϲ}^"]
(ins)user=> (Files/delete *1)
Execution error (NoSuchFileException) at sun.nio.fs.UnixException/translateToIOException (UnixException.java:92).
./�Ϲ}^

🙁 3
phronmophobic21:07:35

ie. instead of creating a path from the name. get the path by calling list on the parent directory of the problematic file?
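
A rough sketch of that suggestion, using the NIO directory listing and deleting the Path objects it hands back instead of rebuilding paths from strings. Whether this sidesteps the problem depends on the JDK keeping the raw name bytes inside the Path objects returned by the listing, which is an assumption here and not something confirmed in this thread. delete-matching! is a hypothetical helper name.

```clojure
(import '(java.nio.file Files Path Paths))

(defn delete-matching!
  "Lists dir with Files/newDirectoryStream and deletes every entry whose
  displayed file name satisfies pred. Returns the number of entries deleted.
  The Path objects from the listing are passed to Files/delete unchanged;
  they are never converted to a String and back."
  [dir pred]
  (with-open [stream (Files/newDirectoryStream
                       (Paths/get dir (into-array String [])))]
    ;; Collect the targets first, then delete, so the listing is fully
    ;; read before the directory is modified.
    (let [targets (filterv #(pred (str (.getFileName ^Path %))) stream)]
      (run! #(Files/delete ^Path %) targets)
      (count targets))))
```

For the files in this thread, the call might look like (delete-matching! "." #(.contains ^String % (str (char 65533)))).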

noisesmith21:07:54

I will try that

jsn21:07:31

^ well, that somewhat confirms my suspicion that the file name is damaged on the way in

noisesmith21:07:44

could be a linux filesystem bug

noisesmith21:07:55

or a corner case the fs never intended to handle

jsn21:07:57

because it's unrepresentable as utf-8

hiredman21:07:35

not all sequences of bytes are valid utf8 or utf16

jsn21:07:58

to be clear, here's my hypothesis: 1) filenames are not valid unicode, 2) java tries to represent them as valid unicode on dir read, and ends up having "invalid char" in filenames instead
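
That hypothesis can be illustrated without touching the filesystem, as a sketch: decoding bytes that are not valid UTF-8 and encoding the result again does not give back the original bytes. The byte 0xFF (-1) below is a made-up stand-in; the real on-disk bytes of these file names are not known from this log.

```clojure
;; Hypothetical on-disk name: "./", one invalid byte 0xFF (-1), then valid UTF-8.
(def raw-name (byte-array [46 47 -1 -49 -71 125 94]))

;; Decoding substitutes U+FFFD for the malformed byte.
(def decoded (String. raw-name "UTF-8"))

;; Re-encoding yields the bytes of U+FFFD (-17 -65 -67), not the original 0xFF.
(def re-encoded (seq (.getBytes decoded "UTF-8")))

(= (seq raw-name) re-encoded) ; false: the round trip is lossy
```

Under that assumption, a path rebuilt from the decoded string names a different byte sequence than the one stored in the directory, so a lookup by that name finds nothing.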

hiredman21:07:20

the filesystem likely doesn't actually care and just stores file names as bytes, not caring about encoding

jsn21:07:29

so it's rather a java bug than an fs bug

noisesmith21:07:32

yeah, I think sakura overflowed a buffer and created a write-only filename that can't be used for lookup - the thing that actually lets me delete this file is running find, selecting the file based on a predicate, and deleting, and find is smart enough not to do a file -> name -> file translation

hiredman21:07:33

but as soon as java tries to represent them as a string

jsn21:07:44

@hiredman no, you're wrong, fs cares

jsn21:07:21

but not by prohibiting such names, or at least not always; details vary for different FSes

hiredman21:07:22

pardon me, I doubt any linux filesystems care

jsn21:07:27

@noisesmith can't you somehow make a File from a binary string?

noisesmith21:07:45

I could try, checking the API

hiredman21:07:57

so there is a mismatch between file names as bytes and filenames as strings of utf8/utf16 characters

jsn21:07:59

@hiredman see e.g. mount options, there are options for filename encodings

hiredman21:07:25

I have only seen those options used for windows and mac filesystems (fat of various flavors and whatever the mac one was that started with h)

jsn21:07:20

@hiredman that's actually a fair point; I have no idea, could be that e.g. ext[234] fs just mandates utf8 or something, indeed. or even not care, actually -- I don't know

dpsutton21:07:25

Here’s the thing, though: on Unixes, paths are fundamentally bytes. The arguments and return types of the standard Posix OS interfaces open(2) and opendir(2) use C char* strings (because we still live in 1969).

This means that your operating system can, and does, lie about its filesystem encoding. As we discovered in the early days of beets, Linuxes everywhere often say their filesystem encoding is one thing and then give you bytes in a completely different encoding. The OS makes no attempt to avoid handing arbitrary bytes to applications. If you just call fn.decode(sys.getfilesystemencoding()) in an attempt to turn your paths into Unicode text, Python will crash sometimes.

💯 3
dpsutton21:07:45

https://beets.io/blog/paths.html not sure if they are authoritative

hiredman21:07:21

https://bugs.java.com/bugdatabase/view_bug.do?bug_id=4899439 is better, but all the filesystem bugs are marked as duplicates of that one bug

noisesmith21:07:56

that looks like precisely what I'm seeing, so it's a weird linux behavior combined with java not being messy enough to accommodate

noisesmith21:07:40

it would be funny to make a library on top of Unsafe for the sole purpose of accessing impossible files

hiredman21:07:51

did they end up removing unsafe?

phronmophobic21:07:22

you could probably use java native access (JNA) to directly call the filesystem apis

phronmophobic21:07:18

I've used https://github.com/Chouser/clojure-jna/ for a few projects and it works well