#data-science
2022-08-09
p-himik12:08:18

What's the right way of dealing with C strings returned from native functions when using dtype-next? Right now, I have this:

(dtype-ffi/define-library!
  clib
  '{:hello {:rettype  :string
            :argtypes [[name :string]]}}
  nil ;;no library symbols defined
  nil ;;no systematic error checking
  )

(dtype-ffi/library-singleton-set! clib "libhello.so")

(comment
  (let [s (str/join (repeat 1000000 "x"))]
    (dotimes [_ 10000000]
      (hello s))))
And it definitely creates memory leaks because, I assume, the returned value is never freed after being used to create a Java string.

teodorlu14:08:42

I'm assuming Chris Nuernberger is interested in this. I think he follows https://clojurians.zulipchat.com/ more closely than this Slack.

p-himik14:08:19

Tagging @UDRJMEFSN just in case then.

phronmophobic14:08:20

I usually use :pointer rather than :string, but I assume there's not enough info to determine who owns the memory for the C string and when it's OK to reclaim it.

p-himik14:08:17

Maybe it could be an additional flag for an argument (something like :owned-ptr) or an extra type (like :own-string). Not sure.

phronmophobic14:08:24

That's possible, but the problem is that there are a ton of different memory management philosophies for C libraries in the wild.

phronmophobic14:08:22

I usually use dtype ffi as a low-level library and build a higher-level interface on top. It shouldn't be that hard to use cleaners as part of a higher-level API to automatically handle the cleanup for you.

p-himik14:08:10

Hmm. I see, thanks.

chrisn16:08:43

The string passed into the callsite as an argument definitely is being freed. The return value, as you note, is not being freed as the library can't determine if that is indeed the correct pathway to use here. It could be a static string or something allocated as part of a different structure.

chrisn16:08:20

I agree that this could be a separate type, such as a transient string or something like that, but in general I recommend what @U7RJTCH6J does: when the library returns a new object that the caller now owns, I return a :pointer or :pointer? and handle it in wrapping code.

chrisn16:08:43

Lots of times it is a structure or something that requires more involved shutdown than a free call.
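
A minimal sketch of the "return a :pointer and handle it in wrapping code" approach, for the simple case where a single free call is enough. It assumes the native library also exports a release function, here given the hypothetical name `hello_free`; that name, the `hello.wrapper` namespace, and the `hello-str` wrapper are illustrative only and do not come from the discussion. It also assumes `dtype-ffi/c->string` accepts the pointer value returned from a `:pointer` call.

```clojure
(ns hello.wrapper
  (:require [tech.v3.datatype.ffi :as dtype-ffi]))

;; Assumption: libhello.so exports a hypothetical `hello_free` that releases
;; strings returned by `hello`. Substitute whatever the real library provides.
(dtype-ffi/define-library!
  clib
  '{:hello      {:rettype  :pointer           ;; raw char*, no automatic conversion
                 :argtypes [[name :string]]}
    :hello_free {:rettype  :void
                 :argtypes [[ptr :pointer]]}}
  nil  ;; no library symbols defined
  nil) ;; no systematic error checking

(dtype-ffi/library-singleton-set! clib "libhello.so")

(defn hello-str
  "Calls `hello`, copies the result into a Java string, then frees the C buffer."
  [name]
  (let [ptr (hello name)]
    (try
      (dtype-ffi/c->string ptr)
      (finally
        ;; Runs even if the conversion throws, so the buffer is not leaked.
        (hello_free ptr)))))
```

Callers would then use `(hello-str "world")` and never see the pointer. If the library returns a structure that needs a more involved shutdown, the `finally` branch is where that teardown would go instead.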

p-himik17:08:54

Thanks! That seems to be working just fine, albeit slower than I'd like. Have to experiment with all the approaches, I guess.

chrisn17:08:02

What specifically is slower than you would like?

p-himik17:08:37

A single call that does barely anything takes around 5 ms. So adding a call that also frees the data adds another 5 ms.

chrisn17:08:03

Hmm, that doesn't sound right, but the default string management is slow. Do you have an example project?

p-himik17:08:07

Sure, I'll create one within a day or two.

👍 1
chrisn17:08:10

Recently someone profiled the python bindings and was getting far better perf than that

p-himik20:08:25

Ah, pardon my false report - I just figured out I was moving around a 1 MB string. Cutting it to 1 kB also cut the calling time dramatically.

chrisn21:08:58

OK, well one thing is that calling malloc and free is comparatively more expensive than calling new on the JVM. If you had 1 MB of data, you would want to manage that manually and ideally control when the conversion to a string happens, or perhaps do it once and leave it as a pointer to reuse portions of. dtype helps you with all of this, so you can keep part of your dataset in C land but still access it from Clojure. Great to hear it is working for you now.

👍 1