#clojure
2024-02-29
frozenlock15:02:48

I have a relatively big collection of maps fetched from a webserver using transit. Is there a way I can start to process it before it has completely downloaded? (With or without transit)

henrik15:02:49

Provided you don’t need all maps to start, and provided you’re streaming them, and not sending them as one blob; yes.

frozenlock15:02:23

Right. Currently it's all coming back as a single http response. I'm wondering if/how I can stream it using the existing setup, or if I should try something with websockets instead.

henrik15:02:25

Not sure it matters logically. You could do multiple POSTs as well as multiple messages on WS.

ghadi15:02:25

single http response isn't the salient part

ghadi15:02:44

it's whether the response is maps wrapped by a collection, or if it's a series of maps

ghadi15:02:03

(series of bare maps)

ghadi15:02:16

e.g. like JSONL

frozenlock15:02:51

Assume I have the best structure possible; I'll go from there 😉

ghadi15:02:18

{} {} {} is streamable by calling transit/read three times

ghadi15:02:29

[{} {} {}] is not
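
A minimal sketch of the "series of bare maps" case, assuming the `com.cognitect/transit-clj` library is on the classpath. The sample maps and the in-memory byte array are illustrative stand-ins for a real HTTP response body.

```clojure
;; Assumption: com.cognitect/transit-clj is available as a dependency.
(require '[cognitect.transit :as transit])
(import '[java.io ByteArrayInputStream ByteArrayOutputStream])

;; Write three bare top-level maps to one stream (no wrapping vector).
(def payload
  (let [out (ByteArrayOutputStream.)
        w   (transit/writer out :json)]
    (doseq [m [{:id 1} {:id 2} {:id 3}]]
      (transit/write w m))
    (.toByteArray out)))

;; One reader over the stream; each transit/read call returns the next map.
(def maps
  (let [r (transit/reader (ByteArrayInputStream. payload) :json)]
    [(transit/read r) (transit/read r) (transit/read r)]))
```

Each map becomes available as soon as its own bytes have arrived, which is what makes this layout streamable without a token-level parser.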

p-himik16:02:30

[{} {} {}] is also technically streamable, but you'd have to make changes to the Transit library.

ghadi16:02:30

right, it's about having a streaming parser

ghadi16:02:07

which lowers the abstraction level to event/token level, which we can probably all agree is unpleasant.

frozenlock16:02:24

If I understand correctly, I need to use the ByteInputStream from the http response (or something similar) and parse it... but how can it be done in parts?

frozenlock16:02:23

As in, how can I parse each item individually to get a lazy coll, or to put them on an async channel one at a time?

p-himik16:02:41

The input data has to have separate top-level objects on separate lines. You read the input buffer till you get to a new line, feed that buffer to Transit, repeat. And wrap it all in a lazy coll if you need. Alternatively, you can use a multipart response. It's probably supported by your HTTP client, you just have to tell it that each part must be decoded with Transit.
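
The line-by-line approach might look roughly like this. To keep the sketch runnable without extra dependencies it decodes each line with `clojure.edn/read-string` instead of Transit; `lazy-records` and the sample string are hypothetical, and in real use the reader would wrap the HTTP response body and `decode-line` would call Transit.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])
(import '[java.io BufferedReader StringReader])

(defn lazy-records
  "Returns a lazy seq of values decoded from rdr, one per non-blank line.
  decode-line is any function from a line of text to a value."
  [^BufferedReader rdr decode-line]
  (->> (line-seq rdr)
       (remove str/blank?)
       (map decode-line)))

;; Stand-in for a response body with one top-level map per line.
(def sample "{:id 1}\n{:id 2}\n{:id 3}\n")

(def first-two
  (with-open [rdr (BufferedReader. (StringReader. sample))]
    ;; Realise what is needed before the reader is closed.
    (doall (take 2 (lazy-records rdr edn/read-string)))))
```

Only the lines that are actually consumed get read and decoded, so processing can start before the rest of the input has arrived. The same seq could be pushed onto an async channel one item at a time instead.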

chrisn18:02:56

If it is a series of maps then tmdjs and tmd's transit encoding pathway is going to be a lot faster.

chrisn18:02:46

In terms of encoding time, decoding time, and overall size, although with gzip compression the win from size is decreased. That is specifically the pathway they are designed for.

chrisn18:02:41

Likely fast enough that decoding each map isn't going to help you at all and will overall be much slower.

frozenlock14:03:39

@UDRJMEFSN Currently it's only between clojure (jvm) instances, so I don't think tmdjs would apply. My maps are also nested, so I'm not sure if tech.v3.dataset could be used. But it has popped up a few times when I was looking for ways to increase the processing speed, so I'll keep an eye on it for sure. @U2FRKM4TW thanks, I'll look into the multipart one!

zimablue17:02:48

is (eval) slower than not using eval? Coming from cljs, I am unfamiliar with details of clj compilation. I remember the saying "all clojure code is compiled", I think that means in the sense that it is "compiled" into the jvm runtime, but are there optimizations happening somewhere which (eval) disallows/circumvents?

zimablue17:02:22

thanks, that was my guess but it's counterintuitive

hiredman17:02:39

Code is evaluated by compilation (compiled then executed)

seancorfield17:02:55

Using eval should still generally be considered a "code smell" -- it is very rarely needed.

hiredman17:02:09

When you load code from a file it is loaded by reading a form at a time from the file and calling eval on it (it doesn't actually call clojure.core/eval but it calls the same static method on the compiler that clojure.core/eval calls)
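
A simplified sketch of that read-a-form-then-eval loop. `load-forms` is a hypothetical helper that uses `clojure.core/read` and `clojure.core/eval` directly; the real loader goes through the compiler and also tracks things like line numbers.

```clojure
(import '[java.io PushbackReader StringReader])

(defn load-forms
  "Reads forms from source one at a time, evaluating each before reading
  the next. Returns the value of the last form."
  [^String source]
  (with-open [rdr (PushbackReader. (StringReader. source))]
    (loop [result nil]
      (let [form (read rdr false ::eof)]
        (if (= ::eof form)
          result
          (recur (eval form)))))))

;; The second form can refer to `base` because the first form has
;; already been compiled and executed by the time it is evaluated.
(def loaded
  (load-forms "(def base 10) (defn add-base [x] (+ x base)) (add-base 5)"))
```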

chrisn18:02:35

A better approach is to use requiring-resolve. Then if clients of your system don't want the clojure compiler booted up they can require the extensions or whatever at a high level (like in their main function) and then requiring-resolve is just resolve of an already compiled namespace.

chrisn18:02:13

Compared to that yes, eval is slower and is not optimizable. Requiring-resolve is at least optimizable.
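
For illustration, a small `requiring-resolve` sketch (available since Clojure 1.10). `clojure.string/upper-case` stands in for whatever extension var would be looked up; the names `upper` and `shouted` are made up for the example.

```clojure
;; requiring-resolve loads the namespace if it is not loaded yet,
;; then resolves the fully qualified symbol to its var.
(def upper (requiring-resolve 'clojure.string/upper-case))

;; Vars are invocable, so the result can be called like a function.
(def shouted (upper "hello"))
```

If the namespace was already required (or AOT-compiled) earlier, the call reduces to a plain resolve and nothing new has to be compiled at that point.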

zimablue18:02:04

this contradicts the previous statements that no optimization is happening either way

hiredman18:02:40

they are doing different things

hiredman18:02:52

once code is evaled the performance is the same either way

zimablue18:02:37

it's the loading of the initial code that is being sped up in chrisn's suggested approach?

hiredman18:02:30

e.g. if you eval (fn [a] (* a a)) and call it, the function object returned by eval will have the same performance as any other function object

hiredman18:02:19

but actually evaling (fn [a] (* a a)) to produce the function object has some cost too
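
A small sketch of that distinction; the var names are made up for the example.

```clojure
;; eval compiles the form and returns an ordinary function object.
;; The compilation cost is paid here, once, when eval is called.
(def square-evaled (eval '(fn [a] (* a a))))

;; The same function written directly in source, compiled at load time.
(def square-plain (fn [a] (* a a)))

;; Calling either one afterwards is just a normal function call.
(def results [(square-evaled 7) (square-plain 7)])
```

The cost to watch is therefore calling `eval` repeatedly (for example inside a loop), not calling the function it returned.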

hiredman18:02:15

@UDRJMEFSN is assuming you are using eval to approach a particular problem and is proposing a particular approach that can solve that problem using less work than eval does, but I don't know what problem you are trying to solve with eval

henrik07:03:03

If it’s about exposing scripting capabilities, https://github.com/babashka/sci will give you an effectively sandboxed environment to run code. Comes at a performance price.

igrishaev19:02:08

That's the end of the day for me, and I've got stuck with this Java expression:

private System.Logger.Level logLevel = System.Logger.Level.INFO;
How can I reach the INFO static field in Clojure?

arnaud_bos19:02:50

I've never tried to reach into multiple levels of Java nesting in Clojure, but I think this will help you : https://ericnormand.me/article/tricks-for-java-interop

arnaud_bos19:02:26

INFO is a value in the inner Level enum inside the inner Logger interface of the System class.

hiredman19:02:34

it will depend

hiredman19:02:17

each of those dots in the java could be a field, a static field, or an inner class, and you need to know what each is to translate to clojure

arnaud_bos19:02:20

Maybe you could :import the enum itself directly.

igrishaev19:02:40

(import 'System$Logger$Level)
doesn't work

jgomez19:02:02

Did you try System.Logger.Level/INFO ? Or just Level/INFO if you imported it

hiredman19:02:02

you'll need to figure out where System is coming from

arnaud_bos19:02:23

System is java.lang

hiredman19:02:53

System is java.lang.System in clojure, but you need to know what System is in that java expression

🏁 1
hiredman19:02:00

and it is not java.lang.System

igrishaev19:02:07

ah!

(import 'java.lang.System$Logger$Level)
works. Thank you!

🎉 3
arnaud_bos19:02:24

public final class System {
    public static interface Logger {
        public static enum Level {
            INFO
        }
    }
}

hiredman19:02:47

whoops, pardon me, java.lang.System grew a System.Logger at some point

arnaud_bos19:02:50

JDK 9 apparently
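
Putting the thread together, a short sketch of reaching the nested enum from Clojure, assuming JDK 9 or later. Nested classes are named with `$` on the JVM, and `import` needs the package-qualified name. The var names are made up for the example.

```clojure
;; System.Logger.Level in Java is java.lang.System$Logger$Level on the JVM.
(import 'java.lang.System$Logger$Level)

;; After the import, the short nested name works for static access.
(def info-level System$Logger$Level/INFO)
(def info-name (.getName info-level))

;; The fully qualified name works without any import.
(def also-info java.lang.System$Logger$Level/INFO)
```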

arnaud_bos19:02:05

Java logging 🥵

Nundrum19:02:56

Wow, thanks for all that. It demystified several things for me. Which means I'll be back with more questions later 😆