#clojure-europe
2021-03-17
ordnungswidrig07:03:27

@simongray wouldn't a zipper require all the data to be in memory?

pez07:03:08

Good morning! Did you see I made a macro? 😃

dharrigan07:03:53

macrotastic

simongray07:03:58

@ordnungswidrig I don't think zippers as a concept would necessitate keeping the entire data structure in memory. They are basically pointers inside a tree data structure with some options for local navigation. If you make a custom tree data structure that doesn't realise the contents of its branches until read time, then it won't keep everything in memory.
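A minimal sketch of that idea with `clojure.zip/zipper`. The loader `expensive-children` is hypothetical: it stands in for reading a branch from disk, and here it just computes the children of an infinite binary tree while counting how often it is called. Because `clojure.zip` only asks for a node's children when you move down into it, walking two levels deep realises exactly two branches of a tree that could never fit in memory:

```clojure
(ns example.lazy-zipper
  (:require [clojure.zip :as z]))

;; Counts how many branches have been realised so far.
(def realised (atom 0))

(defn expensive-children
  "Hypothetical loader standing in for a disk read. Node n has the
  children [2n, 2n+1], so the tree is infinite."
  [n]
  (swap! realised inc)
  [(* 2 n) (inc (* 2 n))])

(defn lazy-tree-zip
  "A read-only zipper whose branches are computed on demand."
  [root]
  (z/zipper (constantly true)                     ; every node is a branch
            expensive-children                    ; only called when moving down
            (fn [_node children] (vec children))  ; make-node: not exercised by read-only navigation
            root))

(def loc (-> (lazy-tree-zip 1) z/down z/right z/down))

(z/node loc)  ;=> 6
@realised     ;=> 2
```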

simongray07:03:50

And here I'm talking about clojure.zip - I'm sure you can make a zipper that's more efficient than that implementation too.

simongray07:03:09

But obviously it all depends on the data structure. Not sure how lazy something like Clojure's hashmap would be in practice.

simongray07:03:34

If you're navigating a huge json file from a disk you will need some custom data structure magic.

simongray07:03:45

But the search algorithm itself would be easy to write: just check sibling nodes and follow paths down where you need to. This can be parallelised too for searches, but I don't think it would work for transformations - at least I wouldn't be sure how to join the edits from a bunch of zippers.
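The search described here (check siblings, only follow paths down where needed) can be sketched over `clojure.zip` as below. `search`, `match?` and `descend?` are hypothetical names for this sketch: it walks right along siblings, goes down into a branch only when `descend?` says so, and returns a lazy seq of matching locations rather than bare values:

```clojure
(ns example.zip-search
  (:require [clojure.zip :as z]))

(defn search
  "Lazy pre-order search starting at loc and continuing through its right
  siblings. Descends into a branch only when (descend? node) is truthy,
  so pruned subtrees are never visited. Returns matching locations."
  [loc match? descend?]
  (lazy-seq
    (when loc
      (let [node  (z/node loc)
            hit   (when (match? node) [loc])
            below (when (and (z/branch? loc) (descend? node))
                    (search (z/down loc) match? descend?))]
        (concat hit below (search (z/right loc) match? descend?))))))

(def data [[:skip 101 103] [1 2 [3 4]] 5])

(->> (search (z/vector-zip data)
             #(and (number? %) (odd? %))  ; match odd numbers
             #(not= :skip (first %)))     ; prune subtrees tagged :skip
     (map z/node))
;=> (1 3 5)
```

Since each hit is a location, a caller can stop after the first one and later resume from it, or call `z/edit` on it.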

ordnungswidrig07:03:52

"Lazy stream zippers" sounds like a 60s progressive rock band ❤️

slipset08:03:52

@simongray Don’t know too much about zippers, but some/most underlying json parsers eagerly parse the json into tokens. So while you could avoid creating all the maps/seqs, you still need to fit the whole json string/tokens in memory. What would be really cool would be a streaming/lazy tokenizer (which I believe is what @ordnungswidrig found yesterday)

slipset08:03:16

So imagine I had a 300GB json and I basically wanted `(get-in parsed-json [0 :name])`; the trick is to get the name of the first object without parsing the whole json string.
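A minimal sketch of such a lazy tokenizer, under loud assumptions: it is hand-rolled (not the API of any existing parser), emits events in the `:map :key "…" :string "…"` style used further down in this thread plus a `:scalar` token for numbers, booleans and null, and does not decode escapes (a backslash just keeps the next character). Characters are pulled from the `Reader` only as tokens are consumed, so the hypothetical helper `first-element-name` answers the `[0 :name]` question without reading the rest of the input, even when the rest is not valid JSON:

```clojure
(ns example.lazy-json
  (:import (java.io PushbackReader Reader StringReader)))

(defn- read-char [^PushbackReader r]
  (let [c (.read r)]
    (when-not (neg? c) (char c))))

(defn- read-string-body
  "Reads up to the closing quote. A backslash keeps the next character
  verbatim; escape sequences are NOT decoded in this sketch."
  [^PushbackReader r]
  (let [sb (StringBuilder.)]
    (loop []
      (let [c (read-char r)]
        (case c
          nil (throw (ex-info "unterminated string" {}))
          \"  (str sb)
          \\  (do (.append sb (read-char r)) (recur))
          (do (.append sb c) (recur)))))))

(def ^:private delimiters #{\, \} \] \space \tab \newline \return})

(defn- read-scalar
  "Reads a number, true, false or null as raw text; the delimiter that
  ends it is pushed back for the next token."
  [^PushbackReader r first-c]
  (let [sb (StringBuilder. (str first-c))]
    (loop []
      (let [c (read-char r)]
        (if (or (nil? c) (delimiters c))
          (do (when c (.unread r (int c)))
              (str sb))
          (do (.append sb c) (recur)))))))

(defn tokens
  "Lazily tokenizes JSON from rdr. Nothing is read until tokens are consumed."
  [^Reader rdr]
  (let [r (PushbackReader. rdr)]
    (letfn [(step [stack key-next?]
              (lazy-seq
                (when-let [c (read-char r)]
                  (cond
                    (Character/isWhitespace (char c)) (step stack key-next?)
                    (= c \:) (step stack false)
                    (= c \,) (step stack (= :map (peek stack)))
                    (= c \{) (cons :map (step (conj stack :map) true))
                    (= c \[) (cons :list (step (conj stack :list) false))
                    (= c \}) (cons :endmap (step (pop stack) false))
                    (= c \]) (cons :endlist (step (pop stack) false))
                    (= c \") (list* (if key-next? :key :string)
                                    (read-string-body r)
                                    (step stack false))
                    :else    (list* :scalar (read-scalar r c)
                                    (step stack false))))))]
      (step [] false))))

(defn first-element-name
  "Hypothetical helper for the path [0 \"name\"]: returns the string value of
  \"name\" in the first object of a top-level array, consuming tokens only
  until it is found (nil if that object has no string \"name\")."
  [toks]
  (loop [ts toks, depth 0]
    (when-let [s (seq ts)]
      (let [t (first s), more (rest s)]
        (case t
          (:map :list)       (recur more (inc depth))
          (:endmap :endlist) (when (> depth 2) (recur more (dec depth)))
          :key               (if (and (= depth 2) (= "name" (first more)))
                               (let [v (rest more)]
                                 (when (= :string (first v)) (second v)))
                               (recur (rest more) depth))
          (:string :scalar)  (recur (rest more) depth))))))

(def input "[{\"id\": 1, \"tags\": [\"a\"], \"name\": \"first\"}, {\"name\": \"second\"} <<< not json")

(first-element-name (tokens (StringReader. input)))  ;=> "first"
```

The flip side is the point raised in the next messages: whatever is never read is never validated.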

simongray08:03:02

that’s the data structure magic I was referring to 😉

slipset08:03:14

With that, comes the question of validating the json as well. You cannot validate it without reading the whole thing.

slipset08:03:34

(which may or may not be a problem)

simongray08:03:14

but I guess you can interpret every level of the data structure you need as {key_1 <pointer to val_1>, … key_n <pointer to val_n>}. Then you need a zipper function that can realise a pointer as a piece of data. I guess the pointer could just be the linecount/charcount boundary of each val.

simongray08:03:21

The point is just that, with the right data structure, implementing a zipper for it is pretty straightforward, and zippers can be paused, resumed, rewound, and really made to go in any direction.
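A small sketch of what "paused, resumed, rewound" means with `clojure.zip`: a location is an immutable value, so saving one is pausing, continuing from it is resuming, and holding on to an older one (or moving left/up) is rewinding. Edits made from one location don't affect any other:

```clojure
(ns example.zip-values
  (:require [clojure.zip :as z]))

(def data [1 [2 3] 4])

;; Pause: a location is a plain value that can be stored anywhere.
(def paused (-> (z/vector-zip data) z/down z/right))   ; at [2 3]

;; Resume: carry on walking from the saved location.
(def resumed (-> paused z/down z/right))               ; at 3

;; Rewind: the earlier location is still valid, and we can also walk back.
(z/node paused)                  ;=> [2 3]
(z/node (z/left resumed))        ;=> 2

;; Edits are values too: editing from one location leaves the others alone.
(z/root (z/edit resumed * 10))   ;=> [1 [2 30] 4]
(z/root paused)                  ;=> [1 [2 3] 4]
```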

simongray08:03:56

So basically you read through the entire contents of a json object (as text), registering the keys and the boundaries of the vals (your pointers). Then you can zip into any one of those vals in separate threads if you like and simply repeat the algorithm for the boundary contained by the pointer, e.g. “line 4/char 7 to line 6/char 12”.
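A minimal sketch of that first scan, assuming the whole object is already in a `String` and using absolute character offsets instead of line/char pairs (a file-backed version would presumably keep byte offsets and seek, e.g. with `java.io.RandomAccessFile`). `index-object` and `realise` are hypothetical names; keys are taken as raw text without decoding escapes, and nested values are skipped by counting brackets outside strings, never parsed:

```clojure
(ns example.json-index)

(defn- skip-string
  "i points just past an opening quote; returns the index just past the
  matching closing quote, stepping over backslash escapes."
  [^String s i]
  (loop [i i]
    (case (.charAt s (int i))
      \\ (recur (+ i 2))
      \" (inc i)
      (recur (inc i)))))

(defn- skip-ws [^String s i]
  (if (and (< i (count s)) (Character/isWhitespace (.charAt s (int i))))
    (recur s (inc i))
    i))

(defn- value-end
  "Returns the index just past the JSON value starting at i. Brackets are
  counted only outside strings; scalars end at a delimiter at depth 0."
  [^String s i]
  (loop [j i, depth 0]
    (let [c (.charAt s (int j))]
      (cond
        (= c \")
        (let [k (skip-string s (inc j))]
          (if (zero? depth) k (recur k depth)))

        (or (= c \{) (= c \[))
        (recur (inc j) (inc depth))

        (or (= c \}) (= c \]))
        (cond (zero? depth) j               ; scalar ended by the parent's closer
              (= 1 depth)   (inc j)         ; closed the value we started
              :else         (recur (inc j) (dec depth)))

        (and (zero? depth) (or (= c \,) (Character/isWhitespace c)))
        j                                   ; scalar ended

        :else
        (recur (inc j) depth)))))

(defn index-object
  "Scans one JSON object in s (starting at its opening brace) and returns
  {key [start end]} with the character bounds of each value."
  [^String s]
  (loop [i (inc (skip-ws s 0)), idx {}]
    (let [i (skip-ws s i)]
      (case (.charAt s (int i))
        \} idx
        \, (recur (inc i) idx)
        \" (let [key-end (skip-string s (inc i))
                 k       (subs s (inc i) (dec key-end))   ; raw key text
                 colon   (skip-ws s key-end)
                 vstart  (skip-ws s (inc colon))
                 vend    (value-end s vstart)]
             (recur vend (assoc idx k [vstart vend])))))))

(defn realise
  "Turns a pointer back into raw JSON text, ready for a parser or for
  another index-object pass one level down."
  [^String s [start end]]
  (subs s start end))

(def doc "{\"id\": 123, \"name\": \"gizmo\", \"items\": [{\"a\": \"}\"}, 2], \"ok\": true}")
(def idx (index-object doc))

(realise doc (idx "items"))  ;=> "[{\"a\": \"}\"}, 2]"
```

Repeating the scan on a realised object value gives the next level; arrays would need an equivalent positional index, which this sketch leaves out.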

simongray08:03:40

I realise that I may just be cargo-culting zippers since I really like using them 😛

simongray08:03:32

to me, the fact that you can write pausable tree navigation and transformation algorithms using such a simple tool is quite magic

slipset08:03:41

Yes, I think I see that, but I was thinking more about the problem of an infinite json stream (a json-based Turing machine?), or a json stream that was too big to hold in memory, or one that was so slow that it didn’t make sense to read the whole thing just to get the first element (https://tools.ietf.org/html/rfc2549)

thomas09:03:46

just thinking out loud here.... could shelling out to jq help here? it might do something clever (I don't know, but I guess easy to test)

dharrigan09:03:12

jq does allow the --stream option

ordnungswidrig09:03:50

This is why I think regex state machines are how this might be implemented. E.g. a query like `give me the “order” object when any of the “orderitem/product/name” values contains “gizmo”` could be “compiled” into a state machine which collects order data until you can rule out that the pattern matches. This all happens on an event stream of json tokens: `:map :key "orders" :list :map :key "id" :string "123" :key "orderitems" :list :map :key "name" :string "Rumbling gizmo" :endmap :endlist :endmap :endlist :endmap`
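A simplified sketch of that query over the token stream above. Instead of compiling the query into a real state machine, it assumes the top-level shape `:map :key "orders" :list …` and rebuilds one order at a time, so memory is bounded by the largest single order rather than the whole document. `build-value`, `order-seq` and `gizmo-order?` are hypothetical names, and the path follows the sample tokens (`orderitems` → `name`) rather than `orderitem/product/name`:

```clojure
(ns example.token-filter
  (:require [clojure.string :as str]))

(defn build-value
  "Consumes the tokens of exactly one value from the front of tokens.
  Returns [value remaining-tokens]."
  [tokens]
  (let [[t & more] tokens]
    (case t
      :string [(first more) (rest more)]
      :map    (loop [m {}, ts more]
                (if (= :endmap (first ts))
                  [m (rest ts)]
                  ;; expect :key k followed by one value
                  (let [[_ k & after-key] ts
                        [v ts'] (build-value after-key)]
                    (recur (assoc m k v) ts'))))
      :list   (loop [v [], ts more]
                (if (= :endlist (first ts))
                  [v (rest ts)]
                  (let [[x ts'] (build-value ts)]
                    (recur (conj v x) ts')))))))

(defn order-seq
  "Lazily yields the orders one at a time. Assumes the stream starts with
  exactly :map :key \"orders\" :list (hypothetical fixed shape)."
  [tokens]
  (letfn [(step [ts]
            (lazy-seq
              (when-not (= :endlist (first ts))
                (let [[order ts'] (build-value ts)]
                  (cons order (step ts'))))))]
    (step (drop 4 tokens))))

(defn gizmo-order? [order]
  (some #(str/includes? (get % "name" "") "gizmo")
        (get order "orderitems")))

(def tokens
  [:map :key "orders" :list
   :map :key "id" :string "123" :key "orderitems"
        :list :map :key "name" :string "Rumbling gizmo" :endmap :endlist :endmap
   :map :key "id" :string "456" :key "orderitems"
        :list :map :key "name" :string "Plain widget" :endmap :endlist :endmap
   :endlist :endmap])

(filter gizmo-order? (order-seq tokens))
;=> ({"id" "123", "orderitems" [{"name" "Rumbling gizmo"}]})
```

A compiled state machine could go further and drop an order's buffered tokens as soon as the match is ruled out; this sketch only decides at the end of each order.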

dharrigan09:03:50

never used it myself 'tho

ordnungswidrig09:03:01

@dharrigan that sounds like an interesting option

ordnungswidrig09:03:05

but only json, not edn 😛

borkdude09:03:17

PR to jet welcome for EDN :P

borkdude09:03:17

Another idea: don't store your data in JSON but XML and use tools that already worked in the 00s ;)

simongray09:03:47

@slipset BTW I starred this repo the other day: https://github.com/pangloss/fermor Haven’t looked that deep into it, but looks to me like it’s using some of the same buzzwords.

simongray09:03:40

seems to use a Java dependency, though.

ordnungswidrig10:03:03

@simongray I like this code comment from the examples:

;; This version is a very direct port of the above query and in a fermor system
;; would never pass code review. It has all of the guts of the query hanging
;; out. Instead we can trivially create a domain model and work at a much higher
;; level of abstraction.

ordnungswidrig10:03:45

hmmm, one could also implement a Jackson parser for EDN. 🙂 This should unlock JsonPath on EDN, I guess. 🤔 Not sure if that’s a nifty idea though.

borkdude11:03:02

Maybe one can already use JsonPath on transit? 🤔

ordnungswidrig15:03:11

@borkdude you mean translating jsonpath to a jsonpath expression that would match on transit?

ordnungswidrig15:03:22

Sounds super hard because of the statefulness of transit 🙂

pez21:03:10

Playing with writing a macro is quite amazing. I am actually manipulating Clojure code like the data it is. I’ve been telling people that Clojure is homoiconic and sort of understood what it means, but now I realize what it actually means. 😃
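A tiny illustration of that point, not pez's actual macro (which isn't shown here): `infix` is a hypothetical macro that receives the form `(1 + 2)` as an ordinary list and rearranges it with plain data functions before it is compiled:

```clojure
(defmacro infix
  "Receives its argument as data: the unevaluated list (a op b).
  Destructures it and returns a new list, (op a b), to compile instead."
  [[a op b]]
  (list op a b))

(infix (1 + 2))                 ;=> 3
(macroexpand '(infix (1 + 2)))  ;=> (+ 1 2)
```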
