#datalog
2022-03-30
fs42 16:03:40

Changing keys in a (nested) dict? When you receive data as JSON and have to transform it into nice-looking entries to feed into your datalog db, you have to translate those JSON string keys into clj keywords. What is the right tool to make those transformations the least painful? I've been looking at (specter/setval (specter/keypath ...) but that doesn't seem to support nested key paths, while (specter/setval [(specter/map-key ...) doesn't seem to support simple key substitution. Any suggestions/advice?

mauricio.szabo 23:03:33

clojure.walk/postwalk should do the trick

mauricio.szabo 23:03:50

But also, most JSON conversion libraries support key transformation too
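A minimal sketch of the `clojure.walk/postwalk` suggestion, assuming the parsed JSON is a nested map with string keys. The function name `keywordize-string-keys` is made up for illustration; `clojure.walk/keywordize-keys` covers the same ground out of the box.

```clojure
(require '[clojure.walk :as walk])

(defn keywordize-string-keys
  "Walks a nested structure and turns every string map key into a keyword.
  Hypothetical helper name; non-string keys are left untouched."
  [data]
  (walk/postwalk
   (fn [node]
     (if (map? node)
       ;; Rebuild the map, converting only the string keys.
       (into {} (map (fn [[k v]] [(if (string? k) (keyword k) k) v])) node)
       node))
   data))

(keywordize-string-keys {"name" "Ada" "address" {"city" "London"}})
;; => {:name "Ada", :address {:city "London"}}
```

This converts every key the same way regardless of where it sits in the document, so it does not by itself address keys whose meaning depends on their position.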

fs42 18:03:41

Thanks for the suggestions! I understand it's fairly easy to automatically convert all string keys into equivalent keyword keys. However, I'm looking at JSON docs where the individual keys/attribute names are reused in different entities with different "meanings". My idea was to namespace the keywords to distinguish the different semantics associated with their usages. This approach complicates the key-name conversion, as the substitution of the old key string by a namespaced keyword then depends on the path in the JSON doc. I'm new to this data modeling for datalog and feel that I could learn a lot from others who have probably gone through similar considerations and exercises... Any advice/suggestions for dealing with those use cases would be much appreciated!

pithyless 06:04:59

@UNRPUL2CT perhaps a different way of looking at this problem: it's not a problem of converting json strings to clojure keywords (which is what those JSON conversion libraries help solve), but a problem of interpreting nested data (context-specific names) to an unambiguous format, which then happens to be serialized to namespaced keys (context-free names). Thinking of that middle layer (how to go from ambiguous nested data to an unambiguous structure) and making it an explicit step in your program would help me with tackling this problem. You can write that middle translation layer yourself in Clojure, but you can also try to make it more declarative with some helper tools, e.g. data parsing libraries like malli, meander, clojure.spec etc.
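One way to sketch such an explicit translation layer by hand, assuming string keys and a purely illustrative rule: each key is namespaced by the key of the map that contains it. The name `qualify-keys` and the rule itself are assumptions, not something from the thread, and real documents would likely need a more deliberate mapping.

```clojure
(defn qualify-keys
  "Renames each string key to a keyword namespaced by `context`.
  Nested maps use their parent key as context (an assumed rule).
  Maps inside vectors are not handled in this sketch."
  [context m]
  (into {}
        (map (fn [[k v]]
               [(keyword context k)
                (if (map? v) (qualify-keys k v) v)]))
        m))

(qualify-keys "doc" {"order" {"id" 1 "customer" {"id" 2}}})
;; => {:doc/order {:order/id 1, :order/customer {:customer/id 2}}}
```

The two "id" keys end up as `:order/id` and `:customer/id`, which is the context-free naming the message describes.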

pithyless 06:04:27

And yet another approach: rather than trying to interpret and parse the data by hand, if your JSON docs are very irregular and/or complicated and you don't want to write all the rules to actually interpret them, you can just try to load the entire json files into asami (or perhaps even pyramid would suffice if the features are sufficient) and literally just query the data you want via datalog. You can even use that intermediate database to then go ahead and build a new database/structure by finding meaningful data via datalog queries and then putting everything in a new format/database that represents the data in the new way you'd prefer.

fs42 16:04:11

Appreciate the advice! Will check out the mentioned tools/libs. I like the idea of possibly importing the current docs as is, and then using datalog to query/introspect and possibly change/update/add some of the attribute names and associated schema such that they better reflect the underlying model. Didn't think of that option... (I hope I understood your suggestion well 😉 )

teodorlu 23:04:56

I've tried "normalizing" the nested maps into a single layer. If you have {:x {:y {:z 123}}}, you could translate that into {[:x :y :z] 123}, do whatever you want with the keys, then transform back into the nested representation. As others have mentioned, perhaps you could use less nesting. Maps from namespaced keys to values are easier to work with than deeply nested maps.

fs42 15:04:27

@U3X7174KS I like that idea! Extracting the key-path to each value as a vector, then transforming some of the keys, and then putting the data structure back. That key vector is the same you'd use with get-in and friends. I can see how you'd "easily" reconstruct the map with reduce and assoc-in. Any easy way to obtain {[:x :y :z] 123} from {:x {:y {:z 123}}}?
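The reconstruction step mentioned here could look roughly like this, assuming the flat map has key-path vectors as keys. `unflatten-paths` is a hypothetical name; `reduce-kv` is used because it passes the accumulator, path, and value in exactly the argument order `assoc-in` expects.

```clojure
(defn unflatten-paths
  "Rebuilds a nested map from a map of key-path vectors to values.
  Hypothetical helper; assumes every path is a non-empty vector."
  [flat]
  (reduce-kv assoc-in {} flat))

(unflatten-paths {[:x :y :z] 123, [:x :w] 4})
;; => {:x {:y {:z 123}, :w 4}}
```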

teodorlu 15:04:51

@UNRPUL2CT I don't know of any stdlib functions that fit out of the box. I think I'd reach for a recursive function. Walk down until you hit a value that's not a map. Collect your path as you go deeper. When you hit a non-map value, you "emit" path and value. clojure.zip might be useful here, but I've never used zippers seriously. Perhaps also clojure.walk. https://clojure.github.io/clojure/clojure.zip-api.html
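A rough sketch of the recursive function described above: walk down, collect the path, and emit a path/value pair on reaching a non-map value. The name `flatten-paths` is made up for illustration, and treating empty maps as leaf values is an assumption of this sketch, not something discussed in the thread.

```clojure
(defn flatten-paths
  "Turns a nested map into a single-layer map of key-path vectors to values.
  Hypothetical helper; empty maps are kept as leaf values."
  ([m] (flatten-paths [] m))
  ([path m]
   (if (and (map? m) (seq m))
     ;; Go one level deeper for every entry, extending the path.
     (into {} (mapcat (fn [[k v]] (flatten-paths (conj path k) v))) m)
     ;; Not a (non-empty) map: emit the collected path with its value.
     {path m})))

(flatten-paths {:x {:y {:z 123}}})
;; => {[:x :y :z] 123}
```

Only maps are descended into; vectors and other collections are treated as plain values.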

teodorlu 15:04:22

Hmm, perhaps this is just a zip. Zip has a next method. So you can just walk along. It also knows its path. So this might just be a while loop over https://clojure.github.io/clojure/clojure.zip-api.html#clojure.zip/next, collecting paths and values in an atom on the side.

teodorlu 15:04:12

Yeah, I think Zippers would be a good fit. Ended up reading a bit here: https://grishaev.me/en/clojure-zippers/#part-2-automatic-navigation

teodorlu 16:04:06

So... you nerd-sniped me good there. Tried zippers, wasn't able to make it work. Zippers are more general, and there's no "map zipper" provided. The zipper "path" is a list of nodes required to get to a location.

teodorlu 17:04:53

Now I've also asked on the Clojureverse Discourse: https://clojureverse.org/t/how-to-transform-nested-map-into-flat-sequence-of-path-value-pairs/8801 Curious if we'll get some good ideas.

fs42 17:04:41

@U3X7174KS Sure appreciate your effort here!

🙌 1
teodorlu 17:04:10

You asked some interesting questions 😄

👍 1
teodorlu 18:04:09

I think I finally got you an answer 😄 For the normalize function, simply copy Oddsor's or Ed's solution. https://clojurians.slack.com/archives/C053AK3F9/p1649095314294669?thread_ts=1649090531.582539&cid=C053AK3F9