Hi! At work we get fairly big (>600 MB) JSON files which we need to process. I'm able to load them with Cheshire and slurp, but I would like to explore them a little. Are there any tools that could help me with that? (One idea was to load them somehow into a Datomic database so that I can query them.)
Maybe only slightly related, but I have two blog posts about exploring data with UUIDs in Emacs so that when the point (cursor) is on a UUID, the JSON representation of the entity whose primary key is that UUID is shown in the echo area. My code is Node.js and Postgres-based, but can be adapted to other stacks. https://mbork.pl/2025-07-26_Finding_entities_with_given_uuids_in_the_current_project https://mbork.pl/2025-08-11_Using_Eldoc_to_show_entities_with_given_uuids_in_the_echo_area
Perhaps... dump the JSON records into a SQLite JSON column and use babashka to query SQLite.
Quick duck-search showed this blog post: https://www.dbpro.app/blog/sqlite-json-virtual-columns-indexing
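A minimal sketch of the SQLite idea, in Python rather than babashka just to keep it self-contained. It assumes the SQLite build behind your `sqlite3` module includes the JSON functions (most recent builds do), and the sample records, the `docs` table and the `orders_over` helper are all made up for illustration:

```python
import json
import sqlite3

# Hypothetical sample records; real data would come from the large JSON file.
records = [
    {"id": "a1", "type": "order", "customer": "c1", "total": 30},
    {"id": "a2", "type": "order", "customer": "c2", "total": 12},
    {"id": "c1", "type": "customer", "name": "Ada"},
]

conn = sqlite3.connect(":memory:")
# One row per JSON record, stored as text in a single column.
conn.execute("CREATE TABLE docs (body TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO docs (body) VALUES (?)",
    [(json.dumps(r),) for r in records],
)
# Expression index on a JSON field, so lookups by type need not scan every row.
conn.execute("CREATE INDEX docs_type ON docs (json_extract(body, '$.type'))")


def orders_over(conn, minimum):
    """Return ids of 'order' records whose total exceeds `minimum`."""
    rows = conn.execute(
        """
        SELECT json_extract(body, '$.id')
        FROM docs
        WHERE json_extract(body, '$.type') = 'order'
          AND json_extract(body, '$.total') > ?
        ORDER BY 1
        """,
        (minimum,),
    ).fetchall()
    return [row[0] for row in rows]


print(orders_over(conn, 20))
```

Storing one record per row (instead of the whole file as a single value) is what makes `json_extract` queries and indexes useful here.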
Thanks! That's a great idea. I had it in my mind that jsonb is much more limited in size.
Physical storage limit, if each file is a single 600 MB JSON object: see "Maximum length of a string or BLOB" at https://sqlite.org/limits.html > The current implementation will only support a string or BLOB length up to 2,147,483,645 bytes
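Rough arithmetic for that limit, as a sketch. The hard ceiling is the figure quoted above. The 1,000,000,000-byte default for `SQLITE_MAX_LENGTH` is my reading of the same limits page, so treat it as an assumption and check your own build. `fits_in_one_value` is a made-up helper name:

```python
# Hard ceiling as quoted from sqlite.org/limits.html in this thread.
SQLITE_HARD_MAX_BYTES = 2_147_483_645
# Assumption: default compile-time SQLITE_MAX_LENGTH; a given build may differ.
SQLITE_DEFAULT_MAX_BYTES = 1_000_000_000


def fits_in_one_value(size_bytes, limit=SQLITE_DEFAULT_MAX_BYTES):
    """True if a single string/BLOB of `size_bytes` is within `limit`."""
    return size_bytes <= limit


file_size = 600 * 1024 * 1024  # a 600 MB file, counted in bytes

print(fits_in_one_value(file_size))                         # against the assumed default
print(fits_in_one_value(file_size, SQLITE_HARD_MAX_BYTES))  # against the hard ceiling
print(SQLITE_HARD_MAX_BYTES / file_size)                    # headroom factor
```

So a single 600 MB document should fit either way, but files a few times larger would not, which is another argument for one row per record.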
Thanks! I had a short look there in the beginning. But my documents have a lot of internal relations (one thing references another via a UUID), so I have the feeling some database is easier.
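For a quick look at UUID-based relations before committing to a database, a plain in-memory index can already go a long way. This is only a sketch: the `uuid` and `owner` field names and the sample entities are hypothetical, and it assumes each entity carries its own UUID under one known key:

```python
def index_by_uuid(entities, id_key="uuid"):
    """Build a lookup table from UUID string to entity."""
    return {e[id_key]: e for e in entities if id_key in e}


def resolve(entity, index, ref_keys):
    """Return a shallow copy of `entity` with the given reference fields
    replaced by the entities they point to (unknown references stay as-is)."""
    resolved = dict(entity)
    for key in ref_keys:
        target = entity.get(key)
        if isinstance(target, str) and target in index:
            resolved[key] = index[target]
    return resolved


# Hypothetical data: a document that references its owner by UUID.
entities = [
    {"uuid": "11111111-1111-1111-1111-111111111111", "name": "Ada"},
    {"uuid": "22222222-2222-2222-2222-222222222222", "title": "Report",
     "owner": "11111111-1111-1111-1111-111111111111"},
]

index = index_by_uuid(entities)
report = resolve(entities[1], index, ref_keys=["owner"])
print(report["owner"]["name"])
```

Once the questions turn into multi-hop joins across many reference fields, a Datalog or SQL store will likely be more comfortable than this.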
Paula Gearon did a great presentation about auto schema generation from JSON for Datomic at the 2017 Conj, which I happened to watch a couple of days ago: https://www.youtube.com/watch?v=8jXEqvTnOTg
Cool
https://github.com/cnuernber/charred might come in handy. JSON parser optimized for performance, quite a bit faster than Cheshire for large JSON files.
I would recommend https://duckdb.org/ over SQLite in this instance
I can second DuckDB. It's really useful, even just the command-line tool.
If you want to go the Datalog route due to the internal relations, consider using https://github.com/datalevin/datalevin. The two big selling points for this use case are: 1. it runs locally with minimal setup, 2. you don't have to define a schema up front like you do with Datomic (though it may be beneficial for a subset of keys that express relations). https://github.com/quoll/asami is also a schemaless Datalog DB and has even more advanced support for graph relations. Datalevin might have better performance.
We use fx for this, it can be great: https://fx.wtf/
If the shape is right (an array of objects with many keys in common, like a table) then visidata https://www.visidata.org/ can be helpful too.
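One way to check whether the shape is "table-like" before reaching for a tabular tool: count how often each key shows up across the records. A small sketch, assuming the top level is an array of objects (`key_coverage` and the sample data are made up):

```python
from collections import Counter


def key_coverage(records):
    """Map each key to the fraction of records that contain it."""
    if not records:
        return {}
    counts = Counter(key for record in records for key in record)
    return {key: count / len(records) for key, count in counts.items()}


# Hypothetical sample; real input would be the parsed top-level array.
sample = [
    {"id": 1, "name": "a", "tags": ["x"]},
    {"id": 2, "name": "b"},
    {"id": 3, "name": "c"},
    {"id": 4},
]

for key, share in sorted(key_coverage(sample).items()):
    print(f"{key}: {share:.0%}")
```

If most keys sit near 100%, the data should read well as a table; lots of rarely used keys suggests a tree viewer or a database is the better fit.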
Thanks
Turning JSON into a graph DB is one of the selling points of Asami: https://github.com/quoll/asami There was even a talk, "Asami: Turn your JSON into a Graph in 2 Lines": https://youtu.be/-XegX_K6w-o?si=oQsomgNemoZnTDzX
Thanks, I will have a look 👍