announcements

2025-10-19T12:55:25.681509Z

https://github.com/askonomm/dompa - A zero-dependency, runtime-agnostic HTML parser and builder.
• Turns HTML strings into a tree of nodes, and makes it easy to traverse and modify that tree
• Turns a tree of nodes back into HTML, and provides convenience utilities for doing so in a templating-like way
• Built to be runtime-agnostic, so it should in theory work in all Clojure runtimes; it is currently tested and proven to work in Clojure, ClojureScript and Babashka, with Jank support hopefully coming soon.

🎉 12
henrik 2025-10-20T08:16:05.495829Z

Very nice Asko. Since the structure is uniform, a suggestion would be to add a zipper function (to get the zipper CRUD API for free). Perhaps implement traverse with zippers as well. I don’t think HTML is generally deep enough to be liable to blow up the stack, but using a zipper would guarantee that it can’t happen as opposed to recursion.

2025-10-20T08:22:54.186909Z

Thank you @henrik! I'm ashamed to say I have very little experience with zippers in Clojure, but from quickly reading the docs it does seem like a very convenient way to navigate a tree structure. I've made an issue for it so I won't forget to give it a go: https://github.com/askonomm/dompa/issues/9.

henrik 2025-10-20T08:41:42.624179Z

I don’t think you have to be ashamed of that, I think that lands you among the majority. I wouldn’t have known about them if someone hadn’t directed my attention to them, because you get so far with just the standard stuff in Clojure. They’re cool when you encounter data structures with a repeating pattern, since you can just write one “adapter” and then the actual API for manipulating the structure is the same, regardless. We have some zipper usage in our codebase. For example,

(require '[clojure.zip :as zip])

(defn- not-end?
  [loc]
  (not (zip/end? loc)))

(defn find-node-locations
  "Given zipper `loc` find node locations matching the `pred`. 
  
   Optionally takes a transducer, eg. `(map zip/node)`."
  ([loc pred]
   (sequence
     (comp
       (take-while not-end?)
       (filter #(pred (zip/node %))))
     (iterate zip/next loc)))
  ([loc xf pred]
   (sequence
     (comp
       (take-while not-end?)
       (filter #(pred (zip/node %)))
       xf)
     (iterate zip/next loc))))
If I were to write a zipper for Dompa, maybe something like:
(zip/zipper map?
  (fn get-children [node]
    (get node :node/children))
  (fn create-node [node children]
    (assoc node :node/children (vec children)))
  html-root)
The find-node-locations function would work, even though it isn’t written with Dompa in mind.
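To make the zipper idea concrete, here is a minimal self-contained sketch of both reading and editing a tree through one. Only `:node/children` comes from the snippet above; the `:node/name` and `:node/value` keys, the sample tree, and the helper names `node-zip`, `p-node?` and `rename-ps` are assumptions for illustration, not Dompa's confirmed API.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical node shape: only :node/children is taken from the thread;
;; :node/name and :node/value are assumed for this example.
(def html-root
  {:node/name :div
   :node/children
   [{:node/name :p
     :node/children [{:node/name :text :node/value "hello"}]}
    {:node/name :hr}
    {:node/name :p
     :node/children [{:node/name :text :node/value "world"}]}]})

(defn node-zip
  "Zipper over maps that keep their children under :node/children."
  [root]
  (zip/zipper map?
              :node/children
              (fn [node children] (assoc node :node/children (vec children)))
              root))

(defn p-node? [node]
  (= :p (:node/name node)))

;; Read: lazily walk the tree in depth-first order and keep the :p locations.
(def p-locs
  (->> (iterate zip/next (node-zip html-root))
       (take-while (complement zip/end?))
       (filter #(p-node? (zip/node %)))))

;; Edit: rename every :p to :span with loop/recur (no stack growth),
;; then rebuild the whole tree with zip/root.
(defn rename-ps [root]
  (loop [loc (node-zip root)]
    (if (zip/end? loc)
      (zip/root loc)
      (recur (zip/next
               (if (p-node? (zip/node loc))
                 (zip/edit loc assoc :node/name :span)
                 loc))))))

(count p-locs)
;; => 2
(mapv :node/name (:node/children (rename-ps html-root)))
;; => [:span :hr :span]
```

Because the walk is driven by `zip/next` inside `loop`/`recur`, the depth of the document does not affect stack usage, which is the guarantee mentioned earlier in the thread.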

borkdude 2025-10-19T14:27:28.686599Z

Gave it a test drive in bb. hr elements (and other self-closing elts) can be written as:

<hr/>
or
<hr />
both crash in dompa now in different ways

borkdude 2025-10-19T14:28:27.416199Z

This also crashes with a ClassCastException:

(require '[dompa.html :as html]
         '[dompa.nodes :as nodes])
(prn (nodes/->html (html/->nodes (slurp ""))))

jyn 2025-10-19T14:32:31.676589Z

this is really neat

jyn 2025-10-19T14:32:37.626719Z

i might use this to replace Jsoup in flower

2025-10-19T14:33:05.040399Z

@borkdude oh man, that's not good. I'll get those fixed asap.

jyn 2025-10-19T14:33:13.822859Z

(Jsoup has an imperative API and this has a nice clojure-native traversal API for edits)

jyn 2025-10-19T14:34:01.630269Z

this doesn't seem to be on clojars currently, is that right?

2025-10-19T14:34:55.772409Z

That's right, just on GitHub. Do you rely on Leiningen? I wasn't sure whether, or how much, Leiningen is still a thing, so I didn't go ahead with Clojars just yet, but I can get it up there in a bit.

jyn 2025-10-19T14:35:26.724969Z

I don't use leiningen, I can use git deps if clojars is a pain. mostly I wanted to look at the generated API docs.

2025-10-19T14:37:29.724409Z

Ah right, Clojars does that! Clojars isn't really a pain other than having to have the build.clj (which I don't find very user-friendly for simply publishing a library).

borkdude 2025-10-19T14:37:41.340809Z

does clojars generate api docs?

borkdude 2025-10-19T14:38:04.588639Z

you probably mean cljdoc

jyn 2025-10-19T14:38:15.225619Z

i do mean cljdoc

jyn 2025-10-19T14:38:33.353919Z

i think cljdoc pulls from clojars though?

borkdude 2025-10-19T14:38:38.795389Z

why doesn't cljdoc support git-tagged deps? cc @lee

borkdude 2025-10-19T14:38:49.829079Z

(probably it does)

jyn 2025-10-19T14:41:49.444949Z

(it amuses me quite a lot how similar clojure's infra is to rust's in this respect, even though to my knowledge there wasn't a lot of overlap between the communities at the time any of it was built)

lread 2025-10-19T15:08:42.907789Z

We did some thinking on cljdoc supporting git tagged deps, but did not move forward with it yet

👍 1
jyn 2025-10-19T15:10:10.790399Z

does cljdoc have the same problem that it needs to run build.clj in order to generate API docs? or is it enough to parse the defns without executing require statements?

borkdude 2025-10-19T15:13:10.857279Z

@jyn514 cljdoc uses runtime analysis so it has to execute the code. my lite documentation solution quickdoc solely uses static analysis. https://github.com/borkdude/quickdoc

👍 1
borkdude 2025-10-19T15:13:21.032699Z

(based on clj-kondo)

2025-10-19T15:16:00.303679Z

The issues discovered by @borkdude should be fixed now in v1.0.1.

2025-10-19T15:16:53.201139Z

I'll go ahead and integrate quickdoc as well later today, to get automatic API docs going @jyn514.

❤️ 1
borkdude 2025-10-19T15:21:21.689529Z

@asko304 Thank you. Now I can parse my (unupdated for a long time) homepage!

(require '[babashka.deps :as deps])

(deps/add-deps '{:deps {askonomm/dompa {:git/url ""
                                        :git/tag "v1.0.1"
                                        :git/sha "35de9bc8aaaa165ec3f2efb04691bdca3dd5e446"}}})

(require '[dompa.html :as html]
         '[dompa.nodes :as nodes])
(spit "/tmp/html1.html" (slurp ""))
(spit "/tmp/html2.html" (nodes/->html (html/->nodes (slurp ""))))

(babashka.process/shell {:continue true} "diff" "/tmp/html1.html" "/tmp/html2.html")
I do see differences in the parsed HTML and the generated HTML but maybe it's just whitespace. At the end it appears that some divs are missing:
}(document, "script", "twitter-wjs"));</script></p></div></div></div></body></html>
vs
}(document, "script", "twitter-wjs"));</script></p></div></body></html>

borkdude 2025-10-19T15:21:56.089989Z

could also be bad HTML in my homepage ;)

2025-10-19T15:22:58.210729Z

Dompa lacks the HTML healing that browsers do (for example, closing a div for you if you forgot to). But I wouldn't immediately presume it's that; it could easily be a bug in my code. I can't debug this at the moment, but will give it a go later today to see if I can track down where the difference comes from.

2025-10-19T20:21:43.728159Z

I've tracked down the issue with the missing tags, and it is definitely an issue on my side; your HTML is just fine @borkdude. I've pushed out a fix for that in v1.0.2.

2025-10-19T20:26:25.030399Z

I've also added quickdoc-generated docs for the API now: https://github.com/askonomm/dompa/blob/main/API.md re @jyn514

borkdude 2025-10-19T20:48:01.537859Z

@asko304 Awesome. I tried again and noticed that round-tripping loses the doctype:

(require '[babashka.deps :as deps])

(deps/add-deps '{:deps {askonomm/dompa {:git/url ""
                                        :git/tag "v1.0.2"
                                        :git/sha "497a7dc"}}})

(require '[dompa.html :as html]
         '[dompa.nodes :as nodes])
(spit "/tmp/html1.html" (slurp ""))
(spit "/tmp/html2.html" (nodes/->html (html/->nodes (slurp ""))))

(babashka.process/shell {:continue true} "diff" "/tmp/html1.html" "/tmp/html2.html")

2025-10-19T21:55:53.261669Z

As always, fix one bug, another appears. I've now fixed this issue as well in v1.0.3, and I've added https://github.com/askonomm/dompa/blob/main/test/dompa/round_trip_test.clj (I hope you don't mind). I figure it makes sense to start testing against whole sites as opposed to only bits and pieces.

2025-10-19T22:37:39.887089Z

From the README, I gather that the tree-of-nodes is the same idea & shape as that of clojure.xml and clojure.data.xml, but with different keys, so not directly interoperable with libraries that work with those?

2025-10-19T23:01:58.488149Z

I've actually never used clojure.xml or clojure.data.xml, so I've no idea. I just picked naming that I thought would make sense to me. Would interoperability with those be important? Dompa isn't meant to work with XML (though it may be possible with some tweaks).
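For context on the interoperability question: clojure.xml and clojure.data.xml represent an element as a map with the keys `:tag`, `:attrs` and `:content`. If the tree really is the same shape under different keys, a small adapter could bridge the two. The sketch below assumes Dompa-like keys `:node/name`, `:node/attrs`, `:node/children` and `:node/value` (only `:node/children` appears in this thread), and `->xml-shape` is a hypothetical helper, not part of any library.

```clojure
(require '[clojure.set :as set])

;; Assumed Dompa-like keys on the left; clojure.xml element keys on the right.
(def key-mapping
  {:node/name     :tag
   :node/attrs    :attrs
   :node/children :content})

(defn ->xml-shape
  "Recursively renames a Dompa-like node's keys to :tag/:attrs/:content.
   Nodes carrying :node/value are assumed to be text and become plain strings."
  [node]
  (if (contains? node :node/value)
    (:node/value node)
    (-> node
        (set/rename-keys key-mapping)
        (update :content #(mapv ->xml-shape %)))))

(->xml-shape
  {:node/name :p
   :node/attrs {:class "intro"}
   :node/children [{:node/name :text :node/value "hi"}]})
;; => {:tag :p, :attrs {:class "intro"}, :content ["hi"]}
```

This version recurses on the call stack for brevity, and a childless node ends up with an empty `:content` vector.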