This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-03-31
@lee suppose I want to make a test project in the cljdoc repo, to have it analyzed during tests so I can make a test search docset... I guess I'm not sure where to put that or how to set that up using deps.edn
do you have a minute to talk about it?
I have done something similar for visual inspection. There is the stand-alone https://github.com/cljdoc/cljdoc-exerciser…
yes, essentially
but I'd like this automated and within cljdoc itself, so I can exercise generating a cache bundle and transforming that into a searchset
if that seems like a good idea
Do you want to go through full integration? Like have your test project analyzed by cljdoc-analyzer?
right now I have stuck an edn cache bundle in a file
if possible
it feels like something that we should have in general?
but I need to do things like have a markdown file and an asciidoc file and multiple namespaces and such so I can confirm that my code is breaking those down into chunks correctly (for searching)
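For the chunking check described above, a minimal sketch of splitting a Markdown article into heading-delimited chunks might look like this. It assumes a chunk is one ATX heading plus the text beneath it. `chunk_markdown` and the chunk shape are hypothetical, not cljdoc's actual code, and AsciiDoc would need its own heading pattern.

```python
import re

# Hypothetical chunker (not cljdoc's implementation): splits Markdown on ATX headings.
# Caveat: '#' lines inside fenced code blocks would be misread as headings.
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")

def chunk_markdown(text):
    """Return a list of {"heading", "body"} dicts, one per section.
    Text before the first heading gets heading None."""
    chunks = []
    current = {"heading": None, "lines": []}
    for line in text.splitlines():
        m = HEADING.match(line)
        if m:
            # Close the previous chunk if it has any content.
            if current["heading"] is not None or current["lines"]:
                chunks.append(current)
            current = {"heading": m.group(2).strip(), "lines": []}
        else:
            current["lines"].append(line)
    if current["heading"] is not None or current["lines"]:
        chunks.append(current)
    return [{"heading": c["heading"], "body": "\n".join(c["lines"]).strip()}
            for c in chunks]
```

A test fixture with a couple of headings (and an AsciiDoc equivalent) would then assert on the exact list of chunks produced.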
And you’d prefer not to use a stand-alone github repo because you want it more tightly coupled to cljdoc source base, yeah?
I mean, stop me if I'm doing something ridiculous
or something ill-advised
I'm less focused on the analysis portion and exercising that fully as much as generating the cache bundle input to this searchset creation
(for client-side search)
I don't want to just stick static edn in the project for this since that could break without us knowing
The ingest cycle is not very speedy, I’m guessing you are ok with that for this particular test.
does it take a lot to analyze a tiny already-local project?
Well… you can get an idea by trying an ingest from the command line. But assuming you are ok with speed, I don’t see a problem with creating some sort of cljdoc test project. Ingest only works from jars, so you’d have to jar it up.
ahhh there's the rub
maybe I'll stick with static for the moment
and let people report it if it breaks
btw, I am working on search on the server side. So might have some searchy questions for you sometime soon!
oh! well, this is only search within an individual `:group-id/:artifact-id/:version`, and only client-side
something I've had on the backburner forever
but the server-side is mostly done
cool 🙂
I'm definitely up for it
👋 I've recently been thinking about problems and ideas that it seems like cljdoc either already solves or is in the process of solving. I've been trying to get up to speed on all the cool things cljdoc is up to. Just wanted to say hi and say that cljdoc is really neat!
hope it's not a dumb question. what is client side search?
I’ll let @corasaurus-hex answer that one. Here’s the PR https://github.com/cljdoc/cljdoc/pull/466 which has a demo link.
@smith.adriane I was thinking your https://github.com/phronmophobic/dewey might come in handy for cljdoc someday/somehow.
wheeee
I actually am nixing that branch from the PR, @smith.adriane, but it has the beginnings of it
it was just too out of date and rebasing was too painful
I was just curious what was considered client-side since I'm still getting acquainted with cljdoc
the idea is to feed the docs for a given `:group-id/:artifact-id/:version` into the browser and populate an in-browser full-text search engine with it. then you can search within just that docset
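A toy inverted index over a single docset shows the shape of the idea. This is a stand-in for what an in-browser engine such as flexsearch does, not flexsearch's API. The function names and document ids are hypothetical.

```python
import re
from collections import defaultdict

def tokenize(text):
    # Naive tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def build_index(docs):
    """docs: mapping of doc id -> text. Returns token -> set of doc ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of docs containing every query token (AND semantics)."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for t in tokens[1:]:
        result &= index.get(t, set())
    return result
```

Because the index covers one docset only, it stays small enough to build on page load and throw away afterwards.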
you can try searching here to see what I mean https://corasaurus-hex.github.io/cljdoc-search/
that's all client-side
I've been working on "client-side" library search with "client-side" meaning the developer's computer. https://github.com/phronmophobic/add-deps
Yea, I was checking out that link from the PR. Looks cool!
oh nice
looks fun 🙂
Both the search and the static/dynamic analysis that cljdoc does are really interesting
it definitely is
I'm trying to avoid having to index every docset server-side, especially when some may only be searched a handful of times before there's a new version (and therefore new docset) and then never searched again
and so having the client index it, cheaply and quickly, seems like a good trade-off maintenance-wise
just a heads-up, I'm not sure how up to date the specs are in the project
the cache-bundle doesn't seem to be up to date
or perhaps there are two things called cache-bundle in the project
defining specs for some of these huge data structures is paaaaaainful
Yeah, https://github.com/cljdoc/cljdoc/issues/532. I started describing data structures in docstrings, whenever I was scratching my head.
So @corasaurus-hex, I’m learning how server-side search currently works. There are details around how text is tokenized, but basically it seems like we have prefix searching. So a search for `thi` will match `this` and `thing` but not `rethink`. I’m gonna guess that client-side search is more along the lines of simple character matching?
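The distinction can be sketched as token-prefix matching versus plain substring matching. This assumes a naive word-character tokenizer. Lucene analyzers and flexsearch's tokenizer options (roughly `forward` for prefixes, `full` for any substring) are configurable and more involved.

```python
import re

def tokens(text):
    # Naive tokenizer: lowercase runs of word characters.
    return re.findall(r"\w+", text.lower())

def prefix_match(query, text):
    """True if some token in text starts with the query (prefix-style search)."""
    q = query.lower()
    return any(tok.startswith(q) for tok in tokens(text))

def substring_match(query, text):
    """True if the query appears anywhere in text, even mid-word."""
    return query.lower() in text.lower()
```

With these definitions, `thi` prefix-matches `this` and `thing` but not `rethink`, while the substring check matches all three.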
no, it's as full as we want
specifically the indexing options here https://github.com/nextapps-de/flexsearch#index-options
you can choose how you want it to tokenize https://github.com/nextapps-de/flexsearch#tokenizer-prefix-search
I'll bet the server-side things can do this as well?
Thanks! The search technology we are using server-side (Lucene) is geared for speed. It can be tweaked programmatically in tons of ways. It has evolved over decades… is still actively maintained… and is widely used. Powerful but not trivial to use. At first glance flexsearch seems a whole lot more end-user-focused.
But anyway, just wondering out loud how consistent our client vs server side matching techniques are or should be.
the server-side is mostly for matching project names?
Github repos include a list of tags/topics. It might also be possible to check poms or deps.edn files for tags. That might be an interesting addition to search at some point.
Interesting. Right now we only document libs from clojars, but we’ve been thinking about how to include source-based libs (hosted only on github for example) https://github.com/cljdoc/cljdoc/issues/459. One open question I have on that is how we might rank search results. For clojars we are about to do so by download count, but for a git repo, not sure. Maybe github stars?
Stars and follows seem reasonable. Since many Clojure libraries are available on clojars, I was thinking it would be interesting to cross-reference downloads with stars and see if there's a useful correlation between the two that would allow us to convert between them for ranking.
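A rough way to test that idea is a rank (Spearman) correlation, which is less sensitive than Pearson to download counts spanning several orders of magnitude. This is a minimal sketch, assuming you already have paired download and star counts per library. The helper names are hypothetical.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def pearson(xs, ys):
    # Raises ZeroDivisionError if either series is constant.
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def spearman(downloads, stars):
    """Spearman rank correlation between paired download and star counts."""
    return pearson(ranks(downloads), ranks(stars))
```

A value near 1 would suggest stars could stand in for downloads when ranking git-only libraries. A low value would confirm the data is too noisy for that.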
it's very noisy
Ah, so one thing that throws things off is that if a "popular" library depends on an "unpopular" library, you get things like https://clojars.org/crypto-equality with only 20 stars but 14,949,372 downloads 🤣
I wonder what the data looks like if you somehow credit "unpopular" libraries with the downloads of their dependents
so crypto-equality would get an effective star count that includes the stars of all its dependents
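This is a minimal sketch of that crediting idea, assuming a map from each library to the libraries it depends on. A library's effective star count is its own stars plus the stars of every direct or transitive dependent, each counted once. Library names and numbers below are made up.

```python
from collections import defaultdict, deque

def reverse_deps(deps):
    """deps: lib -> iterable of libs it depends on. Returns lib -> set of dependents."""
    rev = defaultdict(set)
    for lib, requires in deps.items():
        for dep in requires:
            rev[dep].add(lib)
    return rev

def effective_stars(lib, stars, deps):
    """Own stars plus stars of every library that depends on lib, directly or
    transitively. Each dependent is counted once, even if reachable by several paths."""
    rev = reverse_deps(deps)
    seen = {lib}
    queue = deque([lib])
    total = stars.get(lib, 0)
    while queue:  # breadth-first walk over the reverse dependency graph
        current = queue.popleft()
        for dependent in rev.get(current, ()):
            if dependent not in seen:
                seen.add(dependent)
                total += stars.get(dependent, 0)
                queue.append(dependent)
    return total

# Hypothetical graph: tiny-util has few stars but sits under popular libraries.
deps = {"app-a": ["http-lib"], "app-b": ["http-lib", "tiny-util"], "http-lib": ["tiny-util"]}
stars = {"app-a": 500, "app-b": 300, "http-lib": 1000, "tiny-util": 20}
```

Here `tiny-util` would be credited with 20 + 300 + 1000 + 500 stars. Whether raw sums, or some damped weighting, make for a sensible ranking would need checking against real data.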
Huh, hadn’t noticed the https://clojars.org/rewrite-clj/dependents page on clojars until just… now.
Yeah… part of google’s ranking system is how many other pages refer to a page, right? So how many libs use a lib would be an interesting indicator?
is it worth investigating alternatives? those needs seem super light
And there are probably enough lucene geeks out there who understand the tech well… I’m just not one of them… yet!
Clojars also uses lucene but https://github.com/clojars/clojars-web/wiki/Search-Query-Syntax. I was considering offering that syntax, and while it makes sense for clojars, I decided it probably adds more end-user complexity than value for cljdoc.
Related tangent: We search on pom description but don’t display it in results. I personally find that kind of confusing. Any opinion?
so a lot of search tools offer returning the relevant section of text with the matches highlighted
which is a nice feature
in general I like it to feel obvious why something matched, that way I can assess relevance without having to click through
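This is a minimal sketch of returning a snippet with the matched terms highlighted, assuming plain-text input and case-insensitive matching. `highlight` and the `<mark>` markers are hypothetical choices, and real output would also need HTML escaping.

```python
import re

def highlight(text, query, marker=("<mark>", "</mark>"), context=30):
    """Return a snippet around the first match, with every case-insensitive
    occurrence of a query term wrapped in marker tags, or None if nothing matches.
    Does not HTML-escape the text."""
    # Longest terms first so "foobar" wins over "foo" in the alternation.
    terms = sorted({t for t in query.split() if t}, key=len, reverse=True)
    if not terms:
        return None
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return None
    start = max(0, first.start() - context)
    end = min(len(text), first.end() + context)
    open_tag, close_tag = marker
    return pattern.sub(lambda m: f"{open_tag}{m.group(0)}{close_tag}", text[start:end])
```

Applied to a pom description, this would make it visible why a project matched without clicking through.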
in this case, though, are people searching mostly for projects they already know about?
or is this more about project discovery?
it feels like the former to me but the latter definitely has some value (however someone needs to have generated docs for the description to be searched?)
sorry, I realize that I'm not offering a concrete opinion
if I had to do it myself I'd just make it match artifact and group id. I'd want to hear a use case and get some inspiration before taking it further
Yeah, I am muddling through myself, thanks for muddling with me! I feel the same way about what has matched being obvious. And I also like highlighting (which we don’t yet do but can do).
I’m guessing that we search the description for discovery. Which seems like an ok thing to do. The description comes from clojars so no need for it to have been already built by cljdoc yet. But if we are searching on it, I feel we should show it in results.
Oh neat, https://github.com/cljdoc/cljdoc/blob/master/doc/adr/0019-use-custom-search.md.