#clojars
2024-03-15
seancorfield14:03:13

How does Clojars determine canonical or not?

sheluchin15:03:03

Thanks for asking that, @seancorfield. To add on, when looking at the data extracts described at https://github.com/clojars/clojars-web/wiki/Data#useful-extracts-from-the-poms, is there a way to differentiate between the canonical and its forks? I believe Clojars is simply checking whether the group name uses the org.clojars.username format. If it starts with org., Clojars will say it's a fork. In other cases, it interprets it as canonical. If my understanding is correct, I'm not sure it's possible to reliably determine a fork from the data. The org. convention doesn't seem to be commonly used by users.

> The org groups have historically been used to hold things like throwaway alpha versions and forks of other projects, but the net groups don’t have that history. You are free to use either group as you like.
https://github.com/clojars/clojars-web/wiki/Groups#personal-groups

> org.clojars.<clojars-username>: this group exists for each Clojars user, and is automatically verified. These groups have existed since the early days of Clojars, and have typically been used as sandboxes/for non-canonical forks. We recommend using net.clojars. for "official" releases instead.
https://github.com/clojars/clojars-web/wiki/Verified-Group-Names#do-i-have-to-have-my-own-domain-name-to-publish-to-clojars

tcrawley15:03:30

This check is old, it just looks for an org.clojars.<username> group. The idea at the time was folks would use this group for forks.
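
For illustration, the check described above amounts to roughly the following. This is a minimal Python sketch, not Clojars' actual code; the function name `looks_like_fork_group` is made up here, and the only thing taken from the discussion is the `org.clojars.<username>` prefix test.

```python
# Assumption: the check is a plain prefix test on the group name,
# as described in the message above. Not taken from the Clojars source.
ORG_PREFIX = "org.clojars."


def looks_like_fork_group(group_id: str) -> bool:
    """Return True when the group has the form org.clojars.<username>."""
    # Require a non-empty username after the prefix.
    return group_id.startswith(ORG_PREFIX) and len(group_id) > len(ORG_PREFIX)


print(looks_like_fork_group("org.clojars.tcrawley"))  # group from the fork discussed here
print(looks_like_fork_group("boynton"))               # a fork that the prefix test misses
```

The second call shows the weakness raised in this thread: a fork published under a plain group name is not flagged.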

tcrawley15:03:55

Thanks for the details @sheluchin!

tcrawley15:03:22

Yeah, there isn't really a way to tell that a particular name is the "canonical" release.

sheluchin15:03:24

Thanks for confirming, @tcrawley. I wonder if checking the length of the versions vector would be a reasonably strong heuristic. It checks out for markdown-clj, but I'm not sure if it would cover the majority of cases. Or maybe there's a way to cross-reference the GitHub data for the repo to determine fork status?
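
A rough Python sketch of the version-count idea, for illustration only. The function name `guess_canonical_by_versions` is hypothetical, and the version list for the `markdown-clj` group is a made-up placeholder; only the two fork entries reuse values quoted in this thread.

```python
def guess_canonical_by_versions(candidates):
    """Pick the candidate with the most published versions.

    `candidates` is a list of dicts with "group_id" and "versions" keys,
    mirroring the shape of the Clojars data extracts.
    """
    return max(candidates, key=lambda c: len(c.get("versions", [])))


candidates = [
    # Placeholder version list, NOT real data for this group.
    {"group_id": "markdown-clj", "versions": ["x.1", "x.2", "x.3"]},
    {"group_id": "org.clojars.tcrawley", "versions": ["0.9.43a"]},
    {"group_id": "boynton", "versions": ["0.9.8"]},
]

print(guess_canonical_by_versions(candidates)["group_id"])
```

The heuristic only ranks by count, so it cannot distinguish an actively maintained fork from an abandoned original.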

tcrawley15:03:28

Hey, one of those non-canonical forks is mine! :) I don't know if counting versions would work very well; I believe I have had forks in the past where I released more versions than the canonical project. Also, the git data on the fork could be the forked repo, or could be the original repo. The latter would make it easier, but the former would require some API calls to github to determine the fork status. Even then, what if I create a lib, but don't publish it to clojars. You take my code and deploy it to clojars. I later do the same thing under a different group. Which one should be considered canonical? Another point - what if one of those markdown-cljs is a completely different project that happens to have the same name? I'm not trying to discourage, it's just a hard problem.
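
As a sketch of the "API calls to github" option mentioned above: the GitHub REST endpoint `GET /repos/{owner}/{repo}` returns a JSON object with a boolean `fork` field and, for forks, a `parent` object carrying a `full_name`. The Python below assumes that response shape; the helper names are made up for illustration, and `fetch_repo` makes an unauthenticated request that is subject to rate limits.

```python
import json
import urllib.request


def repo_api_url(owner: str, repo: str) -> str:
    """Build the GitHub REST API URL for a repository."""
    return f"https://api.github.com/repos/{owner}/{repo}"


def fetch_repo(owner: str, repo: str) -> dict:
    """Fetch repository metadata (performs a network call)."""
    req = urllib.request.Request(
        repo_api_url(owner, repo),
        headers={"Accept": "application/vnd.github+json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def fork_parent(repo_payload: dict):
    """Return the parent's "owner/name" if the payload describes a fork.

    Returns None for non-forks, and also when a fork's payload
    carries no parent information.
    """
    if not repo_payload.get("fork"):
        return None
    parent = repo_payload.get("parent") or {}
    return parent.get("full_name")
```

Usage would be along the lines of `fork_parent(fetch_repo(owner, repo))`, with `owner` and `repo` parsed from the artifact's SCM URL. As noted above, this only helps when the SCM data points at the fork's own repository rather than the original.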

sheluchin16:03:18

I don't mind making additional API calls to get the data to make the determination, and it doesn't even have to be perfect, just covering the majority of cases. I'll do a little test to see how well versions count works. Is there anything you can say about the scm data? Looking at your fork, @tcrawley, I see:

{:group_id "org.clojars.tcrawley",
 :artifact_id "markdown-clj",
 :description "Markdown parser",
 :scm
 {:connection "scm:git:",
  :developer-connection
  "scm:git:",
  :tag "ab977690922c2fee26133ca189b83f5802663061",
  :url ""},
 :homepage "",
 :url "",
 :versions ["0.9.43a"]}
Which is the exact same as the canonical, with only [:scm :tag] and :versions differing.

seancorfield16:03:15

I would say that if the username/orgname in the SCM doesn't match the org.clojars. then it's pretty certain that's non-canonical, i.e., both org.clojars. and a non-matching username have to be present to declare it non-canonical?

sheluchin16:03:51

@seancorfield Not sure I understand what you mean. Here is the canonical for comparison against the fork data I posted above:

{:clojars_feed/group_id "markdown-clj",
 :clojars_feed/artifact_id "markdown-clj",
 :clojars_feed/scm
 {:connection "scm:git:",
  :developer-connection
  "scm:git:",
  :tag "bfee1ec8ee5458acdc897178f3b1217f4c57c8f8",
  :url ""},
 :clojars_feed/homepage "",
 :clojars_feed/url ""}
What do you mean by "username/orgname in the SCM"? And using org.clojars. as part of the check seems pretty unreliable because, at least in the markdown-clj case, most of them don't use that convention. Here's a fork, https://github.com/boynton/markdown-clj:
{:clojars_feed/group_id "boynton",
 :clojars_feed/artifact_id "markdown-clj",
 :clojars_feed/description "Markdown parser",
 :clojars_feed/scm
 {:connection "scm:git:",
  :developer-connection
  "scm:git:",
  :tag "15548a279ecbed997e6649f430f7d798f0836f1f",
  :url ""},
 :clojars_feed/homepage "",
 :clojars_feed/url "",
 :clojars_feed/versions ["0.9.8"]}

seancorfield16:03:05

I was referring to your comment above "Which is exact same as the canonical" -- so the scm had yogthos as the username/orgname but the published group was org.clojars.tcrawley. But, yes, I understand that most forks aren't going to be that obviously non-canonical.
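
The combined rule described above (an `org.clojars.` group plus a non-matching SCM owner) could be sketched in Python as follows. The function names are hypothetical, and the example URL is an assumption: the SCM URLs are blank in the extracts quoted in this thread, so the `yogthos` owner is taken from the message above and the URL shape is assumed to be GitHub-style.

```python
from urllib.parse import urlparse

ORG_PREFIX = "org.clojars."


def scm_owner(scm_url: str):
    """Return the first path segment of an SCM URL (the user/org), or None."""
    segments = [s for s in urlparse(scm_url).path.split("/") if s]
    return segments[0].lower() if segments else None


def is_probably_non_canonical(group_id: str, scm_url: str) -> bool:
    """True only when BOTH conditions hold:
    the group is org.clojars.<user>, and the SCM owner differs from <user>.
    """
    if not group_id.startswith(ORG_PREFIX):
        return False
    clojars_user = group_id[len(ORG_PREFIX):].lower()
    owner = scm_owner(scm_url)
    return owner is not None and owner != clojars_user


# Assumed URL shape; the owner comes from the discussion, the URL itself does not.
print(is_probably_non_canonical(
    "org.clojars.tcrawley", "https://github.com/yogthos/markdown-clj"))
```

A `False` result here means "no signal", not "canonical": forks published under other group names, such as the boynton one, pass through undetected.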

seancorfield16:03:02

There are also cases where a project gets "abandoned" and someone else's fork becomes the de facto canonical version -- so a version count won't help there (because the maintained fork is going to have far fewer versions published than the original for quite some time).

sheluchin19:03:15

@seancorfield good point. Thank you. I'll keep at it... I think one way or another there's a way to get a decent result.