#off-topic
2020-02-08
andy.fingerhut20:02:07

I see strings like this in some data about projects/artifacts available from http://Clojars.org: "scm:git:git://github.com/Zimpler/duct.logger.honeybadger.git". This looks like a URI to me, but the official IANA list of "schemes" (the part before the first colon character) lists "git" as a scheme and not "scm": https://www.iana.org/assignments/uri-schemes/uri-schemes.xhtml Is "scm" some new/unofficial URI scheme, or something else?

p-himik20:02:33

Wiki says that the URI format is scheme:[//authority]path[?query][#fragment], where scheme is "a sequence of characters beginning with a letter and followed by any combination of letters, digits, plus (+), period (.), or hyphen (-)". If so, the thing with scm:git:git://... is not a URI.

p-himik20:02:12

Unless git:git://... is the path component, hmm.
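A minimal Python sketch of the scheme rule quoted above, assuming the scheme is everything before the first colon. The names `SCHEME_RE` and `split_scheme` are made up for illustration and are not part of any library.

```python
import re

# Scheme grammar as quoted above: a letter, then letters, digits, "+", "." or "-".
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")

def split_scheme(uri):
    """Split at the first colon; return (scheme, rest), or (None, uri) if no valid scheme."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not SCHEME_RE.fullmatch(scheme):
        return None, uri
    return scheme, rest

scheme, rest = split_scheme("scm:git:git://github.com/Zimpler/duct.logger.honeybadger.git")
print(scheme)  # scm
print(rest)    # git:git://github.com/Zimpler/duct.logger.honeybadger.git
```

Under this reading the scheme would be "scm", and "git:git://..." is whatever follows it. A colon cannot be part of a scheme, since ":" is not in the allowed character set.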

andy.fingerhut20:02:29

Yeah, there is one example on the URI Wikipedia page showing a path with colons in it. It wasn't clear to me from the text on the Wikipedia page that this was supposed to be allowed, but I don't necessarily take the Wikipedia page as the authoritative source for this, either.

p-himik20:02:05

Just out of curiosity, can you give a link to a Clojars page that uses such (maybe) URIs?

andy.fingerhut20:02:44

It is this auto-generated data file: http://clojars.org/repo/feed.clj.gz It is described, along with the other data files available, on this page: https://github.com/clojars/clojars-web/wiki/Data

andy.fingerhut20:02:03

I am pretty sure I have seen the scm:... thing in some internal git config files before, too.

andy.fingerhut21:02:07

Ah, I grep'd and found occurrences in several pom.xml files on my system.

andy.fingerhut21:02:31

So perhaps that syntax is some Maven-specific thing, rather than a standard of any kind.

andy.fingerhut21:02:17

Found some Maven docs describing them here: http://maven.apache.org/pom.html#SCM

p-himik21:02:01

Wiki is just easier to parse sometimes. :) The RFC that defines the URI format also has some examples, including urn:example:animal:ferret:nose, where the part after the first : is indeed the path component. Ah, nice. Seems like it's pretty normal to use unregistered schemes: https://www.w3.org/TR/uri-clarification/#unregistered-uri-schemes

andy.fingerhut21:02:56

Not sure whether that is considered a URI with an unregistered scheme, or just some Maven-internal detail that prepends strings to a URI.

p-himik21:02:49

FWIW, they themselves call it a URL: https://maven.apache.org/scm/scm-url-format.html And it doesn't contradict the format, so why not.
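For reference, the Maven page linked above describes the format as `scm:<scm_provider><delimiter><provider_specific_part>`, where the delimiter is `:` or `|`. A rough Python sketch of that split follows; the function name `split_scm_url` is hypothetical, and this is not how Maven itself implements it.

```python
def split_scm_url(url):
    """Split a Maven-style SCM URL into (provider, provider_specific_part).

    Assumes the layout scm:<provider><delimiter><rest>, with ":" or "|" as delimiter.
    """
    prefix = "scm:"
    if not url.startswith(prefix):
        raise ValueError("not an SCM URL: " + url)
    rest = url[len(prefix):]
    # The first ":" or "|" after the prefix ends the provider name.
    for i, ch in enumerate(rest):
        if ch in ":|":
            return rest[:i], rest[i + 1:]
    raise ValueError("no delimiter after provider: " + url)

provider, target = split_scm_url("scm:git:git://github.com/Zimpler/duct.logger.honeybadger.git")
print(provider)  # git
print(target)    # git://github.com/Zimpler/duct.logger.honeybadger.git
```

Read this way, the string is a "scm:git:" prefix followed by an ordinary git:// URL.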

andy.fingerhut21:02:37

Why doesn't it contradict the format? i.e. which part is the scheme, which part the path, etc.?

andy.fingerhut21:02:01

For example, if the syntax diagram on the Wikipedia page for URIs is correct/complete (I have no idea if it is), then it has '//' in exactly one place, which would imply that in the string "scm:git:git://github.com/Zimpler/duct.logger.honeybadger.git" the scheme is "scm:git:git", which would make parsing very ambiguous for schemes if they can contain colon characters.
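One way to check which part ends up where is the component-splitting regular expression given in Appendix B of RFC 3986. The Python sketch below applies it to the string in question; the helper name `parse_uri` is made up for illustration, and the regex only splits components without validating them.

```python
import re

# Regular expression from RFC 3986, Appendix B.
URI_RE = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")

def parse_uri(uri):
    """Return the five components as the Appendix B regex splits them."""
    m = URI_RE.match(uri)
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }

parts = parse_uri("scm:git:git://github.com/Zimpler/duct.logger.honeybadger.git")
print(parts["scheme"])     # scm
print(parts["authority"])  # None
print(parts["path"])       # git:git://github.com/Zimpler/duct.logger.honeybadger.git
```

With this regex the scheme stops at the first colon, so it is "scm". The "//" only introduces an authority when it comes immediately after the scheme's colon; here it appears later, so it is just part of the path.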