Fork me on GitHub
#immutant
<
2016-02-27
>
weavejester01:02:23

By the way, neither the :uri nor the :path-info should be url-decoded.

weavejester01:02:41

:uri is always the raw URI. And (str context path-info) == uri

weavejester01:02:51

Just to clear things up. simple_smile

tcrawley13:02:17

@weavejester: in the servlet spec, path-info is always returned decoded from HttpServletRequest.getPathInfo()

tcrawley13:02:09

so (str context (encode path-info)) == uri

weavejester14:02:14

@tcrawley: Right, but one of the nice things about Ring is that we don’t need to make the same mistakes as servlets. Because URL encode/decode is lossy, particularly since Java changes “+” into “ “, it makes more sense to keep path-info raw.

tcrawley14:02:54

ah, so you're saying immutant should instead calculate the raw path-info when constructing the request map instead of using getPathInfo? That might work

weavejester14:02:57

Yep. I actually had assumed that the servlet spec didn’t decode PathInfo, which is why there isn’t a huge amount of documentation for it.

weavejester14:02:59

getContextPath isn’t decoded, strangely.

weavejester14:02:31

So you can just use (ring.util.request/set-context request (.getContextPath servlet-request))

tcrawley14:02:28

Great, thanks for your input. I'll take a look

jcrossley315:02:18

@weavejester: i take your point, but since context and path-info are strictly servlet things (not ring), why not follow their spec? someone familiar with getPathInfo isn't going to expect to see that "+" in :path-info

weavejester15:02:13

jcrossley3: They’re not strictly servlet things. Other libraries, like Compojure, make use of them.

weavejester15:02:41

I also don’t see the point of adopting mistakes made in earlier libraries just for compatibility.

jcrossley315:02:45

well, i meant that they're not http things, and therefore not ring things

weavejester15:02:11

They’re not in the Ring spec, but they are in the Ring library, like :params or :session.

jcrossley315:02:18

i guess it's only context that is servlet-y. there is a CGI variable PATH_INFO

weavejester15:02:56

Sure, but again, why adopt a faulty convention?

jcrossley315:02:58

still, it does make me wonder why they would explicitly spec it to be decoded

jcrossley315:02:11

that's a pretty explicit mistake simple_smile

weavejester15:02:38

I don’t mean that it was accidental; just that it was a mistake in the design.

weavejester15:02:18

It looks like the full URI of servlets is: contextPath + servletPath + pathInfo

weavejester15:02:29

But servletPath and pathInfo are decoded

weavejester15:02:40

and contextPath and the URI isn't

weavejester15:02:10

And because URL decodes are lossy, it’s not a direct mapping.

jcrossley315:02:15

do you know if the http rfc says anything about pathinfo?

weavejester15:02:34

It doesn’t.

weavejester15:02:48

It’s not something that’s part of the HTTP request.

weavejester15:02:04

It’s just something made up originally by CGI scripts.

weavejester15:02:48

:path-info might have been a poorly chosen name, considering that it’s slightly different to the PATH_INFO in CGI, and getPathInfo in servlets.

weavejester15:02:07

However, at the time I didn’t realise that

tcrawley15:02:50

I think that's where my confusion came from - since it's not in the ring spec, and named the same as the servlet value, I assumed it was the servlet value

weavejester15:02:10

That’s a reasonable assumption.

weavejester15:02:24

I’ll make the documentation clearer in the next version of Ring.

weavejester15:02:04

I had assumed that :path-info in Ring was the same as the servlet version, but it turns out that’s not the case.

weavejester15:02:57

I might use :path in future and deprecate :path-info… but that likely won’t happen for a while.

tcrawley15:02:31

I was surprised that is was decoded by servlets, I only discovered that yesterday

weavejester15:02:48

Yeah. My guess is that it’s to follow the CGI definition

weavejester15:02:54

Which makes sense for shell scripts

weavejester15:02:06

But doesn’t really make sense for modern web applications.

weavejester15:02:45

Reading that Rack thread talks about lossy decoding as well

weavejester15:02:56

But because Rack follow the CGI spec they have no choice

weavejester15:02:51

Fortunately Ring doesn’t need to, so we can break with tradition a little. Ideally it would be named something different, but for now it can just be documented better.

tcrawley15:02:47

I know you've stated that it shouldn't be part of the ring spec, but if it was part of the spec, it would be a stronger stance than documentation of the implementation's behavior would be

weavejester15:02:19

Well, the Ring spec just covers HTTP. However, it might be worth putting up something else for the keys introduced by the library itself, like :context, :path-info, :params, :session and so forth.

jcrossley315:02:26

@weavejester: not to dissuade you from fixing the web, but do you see no benefit in keeping ring at least a little in-line with wsgi, rack, servlets, etc?

jcrossley315:02:19

e.g. after introducing :path, you might then decode :path-info rather than deprecate it

weavejester15:02:43

jcrossley3: I do see a benefit in being consistent. However (a) it’s too late now, and (b) the problems a decoded :path-info provides outweigh the advantages of compatibility, IMO.

jcrossley315:02:19

problems with existing apps/middleware you mean?

weavejester15:02:14

Yes. I’d prefer not to change the behaviour of an existing key.

weavejester15:02:39

I’m careful about backward compatibility in Ring.

jcrossley315:02:42

i can respect that, but like you say, it'd be good to be clear about where you diverge with CGI, et. al.

tcrawley15:02:53

jcrossley3: I don't see users getting confused about :path-info being raw in ring, as long as it is clearly documented

weavejester15:02:05

Yes, that I’ll agree on simple_smile

weavejester15:02:18

The only reason there isn’t documentation is because I didn’t realise that it was different

tcrawley15:02:37

and clearly no one has complained about it until now, and the complaint was "this shouldn't be decoded"

jcrossley315:02:54

@tcrawley: it confused both of us, duh simple_smile

weavejester15:02:06

Thanks for bringing this to my attention. I’ll make the documentation clearer.

weavejester15:02:15

Gotta go for now. Bye!

tcrawley15:02:34

jcrossley3: sure, but we're not users :)

ddellacosta20:02:20

I just caught this conversation but I’m a bit confused what the outcome was

ddellacosta20:02:28

what is actually going to be documented in the ring documentation?

ddellacosta20:02:01

It sounds like nothing is really going to change, and this “bug” such as it is will remain.

ddellacosta20:02:46

It’s certainly not a huge deal to pass in a builder to Undertow to make sure that the url and path info is not decoded, so as to let Ring and Compojure do their thing. But seems like we may need better documentation for both Immutant (in terms of what Undertow is doing) and Ring/Compojure (in terms of where stuff is actually decoded) to help users not flail about when they run into what I did.

ddellacosta20:02:03

Apologies if I’m misunderstanding what the above discussion concluded; I just don’t see where it addresses my particular issue.

weavejester21:02:19

@ddellacosta: I’ll add some documentation to the path-info and set-context functions, and probably add a page to the wiki as well documenting the keys that the Ring library adds above and beyond the Ring specification. It’s also a possibility that I can add this into the servlet library as well.

ddellacosta21:02:55

I see—so one will still have to toggle decoding off in Undertow. (I realize this is not your particular concern @weavejester, just clarifying)

weavejester21:02:15

That’s right.

ddellacosta21:02:17

Please note also that where I first encountered this was in Compojure’s handling of params extracted from :path-info

weavejester21:02:51

Compojure doesn’t decode the path-info as far as I’m aware.

weavejester21:02:12

Very deliberately so, since otherwise context wouldn’t work correctly.

weavejester21:02:39

Right, it decodes the parameters once they’re matched.

weavejester21:02:43

But not the path-info

ddellacosta21:02:10

hmm, those are extracted from path-info I thought

weavejester21:02:26

Right, but it grabs the parameters, and then decodes them.

ddellacosta21:02:50

yes—but if path-info is already decoded, it’s a problem

ddellacosta21:02:09

sorry, not sure if I’m simply missing your point here

ddellacosta21:02:14

I’m not sure what the correct behavior is

ddellacosta21:02:29

/should be rather

weavejester21:02:33

The correct behaviour is that the :path-info key should not be decoded.

tcrawley21:02:37

@ddellacosta: I think we will fix this in Immutant by populating the request map with a :path-info that isn't decoded

tcrawley21:02:02

extracted from the raw uri

ddellacosta21:02:13

okay, that helps clarify things @tcrawley

ddellacosta21:02:33

and in the meantime I will continue to pass in the builder w/DECODE_URL toggled off to handle this, as a workaround

tcrawley21:02:09

cool, I'm glad you have a workaround for now. I'd love to get an Immutant release out in the next couple of weeks

ddellacosta21:02:21

(and thanks @weavejester too for jumping in and explaining this)

tcrawley21:02:33

I'll hijack your existing jira for this fix

weavejester21:02:47

No problem. Sorry for the confusion.

ddellacosta21:02:01

sure thing—and if I can contribute anything else (happy to write the patch if you’d like) just let me know.

ddellacosta21:02:13

np @weavejester, it all makes sense now re-reading what you wrote initially

tcrawley21:02:16

if you want to write a patch, that would be swell - you would have to fix it in two places

ddellacosta21:02:30

definitely, would love to contribute

tcrawley21:02:06

the former is used when inside WildFly or EAP, the latter when using undertow directly

ddellacosta21:02:15

I see, simply ensure that those are not url-encoded?

ddellacosta21:02:02

(I assume I’ll have to possibly extract that info from somewhere else if .getPathInfo/.getRelativePath returns the decoded value by default)

tcrawley21:02:12

you won't be able to prevent the decoding of the values returned by getPathInfo and getRelativePath - you'll have to ignore those methods and extract the path-info from the uri by stripping off the context, yeah

ddellacosta21:02:19

right, gotcha

ddellacosta21:02:46

okay, think we’re on the same page—will try to get something ASAP

ddellacosta21:02:01

will let you know if I’m going to be slower than by Monday @tcrawley, if that works for you

tcrawley21:02:19

@ddellacosta: well, this is embarrassing - I did know about path-info being escaped, and how that was the wrong behavior, because I fixed it in Immutant 1: https://github.com/immutant/immutant/tree/1.x/modules/web/src/main/clojure/immutant/web/servlet.clj#L31

ddellacosta21:02:44

haha, whoops!

tcrawley21:02:56

In fact, I remember it being that way in Immutant 2, but without the comment to explain why

ddellacosta21:02:04

you won’t be the first package maintainer to have forgotten about a fix...

tcrawley21:02:27

so at some point we said "why aren't we just calling getPathInfo?" and "fixed" it

ddellacosta21:02:56

that looks like the right fix too; was trying to find some utils already available to pull that info out of

ddellacosta21:02:54

(sorry, I suppose you would know that better than I)

tcrawley21:02:08

no problem!

tcrawley21:02:16

yeah, I think that's the right way to calculate it

ddellacosta21:02:46

in any case, thanks for the heads up on that, I can simply try to port that over

tcrawley21:02:52

great, thanks! and no rush

tcrawley21:02:06

and feel free to ping me if you have any questions

tcrawley21:02:59

I did plumb through the commits for Immutant 2, but can't find anywhere where we had it setting a raw path-info, so I feel less dumb

ddellacosta21:02:37

okay, good to know. And you should definitely not feel dumb about this, easy thing to understand

tcrawley21:02:49

heh, thanks :)