#portkey
2018-03-26
baptiste-from-paris 14:03:37

And if someone has 10 minutes to help me out with my regex hell, I would be a happy man.

baptiste-from-paris 14:03:31

Give me a minute to create a snippet.

baptiste-from-paris 14:03:12

So I am working on the AWS Signature Version 4 test suite, because I found out that some cases were not handled. They provide a raw text file representing each request, and I need to write a req-text->req-map function that captures elements from the file,

baptiste-from-paris 14:03:20

some of which are optional.

baptiste-from-paris 14:03:55

this is my (wrong) regex =>

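baptiste-from-paris 14:03:09

(defn req-text->req-map
  "Given a request from AWS test*.req, returns a clj-http request map."
  [input]
  ;; Only matches when a non-empty Host line directly follows the request line
  ;; and X-Amz-Date directly follows Host; any other header in between makes
  ;; re-find return nil, so every binding below is nil.
  (let [[_ verb uri host date]
        (re-find #"([A-Z]+)\s(\S+).+\nHost:(\S+)\nX-Amz-Date:(\S+)" input)]
    {:request-method verb
     :uri uri
     :host host
     :date date}))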

baptiste-from-paris 14:03:16

here are the results =>

baptiste-from-paris 14:03:11

You’ll get nil because I don’t handle My-Header and params yet.

baptiste-from-paris 14:03:19

I tried this one without success

baptiste-from-paris 14:03:47

(def input "GET / HTTP/1.1\nHost:\nMy-Header1:value2\nMy-Header1:value2\nMy-Header1:value1\nX-Amz-Date:20150830T123600Z")

(let [[_ & a]
      (re-find #"([A-Z]+)\s(\S+).+\n(My-Header\d:value\d\n)/" input)]
  a)

baptiste-from-paris 14:03:51

And I can’t figure out how to capture the optional, repeated My-Header lines.

baptiste-from-paris 14:03:02

Hint: I really suck at regex.
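For reference, a minimal Clojure sketch of why a single regex struggles here: in Java regexes (which Clojure uses), a capturing group inside a repetition only remembers its last iteration, whereas re-seq returns one match per header line. It reuses the input string from the attempt above; the names last-only and all-headers are purely illustrative.

```clojure
(def input "GET / HTTP/1.1\nHost:\nMy-Header1:value2\nMy-Header1:value2\nMy-Header1:value1\nX-Amz-Date:20150830T123600Z")

;; A capturing group inside a repetition only keeps its LAST iteration:
(def last-only (re-find #"(?:(My-Header\d):(\S+)\n)+" input))
;; last-only => ["My-Header1:value2\nMy-Header1:value2\nMy-Header1:value1\n" "My-Header1" "value1"]

;; re-seq returns one match per header line instead, so every value survives.
;; (?m) makes ^ and $ match at line boundaries.
(def all-headers
  (for [[_ k v] (re-seq #"(?m)^(My-Header\d):(.*)$" input)]
    [k v]))
;; all-headers => (["My-Header1" "value2"] ["My-Header1" "value2"] ["My-Header1" "value1"])
```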

cgrand 14:03:28

Is that the last missing case, or are there more tests with more headers? I’m not sure regexes are the answer.

cgrand 14:03:26

Is GET / HTTP/1.1\nHost:\nMy-Header1:value1\n value2\n value3\nX-Amz-Date:20150830T123600Z even valid?

cgrand 14:03:30

>>> Header fields can be extended over multiple lines by preceding each extra line with at least one SP or HT.
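A minimal sketch of one way to handle such folded (multi-line) header values before any further parsing, assuming it is acceptable to replace each fold with a single space, which RFC 7230 (section 3.2.4) allows a recipient to do. The helper name unfold-headers is hypothetical, and whether the AWS test suite expects exactly this form is not established in this discussion.

```clojure
(require '[clojure.string :as str])

(defn unfold-headers
  "Hypothetical helper: replaces each obs-fold (a line break followed by
  one or more spaces or tabs) with a single space."
  [s]
  (str/replace s #"\r?\n[ \t]+" " "))
```

Lines that start with a header name are untouched, because the pattern only matches a line break followed by whitespace.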

baptiste-from-paris 14:03:05

@cgrand I don’t capture Param1=value1 either.

baptiste-from-paris 14:03:14

How else, if not with regexes?

cgrand 14:03:55

Manual parsing, or several regex stages.
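A minimal sketch of the manual-parsing route, assuming the header lines have already been separated from the request line (for example with clojure.string/split-lines). The helper parse-header-lines is hypothetical; it attaches continuation lines to the previous header instead of encoding everything in one large regex.

```clojure
(require '[clojure.string :as str])

(defn parse-header-lines
  "Hypothetical helper: turns raw header lines into [name value] pairs.
  A line starting with SP or HT continues the previous header; it is
  trimmed and appended to that header's value after a single space."
  [lines]
  (reduce (fn [acc line]
            (if (and (seq acc) (re-find #"^[ \t]" line))
              ;; continuation line: append to the value of the last pair
              (update-in acc [(dec (count acc)) 1] str " " (str/trim line))
              ;; new header: split on the first colon only
              (let [[k v] (str/split line #":" 2)]
                (conj acc [k (or v "")]))))
          []
          lines))
```

Repeated names are kept as separate pairs, so nothing is lost before deciding how to merge them.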

cgrand 14:03:52

#"([A-Z]+)\s(\S+).+\nHost:(\S+)\n((?:My-Header\d:.*\n(?:[ \t].*\n)*)*)X-Amz-Date:(\S+)"

cgrand 14:03:34

But seriously, don’t do that

cgrand 15:03:21

An HTTP parsing lib?

baptiste-from-paris 15:03:45

OK, I’ll look at some HTTP parsing libs.

baptiste-from-paris 15:03:09

But just out of curiosity: if you really had to use a regex, it’s possible, right?

cgrand 15:03:42

I would read the file as lines and parse the first line as method, path, protocol,

cgrand 15:03:50

so you consume the first line and then re-seq over the headers.

cgrand 15:03:58

(let [req "GET / HTTP/1.1\nHost:\nMy-Header1:value1\n  value2\n     value3\nX-Amz-Date:20150830T123600Z"
      [_ method path headers] (re-matches #"(?s)([A-Z]+)\s+(\S+).*?\n(.*)" req)
      headers (for [[_ header value] (re-seq #"(?s)(\S+):(.*?\n(?:[\t ].*?\n)*)" (str headers "\n"))]
                [header value])]
  [method path headers])

cgrand 15:03:18

yields

["GET"
 "/"
 (["Host" "\n"]
  ["My-Header1" "value1\n  value2\n     value3\n"]
  ["X-Amz-Date" "20150830T123600Z\n"])]

baptiste-from-paris 19:03:13

I can’t find libs that do the job. I tried org.apache.httpclient, but I can’t get the request body for a POST request.
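A minimal sketch of getting at the body without a library, assuming the .req files follow the usual HTTP layout in which an empty line separates the head (request line plus headers) from the body. The helper split-head-body is hypothetical.

```clojure
(require '[clojure.string :as str])

(defn split-head-body
  "Hypothetical helper: splits a raw request into [head body] at the first
  empty line. Accepts \\n or \\r\\n line endings; body is nil when absent."
  [req]
  ;; limit 2: only the first blank line separates head from body
  (let [[head body] (str/split req #"\r?\n\r?\n" 2)]
    [head body]))
```

The head can then go through whatever header parsing is used, while the body is left as a raw string.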

cgrand 19:03:14

And my snippet above?

baptiste-from-paris 19:03:24

Let me try; I was focusing on parsing raw HTTP with Apache HttpClient ^^

baptiste-from-paris 19:03:25

Aren’t headers supposed to be unique?

cgrand 20:03:42

No. Some may be multi-valued, and repeating the header is a way to encode that.
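Repeated headers can be merged the way RFC 7230 (section 3.2.2) describes: join the values in order of appearance, separated by commas. A minimal sketch with plain clojure.core; merge-headers is a hypothetical name, and it compares header names exactly even though HTTP header names are case-insensitive.

```clojure
(defn merge-headers
  "Hypothetical helper: folds [name value] pairs into a map, joining the
  values of repeated header names with commas, in order of appearance."
  [pairs]
  (reduce (fn [m [k v]]
            ;; first occurrence stores v as-is; later ones are appended after a comma
            (update m k #(if (nil? %) v (str % "," v))))
          {}
          pairs))
```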

baptiste-from-paris 20:03:03

A first draft that works well for headers, but not for POST param=value bodies:

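baptiste-from-paris 20:03:16

(require '[net.cgrand.xforms :as x])
(import '(java.io ByteArrayInputStream)
        '(java.nio.charset StandardCharsets)
        '(org.apache.http.impl.io DefaultHttpRequestParser
                                  HttpTransportMetricsImpl
                                  SessionInputBufferImpl))

(defn req-text->req-map-revisited [req-text]
  (let [;; feed the raw request text to httpcore's request parser
        is (ByteArrayInputStream. (.getBytes req-text StandardCharsets/UTF_8))
        session-input-buffer (doto (SessionInputBufferImpl. (HttpTransportMetricsImpl.) (* 8 2048))
                               (.bind is))
        ;; only the request line and headers are parsed; the body is not read
        basic-http-request (.parse (DefaultHttpRequestParser. session-input-buffer))
        headers (for [h (.getAllHeaders basic-http-request)]
                  [(.getName h) (.getValue h)])
        ;; repeated headers are merged into one comma-separated value
        headers (into {}
                      (x/by-key (comp (interpose ",")
                                      x/str))
                      headers)
        request-line (.getRequestLine basic-http-request)]
    (cond->
     {:uri (.getUri request-line)
      :request-method (.getMethod request-line)}
      (seq headers) (assoc :headers headers))))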

baptiste-from-paris 20:03:24

{:uri "/", :request-method "GET", :headers {"Host" "", "My-Header1" "value2,value2,value1", "X-Amz-Date" "20150830T123600Z"}}
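For the POST param=value part, a minimal sketch assuming the body is application/x-www-form-urlencoded and has already been split off from the head. The helper parse-form-body is hypothetical; if a key repeats, the last value wins.

```clojure
(require '[clojure.string :as str])
(import '(java.net URLDecoder))

(defn parse-form-body
  "Hypothetical helper: decodes an application/x-www-form-urlencoded body
  such as \"Param1=value1\" into a map. Later duplicates of a key win."
  [body]
  (into {}
        (for [pair (str/split (or body "") #"&")
              :when (not (str/blank? pair))
              ;; split on the first = only, so values may contain =
              :let [[k v] (str/split pair #"=" 2)]]
          [(URLDecoder/decode k "UTF-8")
           (URLDecoder/decode (or v "") "UTF-8")])))
```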

baptiste-from-paris 21:03:44

For info, found in the tests:

A note about signing requests to Amazon S3:

In exception to this, you do not normalize URI paths for requests to Amazon S3. For example, if you have a bucket with an object named my-object//example//photo.user, use that path. Normalizing the path to my-object/example/photo.user will cause the request to fail. For more information, see Task 1: Create a Canonical Request in the Amazon Simple Storage Service API Reference:
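To make the difference concrete, a minimal sketch of the kind of path normalization the note refers to: empty and "." segments are dropped and ".." segments resolved, while S3 paths are used as-is. Both helpers and the "s3" service string are hypothetical, and URI-encoding of the segments is deliberately left out.

```clojure
(require '[clojure.string :as str])

(defn normalize-path
  "Hypothetical helper: removes empty and \".\" segments and resolves \"..\"
  segments, keeping a trailing slash if the input had one."
  [path]
  (let [segments (reduce (fn [acc seg]
                           (case seg
                             ("" ".") acc                         ; drop empty and "." segments
                             ".." (if (seq acc) (pop acc) acc)    ; go up one level
                             (conj acc seg)))
                         []
                         (str/split path #"/"))]
    (str "/" (str/join "/" segments)
         (when (and (seq segments) (str/ends-with? path "/")) "/"))))

(defn canonical-path
  "Hypothetical helper: S3 paths are used as-is; other services get normalized."
  [service path]
  (if (= service "s3") path (normalize-path path)))
```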