Fork me on GitHub
#code-reviews
<
2018-12-17
>
jaihindhreddy10:12:31

Wrote this fn to parse a line from our log files. Idiomatic?

(require '[java-time :as t])
(defn parse-log-entry [line]
  (let [log-format (array-map :timestamp "\\[([^\\]]+)\\]"
                              :file      "\\[Fl:\\s*([^\\]]+)\\]"
                              :class     "\\[Cl:\\s*([^\\]]+)\\]"
                              :method    "\\[Fn:\\s*([^\\]]+)\\]"
                              :line      "\\[Ln:\\s*([^\\]]+)\\]"
                              :level     "\\[([^\\]]+)\\]"
                              :summary   "\\[([^\\]]+)\\]"
                              :thread-id "([^_]+_[^\\s]+)\\s"
                              :server-ip "([\\d.]+)"
                              :client-ip "([\\d.]+)"
                              :msg       "\\[(.+)$")
        log-entry-regex (->> (concat "^" (vals log-format) "$")
                             (clojure.string/join "\\s*")
                             re-pattern)
        entry (->> (re-find log-entry-regex line)
                   rest
                   (map clojure.string/trim)
                   (map vector (keys log-format))
                   (into {}))
        parse-timestamp #(->> (concat % (repeat \0))
                              (take 24)
                              (apply str)
                              (t/local-date-time (t/formatter "dd-MM-yyyy HH:mm:ss.SSSS")))]
    (-> entry
        (update :line #(Integer/parseInt %))
        (update :timestamp parse-timestamp))))

jumar10:12:02

It seems that you might want a real parser - take a look instaparse

jaihindhreddy12:12:11

Yeah, instaparse seems like overkill, and bashing strings together to create regexps seems like underkill. Do you reckon instaparse is a better fit here?

dominicm12:12:16

Look into named group regexes in java.

dominicm12:12:52

They might make the code a little more obvious.

jumar12:12:01

Performance might have to be taken into consideration too (if your logs are large) - apart from readability and sensitivity to subtle errors, regexes aren't terribly performant. However, instaparse is also known to be quite slow (antlr is much better, but as you say that's probably an overkill in your case)

jaihindhreddy12:12:02

Thanks Dominic!

dominicm12:12:56

Regex is fast by a lot of measures. Especially if you don't use lookahead,

jaihindhreddy12:12:24

Yeah @U06BE1L6T this is just for a script that helps me analyze logs easily on my machine in the REPL. Will use a proper parser generator if we decide to put this somewhere at scale.

cvic13:12:54

hmm, what type of logs are you trying to parse?

jaihindhreddy15:12:41

Logs of an application (Apache + PHP) at work.

val_waeselynck17:12:00

array-map is fragile - when you want order, you usually don't want maps. I'd use a vector of tuples instead. I would also do some initialization outside of the function body - here you're compiling a big regex for each line of log!

👍 1
val_waeselynck17:12:25

Likewise, it seems that zipmap is what you want for computing entry.

jaihindhreddy17:12:06

TIL zipmap. I initially defined log-entry-regex as a var, but moved it in here later. Thank you for your insight.