#babashka
2020-10-20
yubrshen01:10:53

I'd like to use babashka to process a file line by line, reading from standard input, but I could not produce the processed lines on standard output. I've simplified my code to the following to figure out the problem:

(ns convert.drop-bart-and-uppercase)

(defn clean-location [x] x)

(defn clean [lines]
  (->> lines
       (map clean-location)))

(clean *in*)

and I use the following to execute it:

cat samples.dat | bb -i -o -f ../convert/src/convert/drop_bart_and_uppercase.clj

Here is samples.dat:

Time=Thu Oct 1 15:27:15 PDT 2020, Value=75.7, Location=L16-tempmon.xxx.yyy, Device_Type=tempSensor, Value_Type=Temperature
Time=Thu Oct 1 15:28:12 PDT 2020, Value=91.4, Location=a40-tc-ups, Device_Type=UPS, Value_Type=Temperature

To my understanding, it should output the same two lines unchanged, but I see nothing. Please help me figure out my problem. Thanks!

------------

I found that the following command works as expected:

< samples.dat bb -io '(load-file "/home/yshen/data/temperature-data-archive/convert/src/convert/drop_bart_and_uppercase.clj") (convert.drop-bart-and-uppercase/clean *input*)'

but I find it too clumsy to load the file and then call the function.

borkdude06:10:52

@yubrshen In the first piece of code you use *in* (not *input*) which is not a seq of lines, but just the stdin stream from Clojure.

yubrshen13:10:57

@borkdude thanks! I wish that I could use *input*, how?

yubrshen14:10:34

I would like to learn the idiomatic way to process each line of input with Babashka.

borkdude14:10:29

@yubrshen You can use *input* but this is honestly more for one-liners on the command line. For scripts you might want to use:

$ ls | bb -e "(first (line-seq (io/reader *in*)))"
"CHANGELOG.md"

borkdude14:10:30

io/reader is coming from clojure.java.io

yubrshen14:10:57

Thanks. user/*input* is perfect for my need.

yubrshen17:10:38

Finally, this is what works for my need.

yubrshen17:10:31

< samples.dat bb -i -o '(->> *input* (map (fn [line] (clojure.string/replace-first line #"Location=([^.,]+)[^,]+" #(str "Location=" (clojure.string/upper-case (last %1)))))))'

I can use user/*input* inside my script file to access stdin as a list of lines, but I have not figured out how to output lines to stdout from inside my script. The one-liner above works, but it's getting hard to maintain. Is there an equivalent mechanism that lets babashka output lines to stdout from a script?

yubrshen17:10:19

I can improve the readability, but only by stepping outside the Clojure ecosystem:

yubrshen17:10:42

#!/usr/bin/env bash
< $1 bb -i -o '(->> *input* (map (fn [line] (clojure.string/replace-first line #"Location=([^.,]+)[^,]+" #(str "Location=" (clojure.string/upper-case (last %1)))))))'

yubrshen17:10:41

Is there a better approach that keeps my code mostly within the Clojure development environment?

borkdude18:10:45

Without babashka in/output flags:

(ns my-script
  (:require [clojure.java.io :as io]
            [clojure.string :as str]))

(defn lines []
  (line-seq (io/reader *in*)))

(->> (lines)
     (map
      (fn [line]
        (str/replace-first line #"Location=([^.,]+)[^,]+"
                           #(str "Location=" (str/upper-case (last %1))))))
     (run! println))

borkdude18:10:54

This also works with Clojure on the JVM
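Editor's sketch: one way to try the script's per-line transformation at a REPL without piping a file. The function name upcase-location and the shortened sample lines are assumptions made for illustration, and with-in-str stands in for real stdin. The sketch uses [^,]* where the chat's pattern has [^,]+, so that a value without a dot (such as a40-tc-ups) keeps its final character; with + the first group has to give one character back to the second part of the pattern.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn upcase-location
  "Hypothetical helper: keeps the Location value up to the first '.' or ','
  and upper-cases it."
  [line]
  (str/replace-first line
                     #"Location=([^.,]+)[^,]*"
                     ;; the function receives [whole-match group-1]
                     #(str "Location=" (str/upper-case (last %1)))))

;; Shortened sample lines, modelled on samples.dat from the chat.
(def sample
  (str "Value=75.7, Location=L16-tempmon.xxx.yyy, Device_Type=tempSensor\n"
       "Value=91.4, Location=a40-tc-ups, Device_Type=UPS\n"))

;; with-in-str rebinds *in*, so the same pipeline runs without a real pipe.
(with-in-str sample
  (->> (line-seq (io/reader *in*))
       (map upcase-location)
       (run! println)))
;; prints:
;; Value=75.7, Location=L16-TEMPMON, Device_Type=tempSensor
;; Value=91.4, Location=A40-TC-UPS, Device_Type=UPS
```

Keeping the transformation in a named function also makes it straightforward to cover with clojure.test, separately from the stdin/stdout plumbing.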

yubrshen19:10:59

Yes, exactly, this is what I'm looking for to learn to have the script to run both with Clojure and Babashka. Thanks a million!

yubrshen04:10:26

@borkdude Thank you again for your help! I have been looking for the idiom to use Babashka for sed-like stream editing with Clojure's sophistication and simplicity. Finally, I think I have learned the following:

(ns convert.swap-time
  (:require [clojure.java.io :as io]
            [clojure.string]))

(defn sweep
  "Sweep every line from stdin by the function,
  and output to stdout line by line."
  [f]
  (->> (line-seq (io/reader *in*))
       (map f)
       (run! println)))

(defn move_time_front [line]
  (clojure.string/replace-first
   line
   #"^(.+, )(Time=[^,]+, )(.+)$"
   "$2$1$3"))

(sweep move_time_front)
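Editor's sketch: a quick REPL check of the regex above, using with-in-str in place of a real pipe. The name move-time-front and the sample line (with Time= in the middle rather than at the front) are assumptions for illustration. A line that does not match the pattern, for example one where Time= is already the first field, passes through unchanged.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn move-time-front
  "Moves the first-found `Time=..., ` field that has something before it
  to the front of the line; non-matching lines are returned as they are."
  [line]
  ;; $1 = everything before Time=, $2 = the Time field, $3 = the rest
  (str/replace-first line #"^(.+, )(Time=[^,]+, )(.+)$" "$2$1$3"))

;; Hypothetical input line with the Time field in the middle.
(def sample "Value=75.7, Time=Thu Oct 1 15:27:15 PDT 2020, Location=L16")

(with-in-str (str sample "\n")
  (->> (line-seq (io/reader *in*))
       (map move-time-front)
       (run! println)))
;; prints: Time=Thu Oct 1 15:27:15 PDT 2020, Value=75.7, Location=L16
```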

borkdude09:10:15

How's this for passing options to the nifty $ macro?

user=> (def sw (java.io.StringWriter.))
#'user/sw
user=> (-> ($ ls -la Dockerfile) ^{:out sw} ($ cat) check :exit)
0
user=> (str sw)
"-rw-r--r--@ 1 borkdude  staff  729 Oct 15 17:25 Dockerfile\n"

jeroenvandijk09:10:46

I don’t use metadata that much so I’m not sure how to read it 😬

borkdude09:10:43

The metadata preceding the ($ ...) form are the options for that form

jeroenvandijk09:10:37

So in this case the metadata is attached to the return value of ($ ls -la Dockerfile) and used by ($ cat) ?

borkdude09:10:56

no, the metadata is only attached to ($ cat), this is how metadata works.

borkdude09:10:33

it is the same as writing (process '[cat] {:out sw})
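Editor's sketch: the point about where the metadata lands can be checked with plain Clojure, without loading babashka.process. The pipeline from the chat is only read here, not evaluated, so $ and sw are just unevaluated symbols; the var names form, ls-form and cat-form are made up for the illustration.

```clojure
;; Read (but do not evaluate) the pipeline from the chat.
(def form
  (read-string "(-> ($ ls -la Dockerfile) ^{:out sw} ($ cat) check :exit)"))

(def ls-form  (nth form 1)) ; ($ ls -la Dockerfile)
(def cat-form (nth form 2)) ; ($ cat)

;; ^{...} attaches to the form that follows it, not the one before it.
(println (:out (meta ls-form)))  ; prints: nil
(println (:out (meta cat-form))) ; prints: sw
```

So the reader has already attached {:out sw} to ($ cat) before any macro runs, which is why the options belong to cat and not to ls.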

jeroenvandijk09:10:36

I’m not sure if my Clojure knowledge is helping me here or actually making it more complex. macroexpand-1 is not helping here: (:exit (check ($ ($ ls -la Dockerfile) cat)))

jeroenvandijk09:10:25

ok so I think there are two kinds of people who will have no issue using this: beginners and more advanced Clojure users

jeroenvandijk09:10:40

But maybe it's just a valuable lesson about Clojure 🙂 Thank you

jeroenvandijk09:10:16

I have updated my mental model

Dig19:10:04

Love the $ macro, and I like the metadata use; it just took me a little time to figure out how to adapt it for my use. I've run into another weird issue: when I use (check) it gets stuck if there is no error, but goes through if there is an error. I will try to reproduce it with a smaller use case and report later.

borkdude20:10:07

hmm ok, thanks!

borkdude20:10:17

Yeah! Please let me know about the bug. There's still time to fix before it goes into 0.2.3

Dig20:10:46

Sorry, very busy this week, I will try to isolate it at some point. I just need to try it with something simpler than the aws command line. If I uncomment check above, it gets stuck on success, but not on error.

borkdude20:10:07

what does stuck mean?

Dig20:10:25

no output, like it is waiting for something

Dig20:10:31

and I have to ^C it

borkdude20:10:36

ah, this explains it. yes, check will wait for the process to exit, else it can't inspect the exit code.

borkdude20:10:09

so the process is maybe waiting for something?

borkdude20:10:31

check = deref + throw on non-zero
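Editor's sketch of the "deref + throw on non-zero" idea, in plain Clojure. This is not babashka.process's actual implementation: check-sketch is a hypothetical name, and a delay holding a map with an :exit key stands in for a real process.

```clojure
(defn check-sketch
  "Hypothetical stand-in for check: deref, then throw on a non-zero exit."
  [proc]
  (let [result @proc] ; blocks until the result is available
    (if (zero? (:exit result))
      result
      (throw (ex-info "Process exited with a non-zero code" result)))))

;; A zero exit code passes the result through.
(println (check-sketch (delay {:exit 0 :out "ok"})))
;; prints: {:exit 0, :out ok}

;; A non-zero exit code throws; the result map travels as ex-data.
(println (try
           (check-sketch (delay {:exit 1}))
           (catch clojure.lang.ExceptionInfo e
             (ex-data e))))
;; prints: {:exit 1}
```

The deref is the blocking step: if the underlying process never exits, anything shaped like this will wait indefinitely.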

Dig20:10:42

hmm, strange, it definitely exits w/o check

Dig20:10:57

is there a way to dump stack, like SIGQUIT or something?

borkdude20:10:11

How big is the JSON it's trying to write to stdout?

borkdude20:10:40

@U01BDT7622X Can you try with e.g.:

{:out (io/file "out.json")}
to see if the process is maybe waiting for stdout to be consumed?

Dig20:10:54

when i added it to $ it writes out 148k file and exits

Dig20:10:34

if i put #_ in front of it, it gets stuck again

Dig20:10:17

some kind of buffering thing, try maybe with big .json file?

borkdude20:10:21

ah so that may be it

borkdude20:10:42

yeah, so:

(-> (process ["cat"] {:out (io/file "/tmp/foo.csv") :in (io/file "/Users/borkdude/Downloads/1mb-test_csv.csv")}) check)
works, but if I remove :out it has nowhere to write, so cat is going to wait until it can

borkdude20:10:45

@U01BDT7622X A solution:

user=> (def csv (with-out-str (-> (process ["cat"] {:out *out* :in (io/file "/Users/borkdude/Downloads/1mb-test_csv.csv")}) check)))
#'user/csv
user=> (count csv)
1000448

borkdude20:10:21

whereas

(def csv (with-out-str (-> (process ["cat" "foo"] {:out *out*}) check)))
would give an error

borkdude20:10:00

I'll write a note about this in the docs

borkdude20:10:00

This is probably also a good option:

(def sw (java.io.StringWriter.))
(-> (process ["cat"] {:in (slurp "") :out sw}) check)
(count (str sw)) ;; 1043005

borkdude20:10:27

as long as it has a way to write the stream somewhere
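Editor's sketch of the same principle using java.lang.ProcessBuilder directly instead of babashka.process: read the child's stdout to the end first, and only then wait for the exit code. The helper name run-and-capture is made up, and the example assumes a Unix-like system with echo on the PATH. stderr is not drained here, so a child that writes a lot to stderr could still block.

```clojure
(defn run-and-capture
  "Hypothetical helper: starts cmd (a vector of strings), reads all of its
  stdout, and only then waits for the exit code."
  [cmd]
  (let [proc (.start (ProcessBuilder. ^java.util.List cmd))
        ;; Drain stdout first, so the child never blocks on a full pipe buffer.
        out  (slurp (.getInputStream proc))
        exit (.waitFor proc)]
    {:exit exit :out out}))

(println (pr-str (run-and-capture ["echo" "hello"])))
;; on a Unix-like system, prints: {:exit 0, :out "hello\n"}
```

Swapping the two steps (waiting first, reading afterwards) is the ordering that can hang once the output no longer fits in the pipe buffer.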

borkdude21:10:01

maybe it would be convenient to have an :out :string for this use case

Dig21:10:32

yep, that is the case, so it works w/ w-o-s

Dig21:10:09

yes, :string is a good idea, since it is a common case to check and then slurp :out

borkdude21:10:28

@U01BDT7622X Are you testing with bb or directly on the JVM?

Dig21:10:57

bb from builds

Dig21:10:29

that is why i could not dump stack to see where it is stuck

borkdude21:10:34

This should now work in the JVM lib:

(testing "output to string"
    (is (string? (-> (process ["ls"] {:out :string})
                     check
                     :out))))
I'll push it to bb master

Dig21:10:45

c00l, i will test it on my use case once it is built

borkdude21:10:58

ok, just pushed it. should be a few minutes

borkdude22:10:50

Should be there now. With this enhancement the following now also works:

user=> (count (-> (process ["cat"] {:in (slurp "") :out (io/file "/tmp/download.csv")}) check :out slurp))
1043005

borkdude22:10:07

i.e. :out contains the same value as was put in

Dig01:10:52

it works! thank you for all the hard work!

borkdude13:10:22

@yubrshen If you change *in* to *input* in your top program, that should maybe work. If you want to get lines from stdin yourself, you can use (clojure.string/split-lines (slurp *in*)) or (line-seq (clojure.java.io/reader *in*))
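Editor's sketch comparing the two ways of getting lines from stdin, with with-in-str standing in for a real pipe. The var names text, eager and lazy are made up for the illustration.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(def text "first\nsecond\n")

;; Reads all of stdin into one string, then splits it into a vector of lines.
(def eager
  (with-in-str text
    (str/split-lines (slurp *in*))))

;; Reads line by line; doall forces the lazy seq while *in* is still bound.
(def lazy
  (with-in-str text
    (doall (line-seq (io/reader *in*)))))

(println eager) ; prints: [first second]
(println lazy)  ; prints: (first second)
```

Both yield the same lines. The slurp variant holds the whole input in memory at once, while line-seq can be consumed incrementally, which matters for large inputs.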

borkdude13:10:55

ah, I see. yes. *input* is only defined in the user namespace, so you have to use user/*input* in your top program

borkdude13:10:19

or get rid of the ns declaration

yubrshen14:10:51

@borkdude I see. Just user/*input*. Thanks! I may need to keep the ns declaration in order to use Clojure's test framework.

yubrshen04:10:26

@borkdude Thank you again for your help! I have been looking for the idiom to use Babashka to perform sed like streaming editing with Clojure sophistication and simpleness. Finally, I think learned the following:

(ns convert.swap-time
  (:require [ :as io]
            ))

(defn sweep
  "Sweep every line from stdio by the function,
  and output to stdout line by line."
  [f]
  (->> (line-seq (io/reader *in*))
       (map f)
       (run! println)))

(defn move_time_front [line]
  (clojure.string/replace-first
   line
   #"^(.+, )(Time=[^,]+, )(.+)$"
   "$2$1$3"
   ))

(sweep move_time_front)