#babashka
2023-03-20
borkdude10:03:28

I'm contemplating adding something to bb tasks around "target" and "modified-since" since it's getting a bit boiler-platy sometimes. E.g.

:task (clojure "-T:build uber")
:target "target/uberjar.jar"
:modified-since ["src" "deps.edn"]
:depends [whatever]
instead of:
:task (when (seq (fs/modified-since "target/uberjar.jar" ["src" "deps.edn"])) (clojure "-T:build uber"))
:depends [whatever]
If the target is newer than everything in :modified-since, the whatever task would not be executed either (unlike with fs/modified-since). The return value of the task will be the :target, regardless of the :task expression. 🧵
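For reference, the check that `:target` plus `:modified-since` would replace can be written in plain Clojure. A minimal sketch, assuming a hypothetical `stale?` helper (not an existing bb or babashka.fs function) built on `clojure.java.io` so it runs in bb and on the JVM; like `fs/modified-since`, it compares against the newest file when the target is a directory.

```clojure
(require '[clojure.java.io :as io])

(defn- files-under
  "Regular files at or below each existing path in `paths`."
  [paths]
  (->> paths
       (map io/file)
       (mapcat file-seq)
       (filter #(.isFile ^java.io.File %))))

(defn- newest-mtime
  "Most recent modification time (ms) of any file at or below `path`, or nil."
  [path]
  (let [mtimes (map #(.lastModified ^java.io.File %) (files-under [path]))]
    (when (seq mtimes) (apply max mtimes))))

(defn stale?
  "Hypothetical helper: true when `target` is missing (or an empty directory)
  or any file under `inputs` was modified after the target's newest file."
  [target inputs]
  (if-let [target-mtime (newest-mtime target)]
    (boolean (some #(> (.lastModified ^java.io.File %) target-mtime)
                   (files-under inputs)))
    true))
```

Inside a task this would read `(when (stale? "target/uberjar.jar" ["src" "deps.edn"]) (clojure "-T:build uber"))`. Unlike the proposed keys, it cannot skip the `:depends` tasks, because those have already run by the time `:task` is evaluated.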

borkdude10:03:02

I was working on this task now:

{:tasks
 {:init (def lib-sci (str (first (fs/glob "clj" "LibScimacs.{dylib,dll,so}"))))
  :requires ([babashka.fs :as fs])
  libscimacs {:doc "Build libscimacs uberjar"
              :task (when (seq (fs/modified-since "target/classes" ["src" "build.clj" "deps.edn"]))
                      (clojure
                       {:dir "clj"} "-T:build libscimacs"))}
  native-image {:doc "Build native-image"
                :depends [libscimacs]
                :task (when (seq (fs/modified-since lib-sci ["clj"]))
                        (shell
                         {:dir "clj"}
                         "native-image"
                         "--shared"
                         "-cp" "target/classes"
                         "-H:Name=LibScimacs"
                         "--enable-preview"
                         "-H:+ReportExceptionStackTraces"
                         "-J-Dclojure.spec.skip-macros=true"
                         "-J-Dclojure.compiler.direct-linking=true"
                         "--initialize-at-build-time"
                         "--verbose"
                         "--no-fallback"
                         "--no-server"
                         "-J-Xmx3g"))}
  cargo:build {:doc "Build Rust binary"
               :depends [native-image]
               :task (shell "cargo build")}}}
as an example of where I used fs/modified-since, where declaring the sources / target could be used instead

mkvlr10:03:13

why not go with something less brittle than modification times?

borkdude10:03:06

for my own bb.edn this is usually sufficient though (and fast)

mkvlr10:03:27

hashing contents / directory trees should be fast as well

borkdude10:03:28

but for more complicated builds (like skipping cljs builds) it gets trickier

borkdude10:03:53

yes, but in the case of skipping cljs builds, hashing also wasn't enough

borkdude10:03:34

I guess you should be able to plug in your own function but with a good / easy default

mkvlr10:03:44

> yes, but in the case of skipping cljs builds, hashing also wasn’t enough

curious to learn more, didn’t we use it successfully in dejavu?

borkdude10:03:19

the use case I usually deal with is: should I rebuild the uberjar, should I rebuild the native-image, which I now do with fs/modified-since

borkdude10:03:37

> didn’t we use it successfully in dejavu

yes, but there was a lot more involved, e.g. scanning the namespaces for macro usages

borkdude10:03:21

and checking things on a cloud service

borkdude10:03:35

you can already script this with bb yourself, but with :modified-since (or so) I want to capture what I was doing with fs/modified-since in a more declarative approach. hashing also requires you to keep state of previous builds around, a bit more complicated than what I want to do here

mkvlr10:03:36

skipping a build also requires you to have state of previous builds around 🙃

borkdude10:03:52

but this state is already there

mkvlr10:03:35

agree it’s more complicated but can also deliver better results

mkvlr10:03:49

for things going through babashka there’s even the possibility of tracking things automatically: which files a task opens and what hashes they have

borkdude10:03:45

not really: unless you use some low-level hacks like instrumenting syscalls, you can't track that, since you can do anything with shell

borkdude10:03:17

I think something like bb tasks clean could also be useful: it would wipe every :target thing, so invoking the task again would redo everything from scratch

borkdude10:03:29

or bb run --clean foo (more in line with what's there now)

borkdude10:03:14

but surely doing something more flexible than timestamps is something to think about

borkdude10:03:59

lowest common denominator would be describing the inputs and target somewhere and then having a "strategy" function that decides whether the target is out of date ("if newer")

borkdude10:03:19

which could be timestamps or hashing and/or flushing some aggregate hash to disk

borkdude10:03:59

and the hash function could be coming from a library in bb.edn too. the default could be pretty unsophisticated, but people can make their own more sophisticated stuff in user space

borkdude10:03:24

:newer-fn nextjournal.dejavu/newer
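The pluggable "strategy" idea can be sketched as a higher-order runner. This is an assumption about how a `:newer-fn` might be called, not bb's API: the hypothetical `run-if-newer` passes `{:inputs .. :target ..}` to the strategy, and with no strategy it always runs, which is what a task does today.

```clojure
(defn run-if-newer
  "Hypothetical runner for the proposed keys: calls `task-fn` only when
  `newer-fn` returns truthy for the options map (:inputs, :target, ...),
  and returns the target either way. Without a :newer-fn it always runs."
  [{:keys [newer-fn target] :or {newer-fn (constantly true)} :as opts} task-fn]
  (when (newer-fn opts)
    (task-fn))
  target)
```

A call could look like `(run-if-newer {:inputs ["src" "deps.edn"] :target "target/uberjar.jar" :newer-fn my-newer} #(clojure "-T:build uber"))`, where `my-newer` stands for a timestamp or hash strategy of your choice.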

borkdude10:03:09

(defn newer [{:keys [inputs target]}]
   (slurp previous-hash)
   ...
   (spit current-hash)
   true)
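A runnable version of that sketch, assuming the strategy receives an extra, hypothetical `:hash-file` key saying where to keep the previous aggregate hash. The hash is SHA-256 over the paths and contents of all input files in sorted order, computed with `java.security.MessageDigest`; `newer` and `inputs-hash` are illustrative names, not the dejavu API.

```clojure
(require '[clojure.java.io :as io])
(import '(java.security MessageDigest))

(defn- inputs-hash
  "Hex SHA-256 over the paths and contents of all files under `inputs`, in sorted order."
  [inputs]
  (let [md (MessageDigest/getInstance "SHA-256")]
    (doseq [^java.io.File f (->> inputs
                                 (map io/file)
                                 (mapcat file-seq)
                                 (filter #(.isFile ^java.io.File %))
                                 (sort-by #(.getPath ^java.io.File %)))]
      (.update md (.getBytes (.getPath f) "UTF-8"))
      (.update md (java.nio.file.Files/readAllBytes (.toPath f))))
    (apply str (map #(format "%02x" %) (.digest md)))))

(defn newer
  "Hypothetical hash-based newer-fn: true when the target is missing or the
  input hash differs from the one stored in `hash-file` by the previous call.
  Stores the current hash as a side effect."
  [{:keys [inputs target hash-file]}]
  (let [hf (io/file hash-file)
        previous (when (.exists hf) (slurp hf))
        current (inputs-hash inputs)]
    (spit hf current)
    (or (not (.exists (io/file target)))
        (not= previous current))))
```

Like the chat sketch, this records the hash when asked, before the task runs, so a failed build would be skipped on the next run; a real implementation would write the hash only after the task succeeds.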

teodorlu13:03:22

I'd probably use this! --

:task (clojure "-T:build uber")
:target "target/uberjar.jar"
:modified-since ["src" "deps.edn"]
:depends [whatever]
I'd say both the files/folders (src/, deps.edn) and other tasks (whatever) could be considered dependencies. At least, the :modified-since key name doesn't make immediate sense to me. Perhaps:
:task (clojure "-T:build uber")
:target "target/uberjar.jar"
:depends-files ["src" "deps.edn"]
:depends-tasks [whatever]
or
:task (clojure "-T:build uber")
:target "target/uberjar.jar"
:depends {:paths ["src" "deps.edn"] :tasks [whatever]}

borkdude13:03:46

:depends is already a thing in bb tasks, we're not going to rename that to :depends-tasks

teodorlu13:03:17

Yeah, that would be a breaking change, right.

borkdude13:03:35

it already exists, so why change it

teodorlu13:03:05

I don't think this approach would require breakage, though:

:depends {:paths ["src" "deps.edn"] :tasks [whatever]}

borkdude13:03:23

maybe if you specify the :target in task a and task b depends on task a, its input will implicitly be the target of task a

borkdude13:03:26

{a {:produces "foo.jar"}
 b {:depends [a]
    :produces "native-binary"}}
When you run b and foo.jar is older than native-binary, you can skip a
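Deriving those implicit inputs from the task map is straightforward. A sketch with a hypothetical `implicit-inputs` helper, assuming `:produces` holds either a path or a vector of paths (both forms appear in the later bb.edn example) that have already been evaluated to strings.

```clojure
(defn implicit-inputs
  "Hypothetical: a task's implicit inputs are whatever its :depends tasks produce.
  :produces may be a single path or a collection of paths."
  [tasks task-name]
  (->> (get-in tasks [task-name :depends])
       (keep #(get-in tasks [% :produces]))
       (mapcat #(if (coll? %) % [%]))
       vec))
```

b's product could then be compared against those paths with a modification-time check such as `fs/modified-since` to decide whether b (and a) can be skipped.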

borkdude13:03:37

maybe :produces is a more descriptive name

teodorlu13:03:21

produces makes slightly more sense in my head than target, a bit more specific.

borkdude13:03:31

this is how I've currently done it (see the fs/modified-since calls): https://github.com/jackrusher/scimacs/blob/main/bb.edn

borkdude13:03:52

maybe this is good enough though. the fact that there can be all kinds of edge cases and expectations about this "skipping build" stuff is the reason I initially left it out

teodorlu13:03:15

if anything, https://github.com/jackrusher/scimacs/blob/main/bb.edn looks more like a testament to the flexibility of bb.edn than to its limitations, to me. The "getting people started with build caching" use case could be addressed in other ways, for example with docs.

borkdude13:03:54

it's not limited, but being able to do what I've been doing anyway more quickly would be nice (to me)

borkdude13:03:05

With :produces this would become:

{:tasks
 {:requires ([babashka.fs :as fs])
  :init (do
          (def libsci (str (first (fs/glob "clj" "LibScimacs.{dylib,dll,so}"))))
          (def lib-ext (fs/extension libsci))
          (def local-lib (str "scimacs." lib-ext))
          (def uberjar "clj/target/uber.jar")
          (def rust-lib (str "target/debug/libscimacs." lib-ext)))
  javac {:doc "Build libscimacs's JVM bytecode"
         :task (clojure
                {:dir "clj"} "-T:build libscimacs")
         :produces uberjar}
  native-image {:doc "Build native-image"
                :depends [javac]
                :produces libsci
                :task (shell
                       {:dir "clj"}
                       "native-image"
                       "-cp" "target/uber.jar"
                       "--shared"
                       "-H:Name=LibScimacs"
                       "--enable-preview"
                       "-H:+ReportExceptionStackTraces"
                       "-J-Dclojure.spec.skip-macros=true"
                       "-J-Dclojure.compiler.direct-linking=true"
                       "--verbose"
                       "--no-fallback"
                       "--no-server"
                       "-J-Xmx3g")}
  rustc {:doc "Build Rust binary"
         :depends [native-image]
         :produces [rust-lib]
         :task (shell "cargo build")}

  all {:doc "Build all"
       :depends [rustc]
       :task (when-not (fs/exists? local-lib)
               (fs/create-sym-link local-lib rust-lib))}}}

borkdude13:03:01

This could also help implementing auto-clean which throws away all the "products" first:

bb run --clean all
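A possible shape for `--clean`, sketched with a hypothetical `clean-products!` that takes the `:tasks` map (with `:produces` values already evaluated to path strings) and deletes each declared product, recursing into directories. Entries such as `:requires` and `:init` are skipped because they carry no `:produces`.

```clojure
(require '[clojure.java.io :as io])

(defn- delete-tree!
  "Deletes a file, or a directory and everything below it (children first)."
  [^java.io.File f]
  (doseq [^java.io.File x (reverse (file-seq f))]
    (.delete x)))

(defn clean-products!
  "Hypothetical --clean: deletes every :produces path declared in `tasks`
  and returns the paths that existed before deletion."
  [tasks]
  (let [paths (->> (vals tasks)
                   (keep :produces)
                   (mapcat #(if (coll? %) % [%])))]
    (vec (for [p paths
               :let [f (io/file p)]
               :when (.exists f)]
           (do (delete-tree! f) p)))))
```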

wilkerlucio17:03:15

I like the declarative approach for modification checks, and I think it has a broad enough scope of application to justify the new keys

Peter Tonner12:05:27

I know this is a month-old convo, but just wanted to chime in that I would find this functionality useful for orchestrating data science pipelines, which usually follow some flow of `raw data -> processed data -> train model -> generate output` with lots of potential sub-branches off this main trajectory. All of these steps would use the already existing :depends, but it would also be nice not to have to write all the (when-not (fs/exists? ..)) boilerplate for each step
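Until something like this lands, the repeated existence check can be factored into a helper. A minimal sketch using a hypothetical `produce!` built on `clojure.java.io` rather than babashka.fs; like the `(when-not (fs/exists? ..))` pattern, it only looks at whether the output exists, not at whether its inputs changed.

```clojure
(require '[clojure.java.io :as io])

(defn produce!
  "Hypothetical helper: runs (f out) only when `out` does not exist yet,
  creating parent directories first, and returns `out` either way."
  [out f]
  (when-not (.exists (io/file out))
    (io/make-parents out)
    (f out))
  out)
```

Each pipeline step could then be a single call such as `(produce! "data/model.bin" train-model!)`, where `train-model!` is your own function that writes to the path it is given.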

borkdude12:05:57

Thanks for the feedback. Would the above proposal fit your needs?

Peter Tonner14:05:37

yea I think the proposal you've outlined would work for me. Would the target and modified-since fields support "inline" calculation? E.g. could something like libsci in your example instead be defined directly in the :produces field? It would mainly be for convenience, but I could imagine having many tasks that each have a different produces value that needs to be determined programmatically, and it might be nicer to have that definition directly inside the task if it's not used anywhere else. Definitely not essential though

borkdude09:05:01

> Would the target and modified-since fields support "inline" calculation?

You mean, whether those fields would be evaluated like the :task field?

Max15:03:31

Is it a known issue that stack traces don't show the code that caused the error, both in httpkit server handlers and for exceptions thrown from the sqlite pod? I can put together a repro if not

borkdude15:03:33

Repro welcome since I have trouble dissecting that long sentence 😅

Max15:03:45

lol sure, 1 sec…

Max15:03:43

(require '[org.httpkit.server :as httpkit])

(defn app [req]
  (throw (ex-info "Foo" {:bar 1})))

(defonce server (atom nil))

(defn start-server! []
  (when (nil? @server)
    (reset! server (httpkit/run-server #'app {:port 8000}))))

(defn stop-server! []
  (when-not (nil? @server)
    (@server :timeout 100)
    (reset! server nil)))

(comment
  (start-server!)
  ;; make an HTTP request against the server, e.g. curl localhost:8000
  )
Here’s the stack trace:
clojure.lang.ExceptionInfo: foo {:bar 1}
        at sci.lang.Var.invoke(lang.cljc:202)
        at sci.impl.analyzer$return_call$reify__4504.eval(analyzer.cljc:1402)
        at sci.impl.analyzer$analyze_throw$reify__4264.eval(analyzer.cljc:968)
        at sci.impl.analyzer$return_do$reify__3927.eval(analyzer.cljc:124)
        at sci.impl.fns$fun$arity_1__1166.invoke(fns.cljc:107)
        at sci.lang.Var.invoke(lang.cljc:200)
        at org.httpkit.server.HttpHandler.run(RingHandler.java:121)
        at [email protected]/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:577)
        at [email protected]/java.util.concurrent.FutureTask.run(FutureTask.java:317)
        at [email protected]/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
        at [email protected]/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
        at [email protected]/java.lang.Thread.run(Thread.java:1589)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine(PlatformThreads.java:775)
        at org.graalvm.nativeimage.builder/com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine(PosixPlatformThreads.java:203)
The actual line that threw the error isn’t in the trace, which makes debugging a little tricky

Max15:03:22

Here’s the other one:

(require '[babashka.pods :as pods])
(pods/load-pod 'org.babashka/go-sqlite3 "0.1.0")
(require '[pod.babashka.go-sqlite3 :as sqlite])

(sqlite/query "asdf" "this ain't gonna work")
And the exception:
#error {
 :cause "near \"this\": syntax error"
 :data {}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "near \"this\": syntax error"
   :data {:type :sci/error, :line 1, :column 1, :message "near \"this\": syntax error", :sci.impl/callstack #object[clojure.lang.Volatile 0x44484bc9 {:status :ready, :val ({:line 1, :column 1, :ns #object[sci.lang.Namespace 0x3eb45f36 "rda-visualizer.db"], :file "/Users/maxrothman/repos/rda-visualizer/src/rda_visualizer/db.clj", :sci.impl/f-meta {:name query}} {:line 1, :column 1, :ns #object[sci.lang.Namespace 0x3eb45f36 "rda-visualizer.db"], :file "/Users/maxrothman/repos/rda-visualizer/src/rda_visualizer/db.clj", :sci.impl/f-meta {:name query}})}], :file "/Users/maxrothman/repos/rda-visualizer/src/rda_visualizer/db.clj"}
   :at [sci.impl.utils$rethrow_with_location_of_node invokeStatic "utils.cljc" 129]}
  {:type clojure.lang.ExceptionInfo
   :message "near \"this\": syntax error"
   :data {}
   :at [babashka.pods.impl$processor invokeStatic "impl.clj" 209]}]
 :trace
 [[babashka.pods.impl$processor invokeStatic "impl.clj" 209]
  [babashka.pods.sci$load_pod$fn__27335 invoke "sci.clj" 122]
  [sci.impl.vars$binding_conveyor_fn$fn__440 invoke "vars.cljc" 133]
  [clojure.core$binding_conveyor_fn$fn__5823 invoke "core.clj" 2047]
  [clojure.lang.AFn call "AFn.java" 18]
  [java.util.concurrent.FutureTask run "FutureTask.java" 317]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1144]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 642]
  [java.lang.Thread run "Thread.java" 1589]
  [com.oracle.svm.core.thread.PlatformThreads threadStartRoutine "PlatformThreads.java" 775]
  [com.oracle.svm.core.posix.thread.PosixPlatformThreads pthreadStartRoutine "PosixPlatformThreads.java" 203]]}
This one’s a little better: at least some of the user code (the file and the query function) shows up in the exception data, but the trace is similarly unhelpful
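Until the traces improve, whatever location bb attached can be dug out of the cause chain. A minimal sketch with a hypothetical `ex-locations` helper: it walks `ex-cause` and collects `:file`/`:line`/`:column` from `ex-data`, which in the pod dump above is where db.clj shows up. It comes back empty when no location was attached, which may be the case for the httpkit handler (its exception printed only `{:bar 1}`).

```clojure
(defn ex-locations
  "Hypothetical helper: walks `e` and its causes and collects any
  :file/:line/:column entries found in their ex-data, outermost first."
  [e]
  (->> (iterate ex-cause e)
       (take-while some?)
       (keep #(some-> (ex-data %) (select-keys [:file :line :column]) not-empty))
       vec))
```

For example, `(try (sqlite/query "asdf" "this ain't gonna work") (catch Exception e (prn (ex-locations e))))`.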

borkdude15:03:29

@Max Yes, this is a general problem with bb that I will try to improve in the next year
