#babashka
2022-01-16
Nom Nom Mousse14:01:05

Hi @borkdude You helped me create processes with Babashka earlier to start bash jobs. You even showed me how I could add a callback function to these processes which tells me when a bash job is complete and processes its results. My problem now is that these callback functions seem to keep running in the same thread that was started for the bash process. So in the end lots of futures are created and my regular execution happens in the futures too. Do you know a workaround for this? I.e. having the callback function pass execution over to the main thread instead of continuing in the new thread?

borkdude14:01:06

Can you remind me of what we did the previous time?

borkdude14:01:14

The code I mean, an example

Nom Nom Mousse14:01:50

Sorry, yes of course. I thought this was a standard pattern

Nom Nom Mousse14:01:04

Most of these additions are my own, but this should give you the gist

Nom Nom Mousse14:01:42

(I added the snippet in a new post since it was hard to read in the replies sidebar)

borkdude14:01:41

I can read it in the thread just fine, you can click on threads on the left side

👍 1
borkdude14:01:47

and then you see the conversation in full screen

borkdude14:01:51

let's continue the conversation here

Nom Nom Mousse14:01:41

I'll move it again

Nom Nom Mousse14:01:06

I think the problem is that everything that happens as a result of

(reset! app-state/jobresult [jobid (.exitValue p) job])))))
happens in the new thread. This is where I'd really like to pass control over to my normal program. I understand that this might not be something that babashka can help with though :)

borkdude14:01:34

To have the exitValue you need to wait for the process to finish, so then you could just not use onExit at all, and just use waitFor

borkdude14:01:56

or in babashka.process terms (-> (process ...) deref :exit)
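
A minimal sketch of that one-liner in context (the bash command is just a placeholder):

(require '[babashka.process :refer [process]])

;; starting the process does not block
(def p (process ["bash" "-c" "sleep 1; exit 3"]))

;; deref blocks until the process exits; :exit holds the exit code
(:exit @p) ;; => 3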

Nom Nom Mousse15:01:06

Thanks. But will the bash job be blocking then? Or can I run multiple bash jobs at the same time?

borkdude15:01:01

process isn't blocking by default, unless you block (deref) yourself. can you explain in words what you are trying to accomplish again?

Nom Nom Mousse15:01:30

Sorry. I want to run bash jobs, possibly multiple, often in parallel. When they finish I want to process the result and update my DAG and start the next jobs ready for execution.

borkdude15:01:50

yeah ok, so you could have three processes, start them in parallel with (def p1 (process ["bash"])) .. (def p3 ..), then at the end you can collect the results by just walking over the processes with deref, to wait for all of them to finish: `(mapv deref [p1 p2 p3])`
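
A sketch of that pattern (the commands are placeholders):

(require '[babashka.process :refer [process]])

;; all three start immediately and run concurrently
(def p1 (process ["bash" "-c" "sleep 1; exit 0"]))
(def p2 (process ["bash" "-c" "sleep 2; exit 0"]))
(def p3 (process ["bash" "-c" "sleep 3; exit 1"]))

;; deref blocks per process, so this returns once the slowest one is done
(mapv (comp :exit deref) [p1 p2 p3]) ;; => [0 0 1]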

borkdude15:01:52

if you want to see the exit value of whichever one finishes first, you could use promises or core.async in combination with .onExit
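
A sketch of the promise variant, assuming the underlying java.lang.Process is reachable via :proc on the babashka.process record:

(require '[babashka.process :refer [process]])

(def first-done (promise))

(doseq [p [(process ["bash" "-c" "sleep 2"])
           (process ["bash" "-c" "sleep 1"])]]
  ;; .onExit returns a CompletableFuture<Process>; the first deliver wins,
  ;; later deliveries to the same promise are no-ops
  (-> (.onExit (:proc p))
      (.thenApply (reify java.util.function.Function
                    (apply [_ proc]
                      (deliver first-done (.exitValue proc)))))))

@first-done ;; exit value of whichever process finished first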

borkdude15:01:13

but if you want the combined result of all of them, then it doesn't matter, you have to wait for all of them anyway

Nom Nom Mousse15:01:39

I see that I have lots to learn and that this might not be babashka related, but I appreciate your help :)

borkdude15:01:06

or you could update an atom in .onExit:

(def results (atom []))
(-> (.onExit (:proc p))
    (.thenApply (reify java.util.function.Function
                  (apply [_ proc] (swap! results conj (.exitValue proc))))))

borkdude15:01:12

(pseudocode)

Nom Nom Mousse15:01:33

The problem is that I am trying to create something like Make but suited to my needs. The program does not know ahead of time which jobs will be finished when, so I need to have a program running that dispatches jobs one by one, collects their results, and sees what new jobs can be run.

borkdude15:01:59

This is what babashka tasks already does :)

borkdude15:01:42

See --parallel or :parallel
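
For reference, a minimal bb.edn sketch (the task names and commands are made up); bb run --parallel all then starts job-a and job-b concurrently:

{:tasks
 {job-a (shell "bash -c 'sleep 1; echo a'")
  job-b (shell "bash -c 'sleep 1; echo b'")
  all   {:depends [job-a job-b]}}}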

Nom Nom Mousse15:01:47

Your solution is what I have tried: (reset! app-state/jobresult [jobid (.exitValue p) job]) But the code started by the watcher of app-state/jobresult seems to run inside the future
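
That is expected: a watch function runs on whichever thread performs the reset! or swap!, which here is the future's thread. A small sketch that shows it (the names are made up):

(def jobresult (atom nil))

(add-watch jobresult :on-result
           (fn [_key _ref _old new-val]
             ;; runs on the thread that called reset!, i.e. the future's thread
             (println "watch ran on" (.getName (Thread/currentThread)) new-val)))

(future (reset! jobresult [:job-1 0]))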

borkdude15:01:00

This is already a Makefile replacement, so why make your own if you can use that :)

borkdude15:01:43

yes. if you don't want to run it in the future, you need to block, obviously

Nom Nom Mousse15:01:20

Okay, this is my first foray into concurrency. But now I know the limitations. Thanks!

Nom Nom Mousse15:01:47

I wasn't certain, but now I know

borkdude15:01:47

There are only two possibilities: either you run something async and handle the result async, or you block and do something with the result on the main thread.
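
A sketch of those two options, with handle-result standing in for whatever processing is needed (it is not part of any library):

(require '[babashka.process :refer [process]])

(defn handle-result [exit] (println "exit code:" exit))

;; option 1: start async and handle the result async, on the new thread
(future (handle-result (:exit @(process ["bash" "-c" "exit 0"]))))

;; option 2: block on the main thread, then handle the result there
(handle-result (:exit @(process ["bash" "-c" "exit 0"])))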

Nom Nom Mousse15:01:51

I should really look at using babashka tasks as a submodule then. I'm writing something more involved (look up nextflow or snakemake if interested) but I'm all for code-reuse. It seems like babashka/tasks has the ability to call functions upon starting a job and finishing it. This would allow me to run arbitrarily complex code around the workflow...

borkdude15:01:47

you can look inside the code for babashka to see how it's done. it uses core.async
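
A rough sketch of that dispatcher idea with core.async; launch! and the job commands are placeholders, and the DAG update is left as a comment:

(require '[babashka.process :refer [process]]
         '[clojure.core.async :as async])

(def results (async/chan))

(defn launch! [job-id cmd]
  ;; run the process on its own thread, put the result on the channel
  (async/thread
    (async/>!! results {:job-id job-id :exit (:exit @(process cmd))})))

(launch! :a ["bash" "-c" "sleep 1"])
(launch! :b ["bash" "-c" "sleep 2"])

;; the main loop blocks on the channel, so result handling and the decision
;; about which jobs to start next happen on this thread
(dotimes [_ 2]
  (let [{:keys [job-id exit]} (async/<!! results)]
    (println job-id "finished with exit" exit)
    ;; here you would update the DAG and launch! whatever became ready
    ))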

Nom Nom Mousse15:01:54

Thanks for the info. I'll start to learn about babashka tasks and think about how I can use it as a task runner within a larger program 😄

borkdude16:01:03

I should not have used the word obviously, sorry, concurrency is hard :)

Nom Nom Mousse15:01:27

I want to run babashka tasks to run programs in parallel. These often write output to the screen that I want to show. However, I want to tell which program wrote which lines to the screen (and perhaps whether these were directed to stdout or stderr). Is this possible in babashka tasks? So that instead of seeing

<output from program 1>
<output from program 3>
...
I could have babashka explain who wrote what? Like:
(program 1, time, stderr): output
(program 3, time, stdout): output
Where (program 1, time, stderr) is user-configurable. If this is not possible, is this something you would consider adding?

borkdude15:01:03

Perhaps worth a try. You can open a Github Discussion about this idea so we can maybe discuss it further and others can respond there/upvote the idea. https://github.com/babashka/babashka/discussions

Nom Nom Mousse17:01:10

Come to think of it, all that is needed is to have an option to send the stdout/stderr to tap, and then it could be handled there.

borkdude17:01:55

If you're shelling out, perhaps there is also a unix tool that you can filter output through and which prepends something to it, for now
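
Until something built in exists, one way is to read each stream yourself and tag the lines; relay! and launch-labelled! below are made-up helper names:

(require '[babashka.process :refer [process]]
         '[clojure.java.io :as io])

(defn relay! [label stream-name stream]
  ;; copy one output stream line by line, prefixing each line
  (future
    (doseq [line (line-seq (io/reader stream))]
      (println (str "(" label ", " (java.time.LocalTime/now) ", " stream-name "): " line)))))

(defn launch-labelled! [label cmd]
  (let [p (process cmd)]
    (relay! label "stdout" (:out p))
    (relay! label "stderr" (:err p))
    p))

(launch-labelled! "program 1" ["bash" "-c" "echo hello; echo oops >&2"])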

Nom Nom Mousse15:01:24

Also a short question about the docs: > The `current-task` function returns a map representing the currently running task. This function is typically used in the `:enter` and `:leave` hooks. What does it return when multiple jobs are running?

borkdude15:01:05

in a task, it always returns the current task. there is always one current task, just like there is always one "current thread"
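
For completeness, a bb.edn sketch of those hooks (task names and commands are made up); each hook sees its own task via current-task, even with bb run --parallel all:

{:tasks
 {:enter (println "starting" (:name (current-task)))
  :leave (println "finished" (:name (current-task)))
  job-a  (shell "bash -c 'sleep 1; echo a'")
  job-b  (shell "bash -c 'sleep 1; echo b'")
  all    {:depends [job-a job-b]}}}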

Nom Nom Mousse16:01:38

I thought multiple tasks could be run in parallel with the parallel flag

borkdude18:01:19

this is true, but current-task is intended to be called in :enter or :leave or a function that runs in a certain task, so in that sense, there is always one

👍 1