Drew Verlee 03:05:56

Open invitation to anyone who wants to do a read-along of Martin Kleppmann's Designing Data-Intensive Applications. If I get some interest, we can make a Slack or Discord where we can discuss ideas as we run into them. Read at your own pace. Or if you have a similar book you would like to cover, I would be happy to join.


I would like to join you


Hmm, is it just reading along or would you want to do more than that?

Drew Verlee 02:05:06

Just reading. Let me know if you want an invite.

erre lin 09:05:27

I would like to join.


@U0DJ4T5U1 drop me one and I'll give it a go :)...

Nom Nom Mousse 05:05:11

I'm writing a workflow management system like Make/Snakemake in Clojure, and I'm ready to release an alpha soon. To make a guide for non-scientists, I'd love some ideas for a complex workflow outside of bioinformatics, and perhaps even outside data science. The point of my system is to make medium-sized and larger workflows easy to write and understand; the tradeoff is that it is overkill for smaller workflows. If you have ideas that only use standard command-line tools and perhaps some Python code, and that would be understandable and useful to many, I'd appreciate hearing them.

Martynas Maciulevičius 06:05:37

What is a large workflow? Can't you do it in babashka? Is it some kind of Jenkins multi-stage-build that is displayed somewhere?

Nom Nom Mousse 06:05:49

As a scientist, I had large workflows with 100+ rules with complex interactions. Furthermore, babashka does not support arbitrary wildcards, where each rule is a template for multiple jobs. This explanation of wildcards gives the basic idea: Beyond that, my system has live code reloading, hashing of jobs to keep multiple results of variations of the same job, and a million other things that I can't go into here.
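For anyone unfamiliar with the wildcard idea mentioned above, here is a minimal sketch in the spirit of Snakemake's wildcards: one rule acts as a template, and requesting a concrete output file binds the wildcards and instantiates a job. The names (`RULE`, `match_wildcards`, `instantiate`) are made up for illustration and are not the actual API of the system being discussed:

```python
import re

# Hypothetical rule template: {sample} is a wildcard, so this single
# rule describes one job per sample file.
RULE = {
    "output": "plots/{sample}.pdf",
    "input": "data/{sample}.csv",
    "shell": "plot.py {input} > {output}",
}

def match_wildcards(pattern, target):
    """Bind wildcard names if `target` matches `pattern`, else None."""
    # re.escape turns "{sample}" into "\{sample\}"; swap each escaped
    # wildcard for a named capture group that stops at "/".
    regex = re.sub(r"\\\{(\w+)\\\}", r"(?P<\1>[^/]+)", re.escape(pattern))
    m = re.fullmatch(regex, target)
    return m.groupdict() if m else None

def instantiate(rule, target):
    """Turn the rule template into one concrete job for a target file."""
    wildcards = match_wildcards(rule["output"], target)
    if wildcards is None:
        return None  # this rule cannot produce the requested file
    job_input = rule["input"].format(**wildcards)
    return {
        "input": job_input,
        "output": target,
        "cmd": rule["shell"].format(input=job_input, output=target),
    }
```

Asking for `plots/a.pdf` binds `sample=a` and yields the command `plot.py data/a.csv > plots/a.pdf`; asking for a file the pattern cannot produce yields no job.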


How would it compare to ? I managed to use it in a few semi-large cases.


felt like for tools like this, having a nice UI and some sort of infra-as-code serves better than a CLI

Martynas Maciulevičius 07:05:50

Can I use titanoboa to create a microservice infrastructure and leave it running? Or is it meant for one-time jobs?


@U028ART884X it's pretty much generic; it depends on how you've written it. If the first step is a polling/reactive thing, it can be left running to react to events.

Martynas Maciulevičius 07:05:43

I couldn't find any tutorials on YouTube. It looks promising, and it's probably awesome that you can create a test instance on the web, but I don't know what to do with it. Are there any basic guides? OP also said that his framework would be overkill for low-complexity things and would work for high-complexity things, so it may become something similar to Titanoboa. I think there has to be some kind of example, or some kind of DSL to load the example from. Something similar to docker-compose.yml.


There's this sort of tutorial/demo video here: and the wiki for the steps:


The DSL is pretty much edn, or you can use the designer UI
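For readers who haven't seen a data-driven workflow DSL, an edn definition might look roughly like the following. This is a hypothetical sketch only, not Titanoboa's actual schema (its wiki documents the real format); the keys and step functions are made up for illustration:

```edn
;; Hypothetical edn workflow sketch; the keys and step fns are
;; illustrative, not Titanoboa's actual schema.
{:name  "etl-demo"
 :steps [{:id :download  :fn 'demo.steps/fetch-csv  :next [:transform]}
         {:id :transform :fn 'demo.steps/clean-rows :next [:upload]}
         {:id :upload    :fn 'demo.steps/push-to-s3 :next []}]}
```

A designer UI can round-trip to a plain data structure like this, which is part of what makes an edn DSL convenient: it diffs cleanly and can live in version control.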

Nom Nom Mousse 08:05:08

Thanks for pointing me to Titanoboa. I should write a short comparison in our paper/docs. I do not understand it well yet, though. It would be nice if it had a better guide that only used code, not the web API.


Yeah, the code-based approach docs could be improved. @U5L1P2D9U is available here too, I think? 😄

Martynas Maciulevičius 08:05:32

Also, what MQ does it use? Is it built for speed or for reliability?


It uses RabbitMQ for clustering, probably more for reliability and distributed workloads

Nom Nom Mousse 08:05:53

I am not primarily interested in how things are run yet; I'm mostly focused on creating a terse language for writing complex workflows. ATM it only runs jobs on a server, without cluster support. I'll actually use #oz as a backend later to run my workflows.


👋 hi folks 🙂 @U0232JK38BZ btw. I played with some larger-scale data science stuff in this; I would love to see the alpha once it's available, to see how you tackle these sorts of complex workflows. As for complex non-data-science workflows out there, in my experience these are usually integration workflows, where you integrate multiple systems together (kind of what you do with Zapier); if you look at what workflows are needed in the realm of enterprise integration, these can get pretty complex. Another group is business processes, which are a bit specific since they usually also involve human interaction. Another group is IT automation workflows (e.g. CI/CD pipelines). At some stage I started to form a "universal workflow theory" 😎 that covered these groups: there is like 80% overlap in requirements and 20% is group-specific...

🙏 1
Nom Nom Mousse 08:05:43

I'll watch it when I get the time, thanks!


Thanks for the tag @U7ERLH6JX :gratitude-thank-you: Would defo love to hear what use cases you used t-boa for and what the good and bad things about it were 🙂


Also quite an interesting topic (and unsolved, imo) is how to visualize these complex workflows - , but there remain certain challenges to that approach too...


@U5L1P2D9U Mostly the usage was orchestrating large volumes of text and generated images around things like GitHub Actions, S3, and some home-run fat GPU machines, to help AI/ML artists make a project of theirs. Nothing was really flexible/general enough for "tying things together". The UI was quite useful for people who haven't done much infra work. Things that were a bit missed were having a provisioned env (not having to install things) and some sort of history of changes for when things went bad from UI edits. Pretty much the things I have in a project of mine, though it has its own "caveats" too 😅 but loved titanoboa! kudos!


Thanks @U7ERLH6JX! Appreciate the feedback! Yeah, agree that the history/undo is missing (but it shouldn't be that hard to add in the future 😊).