#architecture
2023-02-01
Alejandro 17:02:08

I'm pondering onion architecture, aka layered design. I'd like everything in the imperative shell, aka all impure functions, to have an explicit list of the other impure functions they depend on. Integrant is for resources and initialization in the right order, so it's probably not suitable for this, or is it? Are there options that are specifically for what I'm trying to do? And is there a naming convention for marking impure functions?

seancorfield 17:02:34

I'm not sure why you think that wouldn't be suitable? Can you elaborate?

seancorfield 17:02:36

(I'm thinking that in such an architecture, initialization of Integrant or similar would happen in the imperative shell and it would be used almost entirely within that shell, passing just data -- or perhaps read-only resources -- into the pure core?)

Alejandro 18:02:03

I mean, should impure functions be passed to other impure functions through Integrant, as if they were resources?

Alejandro 18:02:39

There's a DSL layer, which contains pure and impure functions. I want the dependencies of impure functions to be listed explicitly.

seancorfield 21:02:18

I can't answer in detail for Integrant because I don't use it. I use Sierra's Component and I really like the simplicity of that (just two lifecycle extension points and system construction is explicit in code). I think the explicitness you want would be satisfied by Component? Or am I misunderstanding what you're asking?
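For reference, a minimal sketch of what those two lifecycle extension points and explicit system construction look like in Component -- the Storage record and config values here are invented for illustration:

(require '[com.stuartsierra.component :as component])

;; the two lifecycle extension points: start and stop
(defrecord Storage [config conn]
  component/Lifecycle
  (start [this] (assoc this :conn {:connected-to (:url config)})) ; acquire the resource
  (stop  [this] (assoc this :conn nil)))                          ; release it

;; system construction is explicit, plain code and data
(defn new-system [config]
  (component/system-map
   :storage (map->Storage {:config config})))

(comment
  (def sys (component/start (new-system {:url "jdbc:mysql://..."})))
  (component/stop sys))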

Alejandro 05:02:07

I'd like to have something like this, with an explicit list of impure functions, but only where they are defined, not where they are used:

(defn get-doc [storage] ...)
(defn process-doc [storage get-doc] ...)
(defn send-doc [storage] ...)
(defn the-whole-workflow [storage get-doc process-doc send-doc]
  ...
  )

;; actual usage somewhere in code -- callers pass only the resource,
;; the impure dependencies are somehow supplied by default
(the-whole-workflow the-storage)

;; and those dependencies get replaced for tests when needed
This is a layer with domain business logic. Both Integrant and Component are great for dealing with the lifecycle of resources, but what is an idiomatic way of writing this DDD-style code? I found this article, where multimethods and protocols are used for this, which is fine, but I'd like to explore other options as well: https://yogthos.net/posts/2022-12-18-StructuringClojureApplications.html

Alejandro 06:02:43

I've just read what I wrote, and if I put storage at the end of the args, I could just use partial application to separate the definitions used in production code from the definitions used in tests. Would this scale to large codebases? Or should this be abstracted away? What are the benefits of the protocols from the article above compared to partials?
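A minimal sketch of that partial-application idea, under the assumption that the impure dependencies come first and storage last; all names and toy bodies here are invented for illustration:

(defn get-doc  [storage] (:doc @storage))                      ; stand-in for an impure read
(defn send-doc [storage doc] (swap! storage assoc :sent doc))  ; stand-in for an impure write

(defn the-whole-workflow*
  [get-doc-fn send-doc-fn storage]
  (let [doc (get-doc-fn storage)]
    (send-doc-fn storage (assoc doc :processed? true))))

;; production definition: deps fixed once, callers pass only storage
(def the-whole-workflow (partial the-whole-workflow* get-doc send-doc))

;; test definition: same code, stubbed deps
(def the-whole-workflow-stubbed
  (partial the-whole-workflow* (fn [_] {:id 1}) (fn [_ doc] doc)))

(comment
  (the-whole-workflow (atom {:doc {:id 1}})))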

seancorfield 06:02:36

I guess I still don't understand the problem statement based on the above code fragment, sorry.

seancorfield 06:02:02

My experience has been that most of these techniques don't scale well with application complexity -- they're great for relatively small systems (and so the examples are always compelling because they're simple) but when you have large, complex systems that do a lot of interaction with persistent data stores, you cannot sensibly define a protocol over the necessary interactions because you essentially end up trying to "mock" either SQL or Datalog or something similar.

👍 2
seancorfield 06:02:02

If you have a simple KV store and your access pattern is truly hash-map-like then, yeah, this stuff works -- we do this at work for Redis so we can easily write tests around mock Redis-like stores to test a lot of stuff (but of course you're really testing your mocks at that point, and if your mock has a bug, you can end up with production bugs because your tests pass but with incorrect assumptions!).

upvote 4
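A rough sketch of that kind of setup, assuming a hash-map-like access pattern; the protocol, record, and function names are invented, and the real Redis-backed implementation is elided:

(defprotocol KVStore
  (kv-get [this k])
  (kv-set [this k v]))

;; in-memory stand-in used in tests (the "mock Redis-like store")
(defrecord InMemoryKV [state]
  KVStore
  (kv-get [_ k] (get @state k))
  (kv-set [_ k v] (swap! state assoc k v)))

(defn remember-last-login [kv user-id ts]
  (kv-set kv (str "last-login:" user-id) ts))

(comment
  (let [kv (->InMemoryKV (atom {}))]
    (remember-last-login kv 42 "2023-02-01")
    (kv-get kv "last-login:42")))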
seancorfield 06:02:39

We have a huge amount of code at work that needs SQL queries and therefore you need a "real" database, even for testing, and we have many situations where a single API request can insert new records in some tables and the generated keys are needed to update other tables with foreign keys etc. We've tried to tease a lot of this apart and separate queries from logic from updates -- and the code ends up being very monadic in style and much harder to read, maintain, and even reason about 😞

Alejandro 07:02:06

Well, yeah, integration tests are important. So, in your experience, is unit testing not worth it?

seancorfield 07:02:05

Oh, we have a lot of unit tests as well. We have about 135k lines of code in our monorepo, using Polylith, with about 140 separate components.

Rupert (All Street) 09:02:29

Integrant (and the many other dependency injection libraries available) can be used for injecting impure functions into components. This then allows you to inject alternative/dummy functions during testing instead. The multimethods/protocols can just be one-liners calling out to your regular defn code. You don't have to implement your business logic in them.
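A minimal sketch of that approach with Integrant -- the keys, config values, and functions here are invented for illustration:

(require '[integrant.core :as ig])

;; regular defns hold the real logic
(defn get-doc-from-db [db id] {:id id :from db})

;; the init-key methods are one-liners that hand out the (partially applied) function
(defmethod ig/init-key ::get-doc [_ {:keys [db]}]
  (partial get-doc-from-db db))

(defmethod ig/init-key ::handler [_ {:keys [get-doc]}]
  (fn [request] (get-doc (:id request))))

(def config
  {::get-doc {:db "jdbc:mysql://..."}           ; illustrative config value
   ::handler {:get-doc (ig/ref ::get-doc)}})

;; in tests, swap the ::get-doc entry for a stub before calling ig/init
(comment
  (let [system (ig/init config)]
    ((::handler system) {:id 1})))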

Alejandro 10:02:01

Ok. Integrant and Component both help with, well, components. If I write DDD-style code, should my impure functions be components as well? This style implies listing the impure functions that an impure function depends on, usually in domain-specific layers.

Alejandro 12:02:28

Here's an example of a DDD-style list of dependencies:

(defn check-login-ok
  ;; production arity: the real impure deps are baked in as defaults
  ([form] (check-login-ok form get-managed-users get-current-manager))
  ;; explicit arity: tests (or other callers) pass their own implementations
  ([form get-managed-users-fn get-current-manager-fn] ...))
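A hypothetical test call against the explicit arity -- the form and the stub return shapes are invented here, since the real bodies are elided above:

(comment
  (check-login-ok {:username "alice"}
                  (fn [] [{:id 1 :username "alice"}])  ; stub for get-managed-users
                  (fn [] {:id 9 :username "boss"})))   ; stub for get-current-manager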

thom 21:02:53

I have created a lot of semi-functional, semi-imperative spaghetti in the past, but these days I use the protocols-and-Integrant approach a fair bit and find it fairly easy to reason about. Sets of functions often operate on the same sort of resource or lifecycle, so I'd rather group them than manage them individually. It depends on the articulation points in your app and the resulting cognitive overhead. It's very nice to be able to easily swap in mock or null versions, both in the REPL and in tests. Sean is right that at a certain point that doesn't scale, but you can also just boot up an embedded DB or an in-proc GraphQL server for the real components to talk to if you want (I have an embedded Postgres component that returns a map containing a DB spec, which you just plug into your DB component instead of its usual config and off you go). You can then run functional tests over a fairly representative system, create smaller, more targeted sets of components for other, more in-depth integration tests, and unit test any logic not covered by that. For very algorithmic stuff with many edge cases I try to do property tests, and value integration and functional tests more highly for everything else.
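A sketch of that protocols-plus-Integrant shape, with a real and a "null" implementation behind one protocol; all names are invented and the actual SQL access is elided:

(require '[integrant.core :as ig])

(defprotocol DocStore
  (fetch-doc [this id])
  (store-doc [this doc]))

(defrecord JdbcDocStore [db]        ; real version, holding a db-spec/datasource
  DocStore
  (fetch-doc [_ id] {:id id})       ; real SQL call elided
  (store-doc [_ doc] doc))

(defrecord NullDocStore []          ; "null" version for the REPL and tests
  DocStore
  (fetch-doc [_ id] nil)
  (store-doc [_ doc] doc))

(defmethod ig/init-key ::doc-store [_ {:keys [db]}]
  (if db (->JdbcDocStore db) (->NullDocStore)))

;; prod config supplies :db (e.g. a map from an embedded-Postgres component or a
;; real datasource); a test config can simply omit it to get the null store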

seancorfield 21:02:31

☝️ This sounds like the pragmatic combination of things we do. We have some things behind protocols, where they have a "reasonable" API; we have local test DBs (MySQL, Redis, Elasticsearch) with appropriate setup/teardown where we can't just mock something; we have property tests where they make sense (and also some generative tests that produce "command models" we can programmatically execute against the code to verify known outcomes at the integration level). I've wrestled with solutions for the more complex workflows we have, where a single "operation" needs to write multiple things to storage and the output of one step includes generated IDs that get used in subsequent steps, and I haven't yet come up with decent solutions that are both maintainable and clearly separate pure and impure code (by the time you've added in error handling as well). This seems to be one of those areas where the tradeoffs all have equally good and bad aspects 😐

Alejandro 06:02:29

@UTF99QP7V, thanks for describing the whole workflow with Integrant -- a high-level overview is helpful, since information on this is rather sparse.

Alejandro 06:02:31

@U04V70XH6, apropos the last part about complex interactions, I'm reading a follow-up article to the "dependency rejection" series: https://blog.ploeh.dk/2019/12/02/refactoring-registration-flow-to-functional-architecture/, but I'm not sure this can be an everyday thing. What would you say about test doubles, which essentially means instrumenting complex functions to log their interactions? I'm talking about this library: https://github.com/alexanderjamesking/spy

seancorfield 06:02:56

An article like that is definitely fascinating -- and I'm sure it's true, per the author, that you can always refactor any workflow to that structure... but it says nothing of the effort involved and whether you should do it, only that it is possible. Do you think that the code in that article is maintainable? Do you think the end result is more maintainable than the first version? Do you think the intellectual effort is worthwhile (given that this is just one small method)?

seancorfield 06:02:45

As for spy/test doubles, I find that I very rarely need to resort to that sort of thing. I use Expectations extensively and it has a side-effects macro that you can use to easily create test doubles, but I hardly ever use it; I consider the need for such a thing to be a code smell, to be honest, and would rather refactor the code in most cases.

Alejandro 06:02:26

@U04V70XH6, ok, got it, thanks

kraf 20:02:57

> Oh, we have a lot of unit tests as well. We have about 135k lines of code in our monorepo, using Polylith, with about 140 separate components.
@U04V70XH6 May I ask you to elaborate a bit more on the kinds of components you have? Do you have separations across domain lines at all? I've long been wondering about an approach that's kind of like doing microservices, but in a monolith. Shopify has a gigantic Rails monorepo and I think this is what they are doing now, but it's hard to find details. At the moment we have handlers that read from and write to all kinds of different tables that "belong" to other workflows/processes/domains. I know exactly what you mean by those ideas looking very nice for toy examples but falling behind in real products at scale. We've all seen it not work. But on the other hand there is just so much coupling simply due to the fact that all those handlers know this one global mutable thing, the SQL database. They all know the whole schema and all the implicit assumptions. Are you doing something to mitigate this or does it just keep working?

seancorfield 21:02:23

@U01DV4FGYJ0 We have components for all sorts of things and we've mostly learned to keep them as small and focused as possible. Yes, we have multiple components within a domain -- often based on what aspect of the domain they're dealing with and/or whether they are pure or impure components and/or how widely-used specific functionality is within the domain -- a lot of those decisions are driven by how much impact a change will have: if I have parts that I expect to change frequently, I want those used by as few components as possible to avoid "test-the-world" CI runs. I talked about some of the benefits of Polylith in terms of naming and code organization in https://corfield.org/blog/2021/06/06/deps-edn-monorepo-3/

🙏 2
kraf 09:02:27

And do you impose certain restrictions on your components? Is every component allowed to access every table, or do you try to only let stuff happen through public APIs based on ids? I vaguely remember an old thread where you mentioned you were calling HoneySQL directly from your handlers, if I understood this correctly. Did you change this approach? It would be tremendously helpful if you were willing to share a few examples of components -- I mean just more or less their names.

seancorfield 17:02:40

We don't impose restrictions beyond what Polylith itself requires (access only via .interface). We don't encapsulate our database access because it really wouldn't be practical -- see comments earlier in this thread about pragmatic testing: we have four MySQL databases with over 300 tables so trying to encapsulate all that in code would cause a massive bloat in our system that just isn't worth the "purity" tradeoff. We do have a few places where we've created a ws.*-data component and use that for "all" related data access but even there it is sometimes better to do direct next.jdbc access than add another component dependency (see "pragmatic" considerations in my previous comment). Would I love to have a pure core and imperative shell with all the side-effect-y database stuff off at the edges? Yeah, that might make certain things easier. Do I think it's practical in a system our size with such a massive dependence on a SQL database? Not really, at least not in terms of managing the extra code it would require and the extra dependencies. Right now, we can manage all of this with just two Clojure devs. If we had a bigger team, I'd probably want more separation and I'd need more devs just to manage the increase in code size that would create.
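Roughly the two options described there -- a *-data component exposing named access functions versus direct next.jdbc calls; the table, columns, and function names are invented for illustration:

(require '[next.jdbc :as jdbc])

;; option 1: a ws.user-data-style component exposes named data-access functions...
(defn find-user [ds id]
  (jdbc/execute-one! ds ["select * from users where id = ?" id]))

;; option 2: ...or a caller skips the extra component dependency and goes
;; straight to next.jdbc where that is simpler
(defn handler [ds request]
  (jdbc/execute-one! ds ["select * from users where id = ?" (:id request)]))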

kraf 19:02:32

Thank you for your insights! I also just discovered https://polylith.gitbook.io/polylith/conclusion/production-systems on the Polylith page itself; that is super interesting. I'm surprised that you are only two devs 😄 You make a good point, and honestly this is the tradeoff we're making as well. I'm just trying to look ahead and wondering how well it will scale. Is your database split roughly along domain lines? Do you have components that can access multiple databases directly, or do you aim to do this only through other components?

seancorfield 19:02:14

We have a datasource for each database that has r/w privilege on just that database. Then we have a "reporting" datasource that has readonly privilege on all databases. We use Component to manage datasources and certain apps have more than one r/w datasource if they need to modify more than one database -- but we try to keep the components to just one datasource where possible.

👍 2
🙏 2
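A sketch of what per-database datasource components might look like with Component, assuming HikariCP pooling via next.jdbc.connection/->pool; the database names and specs are invented, and the read-only restriction would come from the credentials used, not from the code:

(require '[com.stuartsierra.component :as component]
         '[next.jdbc.connection :as connection])
(import '(com.zaxxer.hikari HikariDataSource))

(defrecord Datasource [db-spec datasource]
  component/Lifecycle
  (start [this]
    (if datasource
      this
      (assoc this :datasource (connection/->pool HikariDataSource db-spec))))
  (stop [this]
    (when datasource (.close ^HikariDataSource datasource))
    (assoc this :datasource nil)))

(def system
  (component/system-map
   :orders-db    (map->Datasource {:db-spec {:dbtype "mysql" :dbname "orders"}})   ; r/w on one database
   :billing-db   (map->Datasource {:db-spec {:dbtype "mysql" :dbname "billing"}})  ; r/w on one database
   :reporting-db (map->Datasource {:db-spec {:dbtype "mysql" :dbname "reporting"}}))) ; read-only user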