
I always wonder, with Jan Stepien's approach (in that video) -- which I've seen advocated a few times elsewhere -- whether it scales to anything approaching a complex app? I mean, you need lots of protocols (one for each abstraction that the use cases need injected from the adapter layer) and those protocols need a method for every interaction with those abstractions. In a system that has several dozen entities, all needing some sort of CRUD operation, that's going to be a lot of boilerplate.


You are right, when the system grows you would have more protocols, etc. But that’s good, because that’s the architecture. It makes the architecture explicit and communicates the entities, relationships, etc. It also separates the domain from the infrastructure, because the boundaries are explicit. I have seen many Clojure projects where the presentation layer was tied together with the database, so the system was hard to maintain and understand (the classic scenario of what happens with Rails projects).


I dislike seeing protocols that have only a single implementation, which that sort of architecture leads to. Using protocols simply so you can mock components is a bad choice, IMO.


OK how would you separate domain logic and side effect implementation?


Generally by having the domain logic return a description of changes it needs made to the "system" and an orchestration layer that calls the domain logic and then calls the side-effecting code. Overall tho', I am not much of a "purist" when separating some of those things out, and having DB inserts/updates in the middle of a chain of business logic doesn't bother me as much as it bothers some people.
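
A minimal sketch of that split, assuming a made-up booking domain. The names `plan-booking`, `apply-change!` and `book!` are hypothetical, and an atom stands in for the real database and mail server:

```clojure
(ns example.booking)

;; Pure domain logic: no I/O, it only returns a description of the changes it wants.
(defn plan-booking
  [room booking]
  (if (>= (count (:bookings room)) (:capacity room))
    {:status :rejected
     :changes []}
    {:status :accepted
     :changes [{:op :insert :table :bookings :row booking}
               {:op :notify :to (:email booking)}]}))

;; Side-effecting code: a hypothetical stand-in that records each change in an
;; atom, grouped by :op, instead of talking to a real database or mail server.
(defn apply-change!
  [system change]
  (swap! system update (:op change) (fnil conj []) change))

;; Orchestration: call the domain logic, then carry out whatever it asked for.
(defn book!
  [system room booking]
  (let [{:keys [status changes]} (plan-booking room booking)]
    (run! #(apply-change! system %) changes)
    status))
```

Only `book!` and `apply-change!` touch state, so `plan-booking` can be tested with plain maps and no mocks.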


Most of the proposed separations work fine in-the-small but really don't scale in-the-large -- and I'm working with 105K lines of Clojure that spans a decade of evolution of both the functionality and the skill level of the team members that have worked on it across that time period.


We started that journey by embedding Clojure in legacy apps and, specifically, using Clojure for JDBC stuff -- so our Clojure code was mostly side-effecting library code at first 🙂


I agree that you can use the same approach as re-frame with interceptors. But in the end it adds another abstraction layer too. The separation of the domain and side effects leads to better architecture and maintenance. It’s also easier when onboarding new people to the project. Yeah, I'm pretty sure that making this change on a legacy project is a huge amount of work. The quality of my Clojure code over the last 5 years looks very different, because my experience of how to make things better has evolved during that time.


By the time you have a system with 100 entities, all persisting to 100 tables potentially, you're going to have 100's of protocol functions and all of their implementations, and then for any mocking you do for testing, you have to reimplement all of those against whatever mock system you use...

Drew Verlee02:11:49

I'm only a couple minutes in, but the issue I see is conflating how humans distinguish versus how a machine distinguishes. The word book is not a book. To a machine it's just a string. If you have N categories which are incidentally treated differently, functionally, within your system, then everything you do in that system is made incidentally more complex by a factor of N. I think over-modeling on the business domain is a real danger. It's very easy to latch on to an easy problem (how are "books" different than "students") and avoid talking about the mountain of inhuman issues that really slow down progress.

Drew Verlee02:11:03

But it's so contextual it defies offering generic advice. Which plays into the fear that often motivates a lot of discussion around protective coding. People want to know what won't get them into trouble, and they substitute solutions to that in the small for the larger question of how to progress.


I can't tell whether you're agreeing with me or disagreeing @drewverlee?

Drew Verlee03:11:29

Agreeing. I'm falling asleep over here so I'm probably not explaining myself very well.


OK, cool. Was a bit hard to follow.

Drew Verlee13:11:20

The talk is called "introducing structure". So let's talk in terms of structure and think about structure and its opposing force, unstructured/flexibility. To increase one, we must necessarily decrease the other. Even if that's not the definition traditionally, we need to define it precisely if we have any hope of communicating. Highlighting one of several places where the presenter makes such a trade-off:

(extend-protocol book-table/SaveBooking
  Postgres ; hypothetical type name; the type being extended is not shown here
  (save-booking [postgres booking]
    (execute-sql postgres
                 "insert into bookings....")))

Drew Verlee13:11:20

How might we talk about this in terms of structure or flexibility? Counter to the talk's title, we have now decreased structure. Before, saveBooking always threw a "no implementation" error; now it might do more. The system as a whole is now more flexible. Now can we judge this choice? By what measure?

Let's pretend for a moment that the presenter is both the developer and the business owner, that he is in the business of just saving bookings, and that business is booming. His family has done this for generations and they have a huge amount of control over the save-bookings industry. In that context, this is an excellent choice. The structure embodies the goal, it protects it, it sits on it like a dragon on a mountain of gold. So, when you see this, you naturally ask: what if we had to do more than save-bookings (e.g. unsave-bookings)? To which our hypothetical dragon would promptly exert his wrath upon you. The dragon knows his rule lies in focusing on his position of power and not letting others distract him from it.

In a different context however, one where our protagonist wasn't a mighty dragon but a young, ignorant traveler, it would be very unclear whether building a temple to save-bookings was worth the time. To be more precise, given that his knowledge of the goal is unclear, any time spent on imposing structure or flexibility is at risk of being a trade-off that isn't worth making. So what is he to do? To move forward he must do something! So he picks a path and boldly sets out. Wisdom precludes boldness. The correct path forward is one that stares not only into the distance (to what end, when you only vaguely know where you're going), but also takes small, correct steps in its direction.

Coming back down to the software domain, the idea here is reasonable only in a really, really unlikely context. Given any system I have ever worked on, it's a slightly too big step, as it assumes structure around a specific part of a business domain is important.

As you point out, this would lead to a protocol function for each table. Which would imply that the structure/flexibility is useless: you keep having to build more, so the original one obviously didn't help... The fact that it becomes unclear when to say something increased structure vs flexibility points to an underlying issue. One that I can't articulate (as evidenced by my rantings last night).

The misalignment is partially between human and machine. We are wired to get cues from humans, not machines, but as developers we talk to machines far more. If you approach the problem from what a human wants and forget what the machines want, it's easy to end up with extravagantly complex models that only you can understand. You become a dragon on a large mountain with no gold inside it.

So what machine must care about books? Well, postgres. Only when talking to postgres must you care about books. Who else might care about books? Well, your users, via the UI. So then... curious. Now our business domain seems to be on the outer part of our circle.

The yin/yang and oop/fp. So then we need a word which captures both. I like to think of it in terms of composability. Will our sql queries need to compose? What is the price if we assume they do? What if we write code that implies they don't? Does this composability obscure meaning? Or can we easily extract the meaning through evaluation? Thanks for reading, hopefully you get as much out of it as I did writing it. 🙂


I have been tasked with making a facsimile viewer for the web. The viewer itself is a reagent single-page application that takes a list of facsimiles (images from scanned source materials) and transcriptions of these facsimiles in an XML file. The frontend SPA then converts the XML into hiccup and puts it inside a reagent component together with the facsimile to navigate between different pages.

What would be the optimal API/database combination for an API that should basically just be a datastore for several hundred GBs worth of images (scanned letters) and XML files (corresponding transcriptions of the scanned letters)? They need to be searchable based on the contents of the XML files. I am currently unsure if it even makes sense to put the XML documents themselves in a database or if it’s a better idea to extract some metadata from them, put this in a database, and leave the documents on disk. What do you think?

In case it makes any difference: I also need to make it possible for users to associate comments with specific elements in the XML documents and put these in some sort of queue until approved by a curator. When retrieving the XML documents I will need to retrieve the list of approved comments too. I was thinking the database should support this.

As for the API, I am putting it inside a Pedestal web service which also serves my single-page application made with reagent. The SPA itself is contained in the index.html page and bundled JS/CSS files, but what about the API endpoint(s)? Should I go with something RESTful or does it make sense to go with EQL, GraphQL, or some other solution? In the past I’ve made a transit-based API endpoint with my own custom protocol, but I wouldn’t mind standardising on something.

So what kind of API/database combo makes sense for me? Sorry for the wall of text. I am just looking for suggestions.


TL;DR - I need to make an API exposing a datastore that
* contains thousands of images and XML files
* is searchable by relevant metadata found in the XML files
* supports associating comments with specific elements in the XML files (note: not XML comments, just comments like you would find on a blog)

What kind of API should I make and what database should I use?

Drew Verlee14:11:34

@simongray
> What would be optimal API/database combination
• a simple one-layered key-value store (s3, google bucket) for the images.
> They need to be searchable based on the contents of the XML files.
Define searchable and you have your answer for what you need for the second database

Drew Verlee14:11:47

Do you mean you want a word match? Or does searchable mean the system understands what you meant, similar to how a human would?

Drew Verlee14:11:11

If the XML has fields which are searchable then you should put those in a database with a rich query language (postgres / datomic)

Drew Verlee14:11:39

From there you can, depending on your read/write requirements, either put the transcriptions in their own database (speed for space) or just do the search in memory (space for speed).

Drew Verlee14:11:27

If you're very unsure where the whole thing is going then don't use any database to start, just write a really obvious program and save the files on a filesystem? I take no personal responsibility for how that turns out 😆


@drewverlee searchable just means that a list of documents needs to be retrieved based on some filters. The XML documents contain a header element containing some metadata which will definitely need to be part of the search interface, but the actual textual content itself will probably also need to be.

Drew Verlee14:11:33

search implies filter; you have to say what the filter criteria are. That defines the functionality. e.g. if you only need to support an exact string match on an XML field (the header?) then you can store that in postgres and easily query directly by it. Postgres also likely supports extensions that can do more than an exact string match. But the sky is the limit; Google Search takes into account my age and location when I search.


Well, various criteria. Some should be exact string matches (e.g. filter documents with a specific author) while some are dates or numeric (e.g. filter documents written between 1933 and 1948).

Drew Verlee14:11:20

Yea, those should be handled by a database with a schema. If you're not sure what performance characteristics any unstructured search will need, or if they exist at all, then your best bet is to do as little as possible and see what people need and ask for.


Ok. So postgres for storing metadata, but access the files through the filesystem?

Drew Verlee14:11:54

I'm using the term file system fairly loosely. Unless you plan on supporting search by image (e.g. give me all the cats based solely on the information in the picture and not a text label), what you need from a query perspective is just a key (the file name) and its value (the image). So anything that can do an acceptably fast lookup (I don't know what that means here) and is cheap enough (again, no idea) will work.


Ok, that makes sense. I know that postgres supports adding XML as a datatype, but I am unsure of the benefits. My first hunch was also just to keep the files on disk and simply associate their file paths with some metadata in the database.


The product I'm working on is in large part what you're describing (with differences of course: we store audio, video and import written content from various sources such as Zendesk, Google Drive, Intercom and more). All static assets are in S3, Postgres stores content in our own format (jsonb in PG) and all of the metadata and content is indexed in Elasticsearch.

Drew Verlee14:11:06

Yep, those are reasonable modern solutions. It's potentially safe to bang something out in clj and store some files in something like s3 with replication until you define how much you need postgres and Elasticsearch though.


Maybe I should mention that this is meant to be used by a relatively small userbase with very few concurrent users and I expect everything to be running on a single machine.


It’s a research project, so only highly specialised researchers will have access to it.


Ah, no need for ES then :-)


that’s what I thought 🙂

Drew Verlee14:11:46

Oh then yea. Just bang something out in an hour in pure clj and evolve it over time.


PG's full text search is going to be plenty


I believe PG has a special XML data type as well, so it might also support some interesting query patterns a'la jsonb


So potentially you could squeeze all of that there with very few moving parts

Drew Verlee14:11:47

You can likely get away without even using Postgres though. Just make sure you have some way to replicate the data so the chances of losing it are really low.


thanks for the input guys. 🙂

Drew Verlee14:11:35

use something like datascript and just recompute the index every time you search.

Drew Verlee14:11:51

if it's too slow then move to a database.


@drewverlee I think Datascript would be fine if I only had to serve the XML, but I also need persistence for the comments.

Drew Verlee15:11:53

persist the xml file and the image, but you can just read all of them at search time. If it's like 20 xml files then downloading them every time won't be that big of a deal. And it's something you can finish in an hour.


@jon920 It's been 2 days, but to add on to what you are thinking


if you want "encapsulation" in clojure you aren't going to get it, but you can easily make a "boundary" for a system where users are meant to use public functions to access and work with things


and not access map keys directly


the easiest way to signal something like this would be to use namespaced keys


in the same way this python


class Apple:
    def __init__(self, color):
        self.__color = color
    def color(self):
        return self.__color


interprets the __color field as _Apple__color


and thus it is clear that outside the definition of that class, it would be a faux pas to read or modify that field directly


you can use namespaced keywords in clojure to achieve the same effect


(ns my.ns)

(defn create-apple [color]
  {::color color})

(defn color [apple]
  (::color apple))


where ::color expands to :my.ns/color


so it is a signal to other namespaces only to mess with that key if it is documented how to do so


the value of doing that for everything is kinda questionable - especially when it is just data


and it all falls under "techniques that only work if everyone agrees to them"


but it is at least a way to make some things "private"


@simongray Is it only thousands of XML files?


kinda a dumb approach, but you can just buffer that junk in memory and do straight filters


then access corresponding images in s3
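
A rough sketch of that "buffer it in memory and filter" idea. The metadata maps, the `search` function and the `:image-key` field are all hypothetical; in practice the maps would come from parsing each XML header once at startup:

```clojure
(ns example.search)

;; Hypothetical metadata, one map per XML document.
(def documents
  [{:id "letter-001" :author "A. Author" :year 1931 :image-key "letter-001.jpg"}
   {:id "letter-002" :author "A. Author" :year 1940 :image-key "letter-002.jpg"}
   {:id "letter-003" :author "B. Writer" :year 1945 :image-key "letter-003.jpg"}])

(defn search
  "Returns the documents matching every supplied criterion; a nil criterion is skipped."
  [docs {:keys [author from-year to-year]}]
  (filter (fn [{doc-author :author year :year}]
            (and (or (nil? author) (= author doc-author))
                 (or (nil? from-year) (<= from-year year))
                 (or (nil? to-year) (<= year to-year))))
          docs))

;; The matching :image-key values are what you would then look up in the object store.
(map :image-key
     (search documents {:author "A. Author" :from-year 1933 :to-year 1948}))
;; => ("letter-002.jpg")
```

With a few thousand small maps a linear scan like this is likely fast enough, and the same criteria map could later be turned into a SQL query if it is not.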


@emccue Thanks, good advice! I was reading my DDD book more today and came across a section on functional programming. It says that

> “the anemic domain model pattern is actually a fundamentally useful concept when using functional programming as opposed to being an anti-pattern … the most important domain concepts are verbs -- not the nouns like a bank account, but the verbs like transferring funds. With functional programming and the anemic model, you still have the power to fully express domain verbs, and consequently to have meaningful conversations with domain experts… when building functional domain models, it is still possible to have structures that represent domain concepts, even when using the anemic domain model pattern. Significantly though they are just data structures with no behavior -- so a behavior-rich, object-oriented BankAccount entity (with Deposit() and IncreaseOverdraft()) would be modeled only as a pure immutable data structure (shows a struct without those methods). Having reduced objects into pure data structures, behavior then exists as pure functions… challenge is to cohesively group and combine them aligned with the conceptual domain model. One effective option is to group functions into aggregates”

So it sounds like an “anemic” domain model, where you have dumb structs and put all of the domain logic into domain layer service modules, could be the way to go with FP. Then, like you say, it’s a faux pas to modify the structs directly; it should be done through these domain service (verb) modules


My eyes glaze over whenever i hear the word domain, but yeah


And the aggregates… which I haven’t figured out yet but I’ll get there


Here is kinda sorta an example from a project i am working on


;; Assumes the namespace requires [next.jdbc :as jdbc] and a project-local utils namespace.
;; ----------------------------------------------------------------------------
(defn by-id [db id]
  (jdbc/execute-one! db ["SELECT * FROM post WHERE id=?" id]))

;; ----------------------------------------------------------------------------
(defn created-by
  "Returns the user the post was created by."
  [db post]
  (jdbc/execute-one! db
                     ["SELECT * FROM post
                       INNER JOIN page ON post.page_id = page.id
                       INNER JOIN \"user\" ON page.user_id = \"user\".id
                       WHERE post.id = ?
                       LIMIT 1"
                      (:post/id post)]))

;; ----------------------------------------------------------------------------
(defn can-access?
  "True when the post is not hidden, or when the user created it."
  [db post user]
  (or
    (not (get-in post [:post/content :options :hidden]))
    (= (:user/id (created-by db post))
       (:user/id user))))

;; ----------------------------------------------------------------------------
(defn reaction-counts
  "Returns a list of reactions and their counts - non-nil.
  Each element is
    :token - the emoji react
    :count - the number of reactions of that kind."
  [db post]
  (let [reaction-counts (jdbc/execute!
                          db
                          ["SELECT token, COUNT(*) FROM reaction
                            WHERE post_id = ?
                            GROUP BY token"
                           (:post/id post)])]
    ;; The snippet is cut off at this point; mapping the rename over
    ;; the rows is an assumed completion.
    (map #(utils/rename-key % :reaction/token :token)
         reaction-counts)))


I have a post namespace that has functions that deal with posts directly


and all the keywords are namespaced with their table name in sql


but i'm not treating those as "private" things even though they are namespaced keywords


other parts of the code can see :post/id if they want to


in the same way this namespace feels free to look into :user/id


But when I have a more complicated, stateful system


;; ----------------------------------------------------------------------------
(defn create-chat-subsystem
  "Creates an object that holds the info required to manage
  the chat subsystem, including sending notifications to
  users when messages are sent."
  [db ^JedisPool redis-client-pool]
  (let [;; Map of user-id to callbacks to call when a
        ;; new message comes through for them.
        connections (atom {})
        subsystem {::connections connections
                   ;; Objects needed to manage subscribing to redis
                   ;; for messages posted on other nodes.
                   ::redis-client (.getResource redis-client-pool)
                   ::redis-pub-sub (chat-messages-listener db connections)
                   ::subscription-executor (Executors/newSingleThreadExecutor
                                             (-> (BasicThreadFactory$Builder.)
                                                 (.namingPattern "chat-subsystem-%s")
                                                 (.build)))}]
    (.submit (::subscription-executor subsystem)
             (reify Runnable
               (run [_]
                   (.psubscribe (::redis-client subsystem)
                                (::redis-pub-sub subsystem)
                                (into-array [(message-key "*" "*")])))))
    ;; Assumed: return the subsystem map so it can be passed to the functions below.
    subsystem))

;; ----------------------------------------------------------------------------
(defn shutdown-chat-subsystem! [chat-subsystem]
  (log/info ::shutdown-step "Unsubscribing from channels.")
  (.punsubscribe (::redis-pub-sub chat-subsystem))

  (log/info ::shutdown-step "Returning redis client to the pool.")
  (.close (::redis-client chat-subsystem))

  (log/info ::shutdown-step "Shutting down the executor")
  (.shutdownNow (::subscription-executor chat-subsystem)))


I use keywords namespaced with the full namespace in the code


and I do treat those as private


;; ----------------------------------------------------------------------------
(defn attach-user-session! [chat-subsystem user-id callback]
  (swap! (::connections chat-subsystem)
         (fn [users]
           (update users user-id conj callback))))

;; ----------------------------------------------------------------------------
(defn remove-user-session!
  [chat-subsystem user-id callback]
  (swap! (::connections chat-subsystem)
         (fn [users]
           (let [new-callbacks-for-user (remove #{callback} (users user-id))]
             (if (empty? new-callbacks-for-user)
               (dissoc users user-id)
               (assoc users user-id new-callbacks-for-user))))))


and then the public functions in the namespace are my contract for interaction


but if there were invariants between fields in data, like a bank account, being less transparent would also make sense


all of which is to say I am lightyears from a consistent opinion


maybe I should have


(defn id [user]
  (:user/id user))

;; or (def id :user/id) - that would work too since keywords are fns


and use (user/id ...) in other namespaces


Nice I like that technique


This book also mentioned a “memento” pattern for encapsulation when doing functional C#, where you have a function that returns a “memento” struct which is a version of the original domain struct but with all of the private attributes filtered out. Though they mention that as more of an option for passing up to the UI rather than a tool for protecting invariants inside the domain. So I think your technique would be better for that


Many OOP patterns just disappear in FP: immutable values and (higher order) functions mean that a lot of those OOP patterns aren't needed because the inherent complexity just isn't there in FP.


(there are some patterns in FP that have no equivalence in OOP as well)


Its good to think about "encapsulation" for what it gives you, and not just for the sake of it. Like why do you want to encapsulate things to begin with? Once you answer that, you can more easily think... Ok how can I get this property in Clojure? Do I need encapsulation? Can other thing give me the same property?


Yeah, I don't like to see the equivalent of "getters" in Clojure code unless they add specific value over just accessing data fields directly.


This is pretty old, but shows how most OOP patterns are just unnecessary in Clojure:


(at work we have a code base that stretches back over a decade and some areas have "getters" because we want to hide the fact that some parts of the code traffic in lowercase keys, some parts in camelCase keys, and some parts in qualified/keys as we refactor and modernize the code)


Ah, Pedro and Eve -- that's a fun blog post!


facepalm Only just now I got the names of people in that article


Well the encapsulation gets you guarantees on invariants so that you don’t end up with a FlightBooking with a null departure date or a BarCustomer with an age under 21. In a statically typed language a lot of that can be done with the type system like in F#. Otherwise I’m thinking of performing data validation during runtime at the boundaries of the domain when outer layers are accessing the data, with something like Clojure spec. So you could still violate the invariants from within the domain layer, but it won’t escape outside of the system as long as you validate it at every point of egress


"every point of egress" is an overkill, you validate where it matters


@jon920 Ok, so what you want is to assert your data invariants? Then it's not "encapsulation" per se that you want. So now the question is: how, in Clojure, can you assert data invariants and protect yourself against data modification that would corrupt your data invariant?


In my opinion, like you mentioned, Spec is the way to go.


Sure spec can work, I just have to (ideally) ensure that the data is validated at every point before it can be persisted or used to perform a calculation


With OOP + encapsulation or a type system enforcing those invariants, if you have a class/data type that guarantees those invariants, you can persist or use it without having to check. Without that it seems like I’d have to identify every place where that could happen and run the validations. Maybe using macros?


Sorry I have to eat but I’ll check back, thanks for all the great advice so far


Ya, it's not that much effort. In my opinion, it's much better than data encapsulation. With data encapsulation, you only have a "best intention" protection. In that you hope that all devs who will change the functions that are allowed to change the data know what all the invariants should be, and don't mess up their code change in a way that would break it. With Spec you get a formal language to define the invariants and automatically validate them, so even if a dev messes up, it'll be caught, and your prod data won't be corrupted.
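
As a small illustration of stating an invariant in Spec, here is a sketch using the "bar customer must be at least 21" example from earlier in the thread. The spec names are made up for illustration:

```clojure
(ns example.invariants
  (:require [clojure.spec.alpha :as s]))

;; The invariant lives in one place, as data-describing predicates.
(s/def ::name string?)
(s/def ::age (s/and int? #(>= % 21)))
(s/def ::bar-customer (s/keys :req [::name ::age]))

(s/valid? ::bar-customer {::name "Pat" ::age 34})
;; => true

(s/valid? ::bar-customer {::name "Sam" ::age 19})
;; => false

;; explain-data returns a map describing which predicate failed and where
;; (nil when the value is valid), which is handy for logging or error responses.
(s/explain-data ::bar-customer {::name "Sam" ::age 19})
```

Any function that builds or updates a customer can be checked against `::bar-customer`, regardless of which namespace it lives in.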


And, you can still provide some "relaxed form" of encapsulation. Like say you have a Domain Aggregate, have some namespace for it. Put in that namespace the functions that are supposed to modify the entities and value types of the aggregate. And use that namespace everywhere. Now sure, your colleague could decide: Screw Jon and his stupid abstraction, I'm going to directly call the DB without going through his domain aggregate. And you know what, they can do that in Haskell or Scala or Java or F# just as well. At this point, have some team standards, hope the CR catches things, make it obvious that this data should be changed by functions in that namespace, etc. Same thing if your data is not in a DB, if its just in-memory, make the variable private to that namespace, or make it obvious what the right way to modify it is. You can try to add all kind of extra "pain" for someone to bypass your guards, but your colleague always has a way to do it, since they control the source, they can just change the guard's source code as well, or any other shenanigans.


> And, you can still provide some “relaxed form” of encapsulation. Like say you have a Domain Aggregate, have some namespace for it. Put in that namespace the functions that are supposed to modify the entities and value types of the aggregate. And use that namespace everywhere.
> make it obvious that this data should be changed by functions in that namespace
I like those ideas, that is what I’m leaning toward right now


Also, it might have felt like I was bashing OO encapsulation a bit; I'm not. Just trying to show that the alternatives Clojure has come with their own pros/cons. I don't feel it's clear that one is always better than the other.


What Clojure does though, which is WAYY better than Java, is by having it all immutable, it will be hard to accidentally break the data due to inadvertent mutation.


Oh, and one last point: while static types can assert some invariants (and it's cool they do it statically), in practice they're just not powerful enough to truly protect my data from corruption. So I still find Spec is a great tradeoff here. Yes, you move the validation to runtime, but you can be much more sure that your invariants are held, since you can model them much more precisely.


I think there's also a lot of nervousness from folks who come from a statically typed background, especially with OO languages, where they're used to the type system and enforced encapsulation preventing a lot of mistakes that would otherwise be easy to make with mutable data -- and there's a temptation for them to view Spec as a "replacement" for a "type system" (it isn't) and to overuse it, so they have a "type signature" on lots of their functions and they add lots of s/valid? calls in places where they wouldn't validate data in their "home" language. That Clojure is so very, very different to that world takes some folks a long time to accept and really internalize.


For sure. The learning to ride a bike analogy applies well here. It'll take someone a while after they take off the training wheels to not be scared they're going to fall and hurt themselves. They need to gain more confidence that... oh ya, they don't actually fall down anymore, and ya, those training wheels were actually useless at this point.


I see it a lot... But what about type errors? How will I not make them! And it's like... Relax, you're not going to push code to prod that has type errors, you're smart, you'll catch them at the REPL or in your tests, don't worry about it.


(and part of that security comes from developing a good REPL workflow, which can also take a while since it is so very different from how you work in other languages)


I also like how the trade offs for the bike analogy are the same too. You take the training wheel off when you need to go faster and take sharper turns. Which is pretty much the same benefit I see with Clojure being dynamically typed.


I've been telling people that you should only add s/valid? in places that send data to the outside, and potentially receive data from the outside (unless you know the sender has already run s/valid? on it, or vice versa). That's because, when you go to prod, all your internal data flows have been tested by you at the REPL, with QA, by your unit tests, by your integ tests, in beta use, in staging, on your pre-prod stage, etc. So at that point, you should be very confident that your internal data flow is correct and has no bugs. It doesn't need to be validated anymore. Which is why you should instrument in those cases, but not once you go to prod. But outside interaction you cannot predict; who knows what data the user is going to send you. What you do know is that, given valid data, everything works, but you don't know what happens given invalid data, so instead of trying to handle invalid data, you just run s/valid? and reject invalid data. This happens to be true even for strong statically typed languages. The types can't assert statically at compile time that outside actors won't send you invalid data when in prod. And if you only use your static type definitions to perform runtime validation of it, you don't get very good coverage, since most likely not the entire range of a possible String type is valid input. So most of the time, people need to add ad-hoc validation code on top of their type definitions. It's kind of annoying: now your data specification is actually in two places, partially in the types, and then in some custom validation functions.
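
A sketch of what validating only at the boundary could look like, assuming a hypothetical comment-submission endpoint. `handle-submit`, `queue-comment` and the response shape are invented for illustration and are not a Pedestal or Ring API:

```clojure
(ns example.boundary
  (:require [clojure.spec.alpha :as s]))

(s/def ::author string?)
(s/def ::body string?)
(s/def ::comment (s/keys :req-un [::author ::body]))

;; Internal function: trusts its input, so there is no validation here.
(defn queue-comment
  [queue comment]
  (conj queue (assoc comment :status :pending)))

;; Boundary function: the one place where untrusted data is checked.
;; Invalid payloads are rejected instead of being handled further in.
(defn handle-submit
  [queue payload]
  (if (s/valid? ::comment payload)
    {:status 201 :queue (queue-comment queue payload)}
    {:status 400 :queue queue :problems (s/explain-data ::comment payload)}))
```

During development, `clojure.spec.test.alpha/instrument` can additionally check the arguments of internal functions that have an `s/fdef`, without leaving those checks switched on in production.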