2025-03-30 datomic | Clojure Slack Archive

datomic

2025-03-30T09:06:24.803949Z

Hello everyone 👋 I was wondering if there are any open design docs for Datomic out there. More specifically, I'm interested in the design decisions regarding open schema vs strict schema. If anyone happens to know of formal (docs) or informal (discussions) sources on this topic, I'll be most grateful 🙏

👍 1

2025-03-30T11:47:32.035839Z

Rich’s speculation talk probably has the most info on that in particular.

2025-03-30T11:50:27.621859Z

https://youtu.be/oyLBGkS5ICk?feature=shared You mean this one? I watched it once and it didn't ring a bell... I'll give it another try... If there's anything more specific / concrete to datomic, I'll be glad to read

2025-03-30T12:16:58.787739Z

Yeah that one. It’s not datomic but it talks about the value of having an open set of attributes.

2025-03-30T12:17:07.991639Z

iirc

2025-03-30T12:24:19.636099Z

Ohh I think I understand you now. But I asked why (ironically) Datomic opted for the opposite direction of a closed rigid schema (if I now give the db more attributes, the db breaks)

2025-03-30T13:10:48.004429Z

oh you're saying "why does datomic have a closed schema?"

2025-03-30T13:11:00.490149Z

depends on what you mean by "closed" I guess

2025-03-30T13:12:23.065409Z

I forget which talk it is, maybe Deconstructing the Database(?), where he talks about how schemas allow you to index things.

2025-03-30T13:12:41.799049Z

namely, declaring the type of an attribute allows you to index it properly

2025-03-30T13:13:37.106569Z

but it's not a "closed" schema. you can add new attributes at any time and any entity can have any attribute. it just requires you to declare attributes instead of accepting arbitrary attributes.
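
The distinction drawn above (attributes must be declared, but the set of attributes can grow and any entity can carry any of them) can be sketched with a toy model. This is only an illustration in Python: `ToyStore`, `declare`, `add`, and `UndeclaredAttribute` are hypothetical names, not Datomic's API.

```python
class UndeclaredAttribute(Exception):
    """Raised when a value is asserted for an attribute that was never declared."""


class ToyStore:
    """Toy entity-attribute-value store; a model of the idea, not Datomic's API."""

    def __init__(self):
        self.schema = {}  # attribute name -> required Python type
        self.datoms = []  # (entity id, attribute name, value) triples

    def declare(self, attr, value_type):
        # Growing the schema is additive: existing data is untouched.
        self.schema[attr] = value_type

    def add(self, entity, attr, value):
        if attr not in self.schema:
            raise UndeclaredAttribute(attr)
        if not isinstance(value, self.schema[attr]):
            raise TypeError(f"{attr} expects {self.schema[attr].__name__}")
        self.datoms.append((entity, attr, value))


store = ToyStore()
store.declare("person/name", str)
store.add(1, "person/name", "Ada")

# A new attribute can be declared later, and the same entity can carry it.
store.declare("order/total", int)
store.add(1, "order/total", 42)

try:
    store.add(1, "person/nickname", "Countess")  # never declared
    rejected = False
except UndeclaredAttribute:
    rejected = True
```

In this sketch the schema constrains attributes one at a time; nothing says which attributes an entity may or must have.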

danieroux 2025-03-30T14:07:22.321419Z

Schemas at the entity level are too rigid, schemaless is too free. EAVT attributes are the Goldilocks of schema: just enough. And adding attributes to the db can never break the db.

2025-03-30T19:39:49.344059Z

Well, you seem to be a strong advocate of Datomic's specific schema model ☺️ I also believe Rich has very good ideas, I'm just curious about their origin and how he got to them. What was his thought process in this particular design decision?

2025-03-30T19:50:50.857889Z

And I should clarify, since the question arose: I took the word "closed" as a contrast to the "open world assumption" from RDF (also mentioned in "deconstructing the db"). By "closed" I mean rigid, in that every attribute must be declared in the schema. By "open" I mean that it can accept attributes not mentioned in the schema (it might still have a schema and apply its constraints whenever appropriate). Indexing and performance were also an answer I speculated myself. Still, I'm interested in the real (as in not speculated) tradeoffs and reasoning that were involved in this design decision (because I believe the brains that were involved in this process were more experienced than mine, so I'll probably learn something).

Cameron 2025-03-31T17:23:16.137039Z

I think there is a meaningful distinction here between datomic schemas and specs. Maybe there is some slight overlap in the roles they play, e.g. declaring what attributes are present in a particular system, doing some basic type validation, but they have different concerns I think, so the same statement about open/closed sets of attributes may simply not apply equally to both. The spirit I pick up on when it comes to spec and focusing on open instead of closed systems has to do with how systems composed of many parts change over time and the particular ways in which breaking changes occur, with "adding new data to a map" being one of those "should never be a breaking change" maxims. Datomic, however, is a database and it's at the edge of your system and I think in this case, the spirit is that we want to rigorously control what goes into the database. It is not a document db where anything goes. I think this is not only for indexing and performance purposes, as others have mentioned, but also for the purpose of being able to reason about the state of the database. So, like a lot of other database systems, there is a way to declare via schema what the universe looks like. This means you can't put random things into it; if you want to adjust the universe, you have to add new schema. This is my own perspective on the topic, having worked with both spec (and other schema libraries) and separately Datomic and its schema system.

2025-03-31T23:57:56.635959Z

It seems like the exact definition of 'closed' and 'open' is unclear, but datomic -could have- easily worked like defrecord does: schema attributes that are declared (defrecord Classs [attrA attrB ...]) get high-performance characteristics, but an instance of the record can nonetheless hold any kind of attribute.

Cameron 2025-04-01T00:00:43.057519Z

The example of "closed" that I am most familiar with is "When I run a validation check, if there is an attribute in my map that is not covered by my spec/schema/whatever, then there is a validation error". When Rich says, in whichever talk it is, "I'm not going to help you create broken systems", he is talking about this idea of closed specs that aggressively disallow extra information to be present. So my understanding of open as it relates to information conveyed in the talks is that it's fine for additional data to travel along in the pipes of a specified system.
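
The open vs closed validation behaviour described above can be illustrated with a small sketch. It is written in Python rather than with clojure.spec, and `validate`, `spec`, and `record` are hypothetical names chosen for the example.

```python
def validate(record, spec, closed=False):
    """Return a list of error strings; an empty list means the record is valid."""
    errors = []
    # Keys that the spec covers are always type-checked.
    for key, expected in spec.items():
        if key in record and not isinstance(record[key], expected):
            errors.append(f"{key}: expected {expected.__name__}")
    # Only a closed check complains about keys the spec does not mention.
    if closed:
        for key in sorted(record.keys() - spec.keys()):
            errors.append(f"{key}: not in spec")
    return errors


spec = {"name": str, "age": int}
record = {"name": "Ada", "age": 36, "trace_id": "abc123"}

open_errors = validate(record, spec)                 # extra key travels along
closed_errors = validate(record, spec, closed=True)  # extra key is an error
```

With the open check the extra `trace_id` key passes through untouched; with the closed check the same record is rejected.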

2025-04-01T00:03:12.647409Z

Yeah, that tracks the speculation talk very closely. I always thought the language applied to defrecord's design too. Ultimately I'm not super attached to the meaning of those words or to getting too focused on semantics, but whether Datomic could work like defrecord is a reasonable topic to explore.

itaied 2025-04-01T08:41:01.131309Z

@cjb I like your distinction between spec and schemas, but as you said, the roles do overlap, and when something is checked elsewhere I would like to tell the db "forget about this part". For example, consider syncing data from another source, maybe even partial fields (like payment transactions). The transactions are already verified and validated, and I don't want the db to know anything specific about them: just take the data (whatever shape it is) and pass it on to the clients (browser, mobile) to handle and display somehow. Datomic forces you to duplicate the schema, which creates a dual source of truth (where is the correct schema?). I'm sure that the Datomic team discussed these issues and concerns, and I also wonder if we can find the sources of the design decisions written down, instead of some partly vague notes in YouTube videos.

ghadi 2025-04-01T15:27:56.109059Z

> But I asked why (ironically) Datomic opted for the opposite direction of a closed rigid schema (if I now give the db more attributes, the db breaks)

I'm scrolling back but I'm not sure I understand this point

2025-04-01T15:31:26.151769Z

Ignore it. It's unimportant rhetoric... I'm just interested in why Datomic opts for a rigid schema and doesn't accept attributes that are not declared in the schema

raspasov 2025-05-03T05:13:24.544519Z

> why datomic opt for rigid schema and doesn’t accept attributes that are not declared in the schema

1. Because accepting random types would be more akin to a “file system” and much less of a “database”.
2. Or it would require a “first insertion declares the type” convenience built into the database. I believe with Datomic you can actually do that yourself, if you really have the need (say, you are trying to ingest “varied” data with unknown attributes).
3. Related to 2, in order to do indexing efficiently and well, the type of the thing feels likely* to be a good thing to know.

PS* If anyone is aware of any formal research into this topic I would be curious.
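
A rough sketch of the “first insertion declares the type” idea from point 2, as a toy Python model. `InferringStore` and its methods are hypothetical names and do not correspond to any Datomic API; in a real system the equivalent step would be installing schema for an unknown attribute before asserting data for it.

```python
class InferringStore:
    """Toy store where the first value seen for an attribute fixes its type."""

    def __init__(self):
        self.schema = {}  # attribute name -> type inferred from first insertion
        self.datoms = []  # (entity id, attribute name, value) triples

    def add(self, entity, attr, value):
        # setdefault records the type on first use and returns it afterwards.
        declared = self.schema.setdefault(attr, type(value))
        if type(value) is not declared:
            raise TypeError(
                f"{attr} was first used as {declared.__name__}, "
                f"got {type(value).__name__}"
            )
        self.datoms.append((entity, attr, value))


store = InferringStore()
store.add(1, "payment/amount", 100)  # declares payment/amount as int

try:
    store.add(2, "payment/amount", "100")  # same attribute, different type
    conflict = False
except TypeError:
    conflict = True
```

The convenience moves the declaration from an explicit step to the first write, which also means the first writer decides the type for everyone after it.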

2025-05-04T08:46:03.949139Z

Thank you @favila I imagine it can be tiring to feel like you have to explain (and defend) yourself to strangers all day. I want to stress again this is not a criticism of the design choice. You already have my trust, and I believe there is intricate analysis and good reasons behind most features, so there's no need to defend anything. From your last message I think I recognize the gap in our conversation: I can't see the tradeoff between Y (rigid schema) and Z (I don't even know what was traded). I didn't walk the path so I don't know. That's why I ask people who did. Because even if I just go with XTDB, I probably give up some things (the contrasting tradeoffs those dbs made), and to know those ahead of time is just gold. We have already been able to learn some principles about dbs from experienced people (specifically in distributed writing), and while we survey solutions, as you can imagine, most only advertise their strong points, but we were able to anticipate, from the principles we learned, that if they do X they probably can't do Y, and then look for it in the fine print, and most of the time there it was (and when it wasn't it became even more interesting). But we can do this only when the experienced teach us, and that's what I'm trying to do here. I can imagine it can be bothersome and that's why I apologized in advance, but I still don't shy from asking questions.

2025-05-03T19:58:46.220159Z

@favila @joe.lane @alexmiller Might any of you have an idea of, or know for a fact, the considerations, or have access to the original design doc and be able to share the relevant excerpt? Forgive me if I'm bothering you too much 🙏🏼

2025-05-03T20:05:35.592959Z

@raspasov indexing and performance / efficiency are my guesses as well. I'm just wondering if this is really it? And if they came to the conclusion that you can't get schema a la carte + temporal tx log + performance it would be extremely interesting to me (to dig in and understand the first principles behind this conclusion)

favila 2025-05-03T20:07:20.955789Z

I think the choices and trade offs are evident, and Datomic exists on a point in that spectrum. Why does it matter why Rich Hickey the man chose it? You writing a history? In any case I have no special access to why he made the specific choices he made other than his public talks

favila 2025-05-03T20:08:37.409879Z

Also “open world” vs “closed world” assumptions are not about schema per se but about query/reasoning. RDF is famously open world but also has (many) schema layers
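
To make the query/reasoning point concrete, here is a minimal Python sketch of how the two assumptions treat a fact that is not recorded. The fact set and function names are made up for illustration.

```python
# Known facts as (subject, predicate, object) triples.
facts = {("ada", "knows", "grace")}


def holds_closed_world(fact):
    """Closed world: whatever is not recorded is taken to be false."""
    return fact in facts


def holds_open_world(fact):
    """Open world: whatever is not recorded is unknown (None), not false."""
    return True if fact in facts else None


missing = ("ada", "knows", "alan")
closed_answer = holds_closed_world(missing)  # False
open_answer = holds_open_world(missing)      # None, meaning "unknown"
```

Both functions run over the same data; the difference lies only in how absence is interpreted at query time, which is independent of whether a schema exists.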

favila 2025-05-03T20:09:44.022279Z

https://en.m.wikipedia.org/wiki/Closed-world_assumption

2025-05-03T20:35:41.748999Z

Thank you for responding. And I'm sorry, it sounds like my question did bother you a bit. I'm not writing a history. I'm just investigating some databases and I'm trying to understand the essential design tradeoffs (kinda like the CAP theorem, i.e. if you have multiple writers then multi-entity transactions become harder, etc...). Datomic is one of the dbs I appreciate most for many decisions (txlog, eav, datalog...) and I also appreciate the design skills of the people behind it (because of their work). As part of my research I also have a use-case that doesn't fit a rigid schema very well (it does benefit from a partial schema). But I extremely appreciate the other traits of Datomic and I wondered if there's an inescapable tradeoff such that these decisions can't co-exist. So naturally I value this input very much for my research... It's by no means criticism, I just want to learn. In any case thank you for your response and time 🙏 P.S. I'm sorry if I misused the terms open/closed-world. I saw them in asami being related to schemaless and thought I understood what they meant. Anyway I hope it's clear now that I meant optional schema (i.e. constrain what you need).

Joe Lane 2025-05-03T20:42:38.571059Z

We may be able to provide specific advice to you if you can provide some details about what your use-case is that “doesn’t fit a rigid schema”

2025-05-03T20:52:54.984889Z

Thank you @joe.lane , It's basically an entity with a polymorphic field (it can take different shapes in different instances) that is sourced from some other service we don't control. Currently we stringify -> parse it. But it's important to stress that our research goes beyond that, and it is really more important for us to understand the principles than to solve some single use-case. Thank you again for your time and efforts, I appreciate it 🙏

Joe Lane 2025-05-03T21:15:10.296809Z

FWIW, refs in Datomic are polymorphic, so, your entity can point to different kinds of entities via the same attr. For example if you created a financial transaction and wanted to track an entity representing the source of that financial tx, you could reference an httprequest entity or a kafkamessage entity from the :transaction/source reference attr.
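
The polymorphic reference described above can be pictured with a toy entity map in Python. The entity ids and every attribute other than `transaction/source` are invented for the example, and this is not how Datomic stores or queries data.

```python
# Toy entity map: entity id -> {attribute name: value}.
entities = {
    10: {"http/url": "https://example.com/pay", "http/method": "POST"},
    11: {"kafka/topic": "payments", "kafka/offset": 42},
    # Both transactions use the same reference attribute, transaction/source,
    # but point at entities of different kinds.
    20: {"transaction/amount": 100, "transaction/source": 10},
    21: {"transaction/amount": 250, "transaction/source": 11},
}


def source_kinds(tx_id):
    """Follow transaction/source and report the attribute namespaces found."""
    source = entities[entities[tx_id]["transaction/source"]]
    return sorted({attr.split("/")[0] for attr in source})


kind_of_20 = source_kinds(20)  # the source is an HTTP request entity
kind_of_21 = source_kinds(21)  # the source is a Kafka message entity
```

The reference attribute itself only says "points at an entity"; what kind of entity sits at the other end is determined by the attributes that entity happens to carry.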

2025-05-03T21:29:27.114269Z

The shapes are too diverse and change too frequently for us to model... (It is the parameters for hundreds of functions, some of the params are themselves nested maps and vectors). We want to track and be able to answer sophisticated questions on the interfaces and how they change over time. Datomic fits great on "over time" (temporal) and "sophisticated questions" (datalog), but the shape of the data in some parts is too noisy.

Joe Lane 2025-05-03T21:48:02.826879Z

Have you looked into the old Datomic project called codeq for inspiration?

2025-05-03T21:56:30.843549Z

No, I'm just seeing it now and it looks super interesting! Thank you 🙏🏼 We'll look into it more deeply both for functionality (because it's interesting by itself) and implementation for inspiration of sophisticated usage of datomic. Thanks again for pointing me to it 🙏

👍 1

favila 2025-05-04T02:16:17.569159Z

Sorry for being curt. There are different problems in the world with different contexts and trade offs and not everything is going to fit, and I don’t think anyone needs to defend why database X acts like Y and not Z if we can see trade offs between Y and Z. I personally wish writing a database a la carte was much easier and we had many more and different databases in the world. The database monoculture has gotten us stuck with everything needing to look and act like SQL and that’s just the worst tragedy. If you have data you must ingest in as info-preserving a way as possible and can’t control, and need ad hoc query over it, I don’t know that datomic is going to serve that need well, or at least not by itself, and that’s ok. It’s meant to be a system of record for data you do control. Other systems exist which may suit your problem better. Eg XTDB uses a structured document model but still lets you do ad hoc datalog query over it

➕ 1
