2025-01-11 datalevin | Clojure Slack Archive

Jeremy 2025-01-11T10:41:24.706189Z

Hi all, is there an option to turn off schema-on-write?

2025-01-15T14:27:41.189309Z

You can turn on closed schema, if that's what you want. It will throw an error if you insert anything that is not defined in your schema (which sounds like the opposite of disabling schema-on-write).

2025-01-15T14:28:25.299939Z

https://github.com/juji-io/datalevin/blob/ede16dd86ff5fcd9fb90d4bcbf632470a695b2c1/CHANGELOG.md#added-7

Jeremy 2025-01-15T14:34:05.259009Z

I think it is 😅. I'll try it out shortly. Thanks.

2025-01-15T14:41:05.998679Z

There's also :validate-data?, which will throw an error if you insert a value of the wrong type.

2025-01-15T14:41:13.276209Z

I normally use both.
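A minimal sketch of opening a connection with both options turned on, assuming the Datalevin get-conn arity that takes a directory, a schema and an options map. The option name :closed-schema? is an assumption based on the changelog entry linked above (it is not spelled out in this thread), so check it against your Datalevin version; the path and the :user/* attributes are hypothetical.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: only these two attributes are declared.
(def schema
  {:user/name {:db/valueType :db.type/string}
   :user/age  {:db/valueType :db.type/long}})

;; Hypothetical path. :closed-schema? (assumed option name) rejects
;; attributes missing from `schema`; :validate-data? rejects values
;; whose type doesn't match the declared :db/valueType.
(def conn
  (d/get-conn "/tmp/datalevin-example" schema
              {:closed-schema? true
               :validate-data? true}))

(comment
  ;; Expected to throw: :user/email is not declared in `schema`.
  (d/transact! conn [{:user/name "Ada" :user/email "ada@example.com"}])
  ;; Expected to throw: :user/age is declared as a long, not a string.
  (d/transact! conn [{:user/name "Ada" :user/age "thirty-six"}])
  (d/close conn))
```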

Jeremy 2025-01-12T17:25:50.524489Z

It's not crucial in my use case. I'm indexing a ton of data and wouldn't want attributes to be created without my knowledge. I'm currently using malli, but would prefer to do without it for a tiny extra performance gain.

Huahai 2025-01-12T17:29:46.149219Z

What would be the desired behavior instead? Ignore the unwanted attribute? Error out?

Huahai 2025-01-12T17:32:15.905009Z

From the description, you seem to want to have a whitelist of attributes, and ignore anything not on that list?

Huahai 2025-01-12T17:33:59.807339Z

Isn't this easier to do in user code than adding something to DL? It seems it would complicate things quite significantly, as there are multiple paths of data ingestion.

Jeremy 2025-01-12T17:34:28.473959Z

I'd want it to throw an error, or return some kind of value indicating failure or that extra attributes were created, so that I'm alerted that I didn't handle an attribute in my schema.

Jeremy 2025-01-12T17:34:57.697609Z

Yeah, I guess user space is the way to go.
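Following the user-space route, a pre-transaction check could look like the sketch below. It is plain Clojure with no Datalevin calls; unknown-attrs and check-tx-data! are hypothetical helpers, and the sketch assumes tx-data is a sequence of entity maps (vector forms such as [:db/add e a v] are skipped, not checked).

```clojure
(require '[clojure.set :as set])

(defn unknown-attrs
  "Set of attribute keys used in the entity maps of `tx-data` that are
  not declared in `schema` (a map of attribute -> properties).
  :db/id is always allowed. Non-map tx forms are ignored."
  [schema tx-data]
  (let [allowed (conj (set (keys schema)) :db/id)
        used    (into #{} (mapcat keys) (filter map? tx-data))]
    (set/difference used allowed)))

(defn check-tx-data!
  "Returns `tx-data` unchanged if every attribute is declared,
  otherwise throws ex-info listing the undeclared ones."
  [schema tx-data]
  (let [unknown (unknown-attrs schema tx-data)]
    (if (seq unknown)
      (throw (ex-info "Undeclared attributes in tx-data"
                      {:unknown-attrs unknown}))
      tx-data)))
```

It would then wrap the real call, e.g. (d/transact! conn (check-tx-data! schema data)). The check walks every key in the batch once, so it adds a pass over the data but no extra reads against the database.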

Huahai 2025-01-12T17:37:52.254579Z

So this is a stricter form of schema check. Sure, file a GitHub issue and we can probably add this. I think a whitelist entry in the option map should do it.

👍 1

Jeremy 2025-01-12T17:38:41.025049Z

@huahaiy, if I might ask one more question: I'm ingesting JSON data from external sources, so obviously the keys aren't namespaced. Is there a way to auto-namespace the keys of a map on insertion (and remove the namespace on fetching)? I'm currently doing this in user space too, but I'm wondering if something like that already exists.

Jeremy 2025-01-12T17:39:30.157749Z

> File a GitHub issue, we can probably add this

Will do, thanks!

Huahai 2025-01-12T17:41:50.908159Z

JSON in/out is on the roadmap. When we get to that, auto-namespacing can be handled as an option.

💜 1
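Until JSON in/out lands, the user-space approach Jeremy describes could be as small as the sketch below. It assumes the JSON parser yields a map with string or keyword keys, only handles top-level keys (nested maps are left alone), and relies on update-keys from Clojure 1.11; add-ns and strip-ns are hypothetical names.

```clojure
(defn add-ns
  "Qualifies every top-level key of `m` with the namespace `ns-str`.
  Works for string keys and for keyword keys, since `name` accepts both."
  [ns-str m]
  (update-keys m #(keyword ns-str (name %))))

(defn strip-ns
  "Drops the namespace from every top-level keyword key of `m`.
  Non-keyword keys are left unchanged."
  [m]
  (update-keys m #(if (keyword? %) (keyword (name %)) %)))
```

Note that strip-ns also turns :db/id into :id, and two attributes with the same name in different namespaces would collide, so you may want to dissoc :db/id (or use a single namespace per entity type) before stripping.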

Huahai 2025-01-12T17:46:20.165979Z

Would it be useful to add an :attributes-created entry to the transaction report?

Huahai 2025-01-12T17:49:14.455339Z

On the other hand, if you are doing bulk ingestion, I would recommend not transacting at all; use init-db and fill-db instead, which are much faster. Of course, you will need to do more work in user code to translate your data into datoms. The transaction slowdown is mainly due to having to check existing data (a lot of reads), and these reads go through the same read/write transaction, so everything is done sequentially.

Huahai 2025-01-12T17:51:57.369139Z

The speed difference is at least 4x.
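To illustrate the extra user-code work mentioned above, here is a sketch that turns entity maps into [e a v] triples with sequentially assigned entity ids. maps->eav is hypothetical, and the sketch assumes every value is a single scalar (no cardinality-many or ref handling) and that the ids do not collide with ones already in the database, since init-db and fill-db skip the existing-data checks that make transactions slow. The Datalevin calls are kept in a comment form and assume the arities init-db (datoms, dir, schema), fill-db (db, datoms) and datom (e, a, v); check them against your version.

```clojure
(defn maps->eav
  "Turns a sequence of entity maps into [e a v] triples, giving each map
  its own entity id counting up from `start-eid`. nil values are skipped."
  [start-eid entities]
  (mapcat (fn [eid m]
            (for [[a v] m
                  :when (some? v)]
              [eid a v]))
          (iterate inc start-eid)
          entities))

(comment
  ;; Assumed Datalevin usage; path and data are hypothetical.
  (require '[datalevin.core :as d])
  (let [schema  {:user/name {:db/valueType :db.type/string}}
        batch-1 [{:user/name "Ada"} {:user/name "Grace"}]
        batch-2 [{:user/name "Edsger"}]
        ;; init-db creates the database from a first batch of datoms...
        db (d/init-db (map #(apply d/datom %) (maps->eav 1 batch-1))
                      "/tmp/bulk-example" schema)
        ;; ...and fill-db appends more; ids must continue past batch-1.
        db (d/fill-db db (map #(apply d/datom %) (maps->eav 3 batch-2)))]
    (d/close-db db)))
```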

Jeremy 2025-01-12T19:03:21.330199Z

Yes, and :attributes-created would be useful. I added that to the feature request as an alternative.

Jeremy 2025-01-12T19:08:30.582999Z

> the speed difference is at least 4x

I've only sampled a few thousand entities per tx, so I never really noticed any slowness. The actual data would be millions per tx, so I'll have to look into this. Much appreciation for your work, @huahaiy!

Huahai 2025-01-12T19:21:12.559489Z

Thanks for these suggestions! @olajeremy123

2025-01-11T16:07:38.932309Z

Or, perhaps, more generally, a way to get the schema that resulted from schema-on-write? So you could compare it with what you started with, and either adopt the changes or address the bugs that caused them.

Jeremy 2025-01-11T16:09:18.290709Z

That sounds good too
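The comparison suggested above can already be approximated in user code by snapshotting the schema before and after a write. The sketch assumes Datalevin's schema function accepts a connection; added-attrs is a hypothetical helper that returns the attributes present after the write but absent before, together with their properties (for attributes created by schema-on-write, that includes the assigned :db/aid). conn and tx-data in the comment form are placeholders.

```clojure
(defn added-attrs
  "Entries of schema map `after` whose attribute keys are absent
  from schema map `before`."
  [before after]
  (select-keys after (remove (set (keys before)) (keys after))))

(comment
  ;; Assumed Datalevin usage: snapshot the schema, write, compare.
  (require '[datalevin.core :as d])
  (let [before (d/schema conn)
        _      (d/transact! conn tx-data)
        added  (added-attrs before (d/schema conn))]
    (when (seq added)
      (println "Schema-on-write created:" (keys added)))))
```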

Huahai 2025-01-11T22:56:44.472159Z

What is the goal of turning off schema-on-write?

Huahai 2025-01-11T22:57:42.598589Z

Schema-on-write just assigns a :db/aid to an unseen attribute. That's about it, so there's really not much to it.

Huahai 2025-01-11T22:59:34.935509Z

Could you describe what the issues are?
