Hi again, I'm new to datalog/datomic/etc.. up until now, I've been using datalevin as a kvs, but want to also use it for some time-series data. I have an attribute that is a map of sorted maps (which represents price levels and sizes). can an attribute be a sorted map? and is it possible to query it with datalog? or does datalevin store it as blobs which you can only operate on after reading.
> e.g. so you can turn your keys into a single long value, saving lots of storage space. Is this different from blob storage? What's the trade-off? If you just want a sequence/vector of values attached to an entity. Also had no idea you can have both kv and datalog in the same db, that's awesome.
> Added two KV query functions to get first n key values in a range. So it might be helpful when you go the KV route.
That was quick lol. awesome.
> Is this different from blob storage? What's the trade-off? If you just want a sequence/vector of values attached to an entity. It saves you from having to deserialize the whole sequence, if you just want the first few items
Correct. With these stored in lists, you can use various range query functions on them, and these are very efficient, basically the workhorse of this database. So you don't have to deserialize the whole collection. In particular, you don't want to deserialize Clojure immutable collections, as these are expensive to construct. In my experience, even a Java hash map is too expensive. Only special collection types such as bitmaps, compressed integer arrays, etc. are worthwhile. We used these special collection blobs to build our fulltext search engine.
Of course, if extreme performance is not a concern, e.g. you are not trying to beat Lucene or Postgres, then these considerations are not important; blobs are good enough for application programming.
So it all depends on what you are trying to do. Datalevin is flexible. Experiment and measure.
You can normally map whatever data structure into entity-attribute relationships. So your sorted map can probably be considered an entity of its own.
I would recommend following a standard ER data model when using Datalog.
I see. would i be able to run queries like getting highest key in the map?
Don't store blobs.
Of course.
Datalog is just a more ergonomic SQL.
:find (max ?whatever)...
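A minimal sketch of what such a query could look like, assuming each price level is stored as its own entity. The attribute names (`:level/kind`, `:level/price`, `:level/volume`) and the directory path are made up for illustration and are not from this thread:

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: one entity per price level.
(def schema
  {:level/kind   {:db/valueType :db.type/keyword}
   :level/price  {:db/valueType :db.type/double}
   :level/volume {:db/valueType :db.type/double}})

;; Hypothetical path, used only for this sketch.
(def conn (d/get-conn "/tmp/datalevin-max-sketch" schema))

(d/transact! conn
  [{:level/kind :price-kind-1 :level/price 2.1 :level/volume 30.0}
   {:level/kind :price-kind-1 :level/price 3.0 :level/volume 40.0}])

;; Highest price (the "highest key" of the old sorted map) for one kind.
(d/q '[:find (max ?price) .
       :in $ ?kind
       :where
       [?e :level/kind ?kind]
       [?e :level/price ?price]]
     (d/db conn) :price-kind-1)

(d/close conn)
```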
great. thanks.. an entity of mine has a variable price level property, so i'm unsure of how else to store it other than as sorted map in maps
Datalog works well with normalized data. The point of triple store is to store data in their smallest form possible, i.e. datoms.
Show us what your data looks like, we can show you a schema to store it.
Alright, kindly allow me few mins to type it out
a market snapshot has 10-20 entity snapshots. each entity looks like:
entity snapshot:
+ valid-from (time snapshot was taken) (timestamp)
+ entity-id (int)
+ price-data (map)
  + price-kind-1 (sorted-map) {2.1 30, 3.0 40 ...}
  + price-kind-2 {900.5 30, ...}
  + price-kind-3 {...}
I can normalize the market by giving each entity market-id attribute. but unsure of how to represent price-data except to store it as is
what are the sorted map? map of what to what?
double to double (price to volume)
One possible representation:
{:market-snapshot/valid-from {...}
:market-snapshot/entity {...}
:market-snapshot/price-kind {...}
:price-snapshot/price {...}
:price-snapshot/volume {...}
:price-snapshot/market-snapshot {:db/valueType :db.type/ref}
}
You can further separate price-kind from market if you want, as we still have some redundancy there.
Or you can turn the reference around, have cardinality many reference to prices instead.
maybe that feels more natural to you.
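To make the normalized shape concrete, here is a rough sketch in plain Clojure (no database calls) that flattens one entity snapshot into transaction maps using the attribute names from the schema above. The function name, the input keys, and the use of keywords for price kinds are assumptions made for this example. It also assumes `:price-snapshot/market-snapshot` is a ref attribute, so the negative `:db/id` values act as temporary ids linking each level to its parent:

```clojure
(defn snapshot->tx-data
  "Returns one market-snapshot map per price kind, each followed by
  one price-snapshot map per price level of that kind."
  [{:keys [valid-from entity-id price-data]}]
  (mapcat
    (fn [[i [kind levels]]]
      (let [parent-id (- (inc i))]            ; temporary ids: -1, -2, ...
        (cons {:db/id                      parent-id
               :market-snapshot/valid-from valid-from
               :market-snapshot/entity     entity-id
               :market-snapshot/price-kind kind}
              (for [[price volume] levels]
                {:price-snapshot/price           price
                 :price-snapshot/volume          volume
                 :price-snapshot/market-snapshot parent-id}))))
    (map-indexed vector price-data)))

;; Hypothetical input, shaped like the entity snapshot described earlier.
(def example
  {:valid-from #inst "2024-11-14T01:01:00Z"
   :entity-id  3
   :price-data {:price-kind-1 (sorted-map 2.1 30.0, 3.0 40.0)
                :price-kind-2 (sorted-map 900.5 30.0)}})

(snapshot->tx-data example)
```

The resulting sequence of maps is the kind of thing you would hand to a transaction; each price level ends up as a handful of datoms instead of one opaque blob.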
ahh, very interesting.. I think this is the way to go. Only issue is each sorted map has around 20-50 keys, multiplied by 3 price kinds and average of 10 selections and 10k snapshots per market, which is around 16 mil. I'd have to investigate the disk space consumed in practice. if it's too much, I only need to store the top 5 keys per kind, and can archive the whole price-snapshot as blobs.
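The estimate can be checked with quick arithmetic. This sketch only multiplies the figures quoted in the message above; the function name is made up, and the result counts price-level entities, each of which becomes several datoms:

```clojure
(defn level-count
  "Number of price-level entities per market."
  [keys-per-map kinds selections snapshots]
  (* keys-per-map kinds selections snapshots))

(level-count 20 3 10 10000) ; => 6000000   low end, 20 keys per map
(level-count 50 3 10 10000) ; => 15000000  high end, 50 keys per map
(level-count 5 3 10 10000)  ; => 1500000   keeping only the top 5 keys
```

So the high end lands close to the 16 million mentioned, and keeping only the top 5 levels cuts that by roughly a factor of ten.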
> Or you can turn the reference around, have cardinality many reference to prices instead. do you mind elaborating on this? I don't quite get it
`:market-snapshot/price-snapshot {:db/valueType :db.type/ref :db/cardinality :db.cardinality/many :db/isComponent true}`
Of course, if your goal is just to store these as time series, storing them as blob is fine, as you are not going to finely slice and dice them in ad-hoc queries.
I'd perform lots of queries regarding the top 5 levels, so your normalized approach is perfect. I also wouldn't want to just discard the remaining price levels, so I'd archive as blobs for safe-keeping. thank you very much @huahaiy
You don't even need to use Datalog, Datalevin KV store can store lists
so you can have a market-snapshot map as the key, and price-volume tuple as the values.
That would be much faster
{:entity-id xxx :valid-from xxx ... :price-kind "price-kind-1"} would be the key
[price volume] tuple would be the value
open-list-dbi for this DBI, that would be the sorted map you want
a list DBI basically is a sorted map of sorted map: keys are sorted, values can be a list, also sorted.
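As a mental model only (this is plain in-memory Clojure, not Datalevin's API or storage format), the "sorted map of sorted values" shape can be pictured like this. The keys and values are made-up sample data:

```clojure
;; Keys are sorted; each key holds a sorted set of [price volume] tuples.
(def list-dbi-model
  (sorted-map
    [3 "price-kind-1"] (sorted-set [2.1 30.0] [3.0 40.0] [3.5 42.0])
    [3 "price-kind-2"] (sorted-set [900.5 30.0])))

(def levels (get list-dbi-model [3 "price-kind-1"]))

;; All tuples under one key, in ascending price order.
(seq levels)

;; Range over the values of one key: price levels from 3.0 upwards.
(subseq levels >= [3.0 0.0])

;; Highest price level under one key.
(first (rseq levels))
```

The point of the real list DBI is that these range lookups happen inside the store, without reading and deserializing the whole collection first.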
For time series data (assuming high write rate), I would go with KV store, bypass the expensive machinery of Datalog transaction.
I am working on enhancing write throughput with async transactions, that should handle very high write rate.
another interesting approach. Idk how none of this came to mind. I spend days searching for various dbs 😭.
we don't have great documentation yet
I don't have need for high write rate (i store in memory for real-time data), but fast and flexible queries are what i'm looking for (for historical data). I'd have to weigh both these approaches
your data seems to be simple enough for a KV solution, to be honest
for the keys, you don't need to use a map blob, a heterogeneous tuple looks good enough, and it supports range queries, so you are not missing anything from the Datalog store.
waitt.. you mean kv store supports queries? I think i missed that on the documentation
I'd have a look. if so, I guess that's what I need for price data. I'd still store market info in datalog store
[3 #inst "2024-11-14T01:01" "price-kind-1"] as the key, [3.5 42]... as the values.
there are so many KV query functions, list-range, list-range-count, list-range-filter, list-range-first, list-range-keep, etc.
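A rough sketch of how the list DBI route might look with the tuple key and `[price volume]` values discussed here. The path and DBI name are hypothetical. The way tuple types are written (a vector of scalar type keywords) is an assumption on my part, so check the Datalevin docs for the exact type spec before relying on it:

```clojure
(require '[datalevin.core :as d])

(def kv  (d/open-kv "/tmp/datalevin-kv-sketch")) ; hypothetical path
(def dbi "price-levels")                          ; hypothetical DBI name
(d/open-list-dbi kv dbi)

;; ASSUMPTION: a vector of scalar types denotes a heterogeneous tuple.
(def k-type [:long :instant :string])  ; [entity-id valid-from price-kind]
(def v-type [:double :double])         ; [price volume]

(def k [3 #inst "2024-11-14T01:01:00Z" "price-kind-1"])

;; Store the price levels of one snapshot under one key.
(d/put-list-items kv dbi k [[2.1 30.0] [3.0 40.0] [3.5 42.0]] k-type v-type)

;; Read back every level stored under that key.
(d/get-list kv dbi k k-type v-type)

;; Range over keys: everything for entity 3 within one day.
;; Keys of other price kinds that sort inside this range are included too.
(d/list-range kv dbi
              [:closed
               [3 #inst "2024-11-14T00:00:00Z" "price-kind-1"]
               [3 #inst "2024-11-15T00:00:00Z" "price-kind-1"]]
              k-type
              [:all] v-type)

(d/close-kv kv)
```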
Datalog store is built on top of KV store, so it's by definition slower, particularly, the transaction logic is expensive, because of the complex semantics of Datalog that we support. It's a cost that can be avoided if the use cases do not call for it.
I mean you have a very simple data model that does not demand lots of complex arbitrary joins, I would go with a KV solution.
The query patterns are pretty predictable, it's time series data.
I see. I've mostly just used kv stores as persistent caches, so i'm currently flooded with new possibilities.
Thanks a lot once again
you can use kv store and datalog in the same DB file also, so you can have the time series part in the KV store, and other relationships in the datalog store also.
Datalevin is versatile. It's in our slogan 😀
e.g. so you can turn your keys into a single long value, saving lots of storage space.
key 29239 values [3.5 42] [1.5 10]... where 29239 is an entity id from the Datalog store.
so you can expand your ER model arbitrarily on the Datalog side.
Added two KV query functions to get first n key values in a range. So it might be helpful when you go the KV route.