This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@cezar I'm not sure the hash index is WAL logged so it might not be reliable enough
https://www.postgresql.org/docs/9.5/static/indexes-types.html I'm looking at the warning here
i think it's a bit faster for KV type lookups and quite significantly faster for writes
unless you're really concerned with writes, I don't think it's an issue, just cache reads as much as possible so you never have to go to storage
@danielstockton: ah ok. I agree, I would add that data transfer time will likely dominate the btree read lookup time, and that indexing time will likely dominate the btree write time 🙂
true, log index is also a b-tree though and needs to be written to before a transaction is committed (not via background indexing)
My concern is not so much with speed but with data volume. If I have a bunch of Datomic databases managed by a single transactor the sole datomic_kvs table will become massive and the corresponding B-Tree index will be very slow for new inserts. In my experience anything over 100M entries in a BTree is just not performant for most applications. Again, I'm more concerned over inserts than reads. Also to preempt some, yes, I realize there is an option to use Dynamo, Couchbase etc but within an organization it's always easier to deploy on infrastructure that's already in place
@cezar: I doubt you'll reach 100M entries (would mean 100M segments, each of which contains from 1000 to 20000 datoms according to the docs - http://docs.datomic.com/capacity.html#sec-6), whereas we know the practical limit of Datomic is 10G datoms.
So theoretically, you'll stay 1 order of magnitude below the 100M limit I guess
ok let's work with my actual numbers: ~1000 databases (only a handful used at any one time), 1 transactor, up to 500M datoms per database
pessimistically, assuming you have 1000 datoms per segment, that's about 500k segments per database
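Back-of-envelope, that segment arithmetic can be sketched like this (the 1,000–20,000 datoms-per-segment range is from the capacity doc linked above; this is a rough estimate, not an exact model of storage layout):

```clojure
;; Rough segment-count estimate for a Datomic database of a given size,
;; using the datoms-per-segment range from the capacity documentation.
(defn segments
  "Estimated number of storage segments for `datoms` datoms,
   assuming `datoms-per-segment` datoms fit in one segment."
  [datoms datoms-per-segment]
  (long (Math/ceil (/ datoms (double datoms-per-segment)))))

;; pessimistic: 1,000 datoms per segment
(segments 500e6 1000)   ;=> 500000 segments per 500M-datom database

;; optimistic: 20,000 datoms per segment
(segments 500e6 20000)  ;=> 25000
```

So a single 500M-datom database stays well under the 100M-row mark either way; the concern only resurfaces if many such databases share one `datomic_kvs` table.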
license limits only apply at the txor level. you can run as many as you want
i will have spikes of heavy writes to a couple of database at a time and then they go dormant for a long time
but I can't excise or archive them. they have to be theoretically accessible due to SLA
@cezar: If you don't want your BTrees to get too deep you could maybe create several
@val_waeselynck: but how do I set up the transactor to write to a bunch of tables vs just one?
I think for this kind of advanced stuff I should definitely leave you in the good hands of Cognitect support 🙂
I hope they could pipe in here 🙂 I don't have a contract with them yet (though we are currently 90% committed to Datomic for this project)
but I do have to resolve the BTree growth issue or get the buy in to use a proper KV store like Cassandra or Couchbase
@cezar Datomic does not currently provide any way to remove the “dormant” dbs from a transactor, you would have to fail over to another transactor to do that
@stuarthalloway: that's not really my issue. Scroll up for the start of this conversation
I am not saying it has to be that way — nobody has requested a use case like yours before
how do folks like Nubank, who eventually plan to use datomic at scale, handle this? tons of transactors?
well, I was hoping to neatly subdivide my data into a database per customer account (we have a couple of thousand customers)
@cezar how quickly do you need to bring up a db for a mostly dormant customer?
e.g. you could make a process manager that spins up an appropriate peer/transactor pair on demand, and then have your own external logic to spin them back down
depends on the request, but usually if it's dormant then a couple of minutes might be OK... but I'd have to consult product managers on this
@robert-stuttaford: transactors cannot share the same storage, but can cohabit in the same storage engine under different table names
that's what i thought, which does solve the original problem cezar mentioned
Datomic is fundamentally a Cloud architecture, built for a world where processes are cheap and isolation is a Good Thing
@cezar a transactor should only handle a tiny number of dbs that are both (a) large and (b) have ongoing write volume, and by tiny I mean <10, probably closer to 1-3
lots of customers at scale shard by time, so only 1 db has ongoing write volume
in e.g. the AWS cloud, the answer is clear — you just do 1 transactor per db and be done with it
for people running their own data centers, this can be more of a challenge because they lack something as polished as CloudFormation, ASGs, etc.
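For a self-hosted version of the "1 transactor per db" setup, each transactor would get its own properties file pointing at its own storage. A sketch using the standard keys from Datomic's SQL storage transactor template (all names and values here are illustrative, and required settings like the license key are elided):

```properties
## transactor-customer-1234.properties -- hypothetical per-customer
## transactor config; host, port, and JDBC values are made up.
protocol=sql
host=localhost
port=4334
sql-url=jdbc:postgresql://db-host:5432/datomic_customer_1234
sql-user=datomic
sql-password=datomic
sql-driver-class=org.postgresql.Driver
```

Because each database lives in its own SQL database (or table), the per-`datomic_kvs` B-tree stays bounded by one customer's segment count, which is the point of the sharding discussed above.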
@robert-stuttaford it is easy if your queries are time-scoped by the nature of the domain. Just start a new transactor+db on each domain time boundary
i guess i'm more curious about the boundary between the shards and the control database
@cezar I understand, and Datomic may not be a great fit. What was your aggregate data size across all customers, in datoms?
@stuarthalloway do you guys have any tools for rebuilding databases? what was mentioned as 'decanting' on the last cognicast episode. i'm gearing up to do so at the moment, and i'd love to leverage any shortcuts that may exist, if you have any 🙂
reason is to get rid of all the accumulated cruft over 4 years - badly named schema, unwanted data (in the 100,000s datoms range), no-op transactions, etc
@stuarthalloway: Datomic is a very good fit otherwise. Plus we already started building on it. It never occurred to us that the limit of DBs per transactor was so small. We might still manage somehow but it's certainly making our lives a lot harder. I don't have an "aggregate" figure now but the data (like most data) will be cumulative over time. I forecast about 100B datoms per year (spread across many separate DBs)
@robert-stuttaford: several customers have written tools, some with our help. Some planned to open source but not sure any have.
@stuarthalloway forgive my cheekiness, but is it perhaps possible for you to put me in touch with those who planned to open source theirs? it's a big job i'm tackling, and i'd love an independent perspective on this, as i may save myself some time and effort
@robert-stuttaford I have a tool like that useful to make a subset db, based on your work
yeah. this time i care about maintaining the transaction order, and not losing the original timestamps. the one i shared with you before is just a 'now' snapshot, which is a lot simpler to produce
Hey all... I've got kind of a difficult query. The issue is that the data set is rather large. I have an event of a specific type that I'm trying to tie back to another entity based on related refs they each have. I'm finding that I'm running out of memory before this query completes. I was wondering if anyone had any tips
you could instead use d/datoms -- which is lazy -- to walk one entity kind and use pull / entity to discover the rest
if you need to do this query often, you could write cache refs into the database that shorten the path from one to the other
(d/q '[:find ?ue
       :in $
       :where
       [?ue :user-event/type :user-event.type/create-share]
       [?ue :user-event/asset ?asset]
       [?share :share/assets ?asset]
       [?share :share/created-at ?t]
       [?ue :user-event/occurred-at ?t]
       [?user :user/events ?ue]
       [?user :user/shares ?share]]
     db)
if so, then you can cheat:
(seq (d/datoms db :vaet (d/entid db :user-event.type/create-share) :user-event/type))
the :e values on this seq will give you all the matching events
you can then craft a pull spec which expresses the rest of your clauses, or perhaps several normal clojure operations
well, this way, you have the option of batching results and transacting those cache refs every so often
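Putting the `d/datoms` and pull suggestions together, a hedged sketch (attribute names are taken from the query above; the reverse-ref `:share/_assets` pull is one way to express the join, and the batch size is arbitrary):

```clojure
(require '[datomic.api :as d])

(defn create-share-events
  "Lazy seq of entity ids for all :user-event.type/create-share events,
   read straight off the VAET index instead of via d/q."
  [db]
  (map :e (d/datoms db :vaet
                    (d/entid db :user-event.type/create-share)
                    :user-event/type)))

(defn shares-for-event
  "Follow one event's asset to the shares created at the same instant,
   using a reverse-ref pull rather than a datalog join."
  [db ue]
  (let [{{asset :db/id} :user-event/asset
         t              :user-event/occurred-at}
        (d/pull db [:user-event/asset :user-event/occurred-at] ue)]
    (->> (d/pull db [{:share/_assets [:db/id :share/created-at]}] asset)
         :share/_assets
         (filter #(= t (:share/created-at %))))))

;; given some db value, walk in batches so only one chunk of the
;; event seq is realized in memory at a time
(doseq [batch (partition-all 1000 (create-share-events db))]
  (run! #(shares-for-event db %) batch))
```

Since `d/datoms` is lazy and pull touches one entity at a time, peak memory is bounded by the batch size instead of the full result set.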
Yep, I was originally just thinking of doing the first line and then using partitions to chunk the data into smaller pieces
Did Datomic used to have attributes which were later removed? I'm wondering why there seem to be gaps (e.g. no entities 5-7) and nil entries in (:elements db)
Does anyone here use Datomic in tests with Circle CI ? I can't seem to figure out if this is possible
Say you regularly receive a broadcast entity (say, a User profile) that probably hasn’t changed.
If you make a db/tx function to check if it needs to actually be transacted, you also get a bunch of empty transactions (because you return [] most of the time).
If you query the DB to resolve the entity, then compare it to the one coming over the wire, you’re not transactionally safe.
And if you have more than one peer, perhaps the occasional empty transaction if the “value of the db” you’re querying is updated elsewhere. Hm.
Or you could have your db/tx throw a specific “short-circuit” exception you don’t have to log as an error.
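A minimal sketch of that short-circuiting transaction function (the function name, attributes, and exception shape are all hypothetical; whether `ex-data` survives unchanged through a remote transactor round-trip is worth verifying before relying on it):

```clojure
;; :db/fn body for a hypothetical :user/upsert-profile tx function.
;; It either asserts the new attrs or throws a marker exception the
;; caller can catch and treat as "nothing to do" instead of an error.
(fn [db eid new-attrs]
  (let [current (d/pull db (vec (keys new-attrs)) eid)]
    (if (= current new-attrs)
      (throw (ex-info "profile unchanged" {:short-circuit true}))
      [(merge {:db/id eid} new-attrs)])))

;; caller side: the failed tx future wraps the cause, so unwrap it
;; before inspecting, and rethrow anything that is a real error
(try
  @(d/transact conn [[:user/upsert-profile eid new-attrs]])
  (catch java.util.concurrent.ExecutionException e
    (when-not (-> e .getCause ex-data :short-circuit)
      (throw e))))
```

The comparison happens inside the transaction, so unlike the query-then-compare approach it is serialized with all other writes.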
@zentrope you could also batch them to reduce the number of empty transactions
Hm. Makes sense. Or even put a cache/memoize in there somewhere. Store incoming message checksums.
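The checksum idea can be sketched in plain Clojure (peer-local state, so in a multi-peer setup it is only a best-effort filter, not a correctness guarantee):

```clojure
;; Keep a set of hashes of profiles we've already written; only attempt
;; a transaction when the incoming value hashes to something unseen.
;; In production this set would need bounding (e.g. an LRU or TTL cache).
(def seen (atom #{}))

(defn maybe-transact!
  "Call transact! with profile only if we haven't seen this exact value."
  [transact! profile]
  (let [h (hash profile)]
    (when-not (contains? @seen h)
      (swap! seen conj h)
      (transact! profile))))

;; only the first of two identical profiles triggers the write
(def calls (atom 0))
(maybe-transact! (fn [_] (swap! calls inc)) {:user/name "ann"})
(maybe-transact! (fn [_] (swap! calls inc)) {:user/name "ann"})
@calls ;=> 1
```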
@zentrope think of a Bloom filter or whatnot, but you may run into cache invalidation issues
you can also serialize externally, even with several peers, using e.g. HornetQ with Message Grouping
Yeah. All techniques outside of datomic itself. Perhaps the “throw a special exception” idea is the least amount of work.
For instance, with RDBMS, you can use a .rollback if you discover things don’t need to be done. That kind of thing.
Even if I do the naive thing and just query the database right before deciding to write, if I do overwrite something, I’ve always got the history. ;)