#onyx
2016-08-14
robert-stuttaford08:08:29

@lucasbradstreet: interesting onyx-datomic/read-log question for you. how might i read from TWO database logs, and process the txes from both as a single stream ... in combined, transacted order?

lucasbradstreet08:08:03

Boy. That's kind of a tough one

robert-stuttaford08:08:42

i'm starting an epic task to refactor our whole production database, to rebuild everything according to our current best understanding -- correct schema, eliminate unwanted data, etc -- and i also want to combine two databases. to rebuild a db, you need to do it in transacted order, if you want to keep the time data inherent in datomic dbs

robert-stuttaford08:08:17

my first idea is to stream one database and query the second for any txes between the current and previous txes for the streaming db

robert-stuttaford08:08:24

walk the first and query the second, basically
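A minimal sketch of the "walk the first and query the second" idea, assuming each transaction log has already been read into an in-memory list of `(instant, data)` pairs sorted by instant. The function name `interleave_by_walking` and the `"db1"`/`"db2"` tags are hypothetical; nothing here calls the Datomic or Onyx APIs.

```python
from bisect import bisect_right

def interleave_by_walking(primary, secondary):
    """Walk `primary` in order; before each primary tx, emit every secondary
    tx whose instant lies between the previous primary tx and this one.

    Both inputs are lists of (instant, data) pairs sorted by instant
    (an assumption of this sketch, not something the logs guarantee here).
    """
    sec_instants = [instant for instant, _ in secondary]
    out = []
    lo = 0  # index of the first secondary tx not yet emitted
    for instant, data in primary:
        # "Query" the secondary log for everything up to this instant.
        hi = bisect_right(sec_instants, instant)
        out.extend(("db2", t, d) for t, d in secondary[lo:hi])
        out.append(("db1", instant, data))
        lo = hi
    # Secondary txes newer than the last primary tx.
    out.extend(("db2", t, d) for t, d in secondary[lo:])
    return out
```

In this sketch a secondary tx with the same instant as a primary tx is emitted first; a real job would need an explicit tie-breaking rule.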

robert-stuttaford08:08:53

was just wondering if it would be possible to plex two input streams some other way

lucasbradstreet08:08:42

It’s possible to order the txes with windows, but I think you would have to keep the whole DB in memory as we don’t have a disk state store implementation yet

robert-stuttaford08:08:01

that's actually possible. the whole db is under 8gb

robert-stuttaford08:08:25

Highstorm git:(rebuild-db) ✗ du -hs ~/Datomic/data/db
8.1G  /Users/robert/Datomic/data/db

robert-stuttaford08:08:38

and i only really need the tx-log which is some fraction of that

lucasbradstreet08:08:42

e.g. DB-1 read-log -> window-task, DB-2 read-log -> window-task, ensure that window-task orders the tx-es correctly, and use a trigger to write out to the new db
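The ordering step of that pipeline can be illustrated outside Onyx as a plain two-way merge. This is only a sketch, assuming each log yields `(instant, data)` pairs already sorted by instant; `merge_logs` and the source tags are hypothetical names, and Python's standard `heapq.merge` stands in for the window task.

```python
import heapq

def merge_logs(log1, log2):
    """Lazily merge two instant-sorted logs into one instant-sorted stream.

    Each element of the result is (source, instant, data). On equal
    instants, heapq.merge yields the element from the earlier iterable
    first, so "db1" wins ties in this sketch.
    """
    tagged1 = (("db1", t, d) for t, d in log1)
    tagged2 = (("db2", t, d) for t, d in log2)
    return heapq.merge(tagged1, tagged2, key=lambda entry: entry[1])
```

Because the merge is lazy, only one pending element per log is held at a time, provided each log really is sorted on its own.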

lucasbradstreet08:08:00

actually if you do the windows / refinements correctly you wouldn’t even need to hold the whole DB in memory

lucasbradstreet08:08:14

because you can flush out every now and again when you’re sure you won’t be getting any txes out of order
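One way to picture "flush when you're sure nothing will arrive out of order" is a watermark buffer: hold txes in a heap and release only those older than the newest instant seen from every source. This is a sketch under stated assumptions, not Onyx's window or trigger API; `OrderedBuffer`, `add`, and `drain` are hypothetical names, and each source is assumed to deliver its own txes in non-decreasing instant order.

```python
import heapq

class OrderedBuffer:
    """Buffer txes from several sources and release them in instant order."""

    def __init__(self, sources):
        self.latest = {s: None for s in sources}  # newest instant seen per source
        self.heap = []
        self.seq = 0  # tie-breaker so payloads are never compared

    def add(self, source, instant, data):
        """Record one tx and return whatever is now safe to flush."""
        self.latest[source] = instant
        heapq.heappush(self.heap, (instant, self.seq, source, data))
        self.seq += 1
        return self._flush()

    def _flush(self):
        # Nothing is safe until every source has reported at least once.
        if any(v is None for v in self.latest.values()):
            return []
        watermark = min(self.latest.values())
        out = []
        # Strictly below the watermark: no source can still produce these.
        while self.heap and self.heap[0][0] < watermark:
            instant, _, source, data = heapq.heappop(self.heap)
            out.append((source, instant, data))
        return out

    def drain(self):
        """Release everything left, in order, once all sources are exhausted."""
        out = []
        while self.heap:
            instant, _, source, data = heapq.heappop(self.heap)
            out.append((source, instant, data))
        return out
```

Memory use then depends on how far the sources drift apart rather than on the total size of the logs.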

robert-stuttaford08:08:21

how quickly do you think i can sort ±58 million transactions?

lucasbradstreet08:08:01

Unsure, but I think if you do it right you will only need to be sorting X at a time

lucasbradstreet08:08:33

I would be interested in this job - I think others could find it pretty useful for rewriting their history

robert-stuttaford08:08:56

yes. like i said, it's an epic, because i've got a big list of tech debt to undo

lucasbradstreet08:08:58

what’s cool about it is you can just leave it running while your main system continues to transact

lucasbradstreet08:08:18

at some point they’ll be basically in sync, then you take down your main system, let it finish, then swap over

robert-stuttaford08:08:26

most of the work will be in filtering txes and writing convertors for found tx shapes

robert-stuttaford08:08:42

yeah! i'll definitely unpack that with you at some point

robert-stuttaford08:08:31

i want it as a job because i want to be able to reuse it -- being able to rewrite whole-history regularly buys a hell of a lot of flexibility

lucasbradstreet08:08:43

Cool, it’d be nice if you could use a different transactor to transact the new DB 😕

robert-stuttaford08:08:48

e.g. being able to re-partition data or shard data

robert-stuttaford08:08:19

nothing i'm aware of prevents that

lucasbradstreet08:08:22

My intuition was that the transactor will write its own location out to your storage, and it would be hard to read from one DB which uses one transactor and write out to storage using another

lucasbradstreet08:08:30

I’ve never tried something like that though so it was only intuition

robert-stuttaford08:08:33

would definitely want a separate transactor for the final production run, because it's far cheaper to restore big changes to DynamoDB

robert-stuttaford08:08:47

they're completely separate connections

robert-stuttaford08:08:02

the only commonality is that they're both accessed by the same JVM process

robert-stuttaford08:08:20

peers actually find the transactor via storage 🙂

lucasbradstreet08:08:21

ok cool, makes sense. I was just a bit worried about whatever magic datomic does to figure out the transactor

robert-stuttaford08:08:44

connect to storage, get primary txor ip, connect, download live index, ready

lucasbradstreet08:08:46

since they might be on separate dbs it’s probably fine because it’ll just look at that DB instead

robert-stuttaford08:08:16

it uses storage and heartbeats written to storage to coordinate HA failover

robert-stuttaford08:08:51

thanks for listening -- i'm off for now. i'll definitely share what i have when it's ready 🙂