#datomic
2015-11-02
robert-stuttaford 13:11:52

@magnars: we use d/with in our stats processor to check for empty txes before submitting them, to prevent tx noise. Empty txes have just 1 datom: :db/txInstant
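The check robert-stuttaford describes can be modelled outside Datomic. In the Clojure API, (d/with db tx-data) applies a transaction speculatively to a db value and returns a map whose :tx-data key holds the resulting datoms, without sending anything to the transactor. The Python below is a minimal sketch under that assumption: datoms are plain tuples, and the :stats/count attribute and the ids are made up for illustration. It treats a tx as empty when every datom is about the transaction entity itself, which also ignores any tx metadata you attach. Checking for exactly one :db/txInstant datom would be the stricter alternative.

```python
from typing import Any, List, NamedTuple


class Datom(NamedTuple):
    """Simplified stand-in for a Datomic datom: [e a v tx added]."""
    e: int       # entity id
    a: str       # attribute ident as a string (an assumption of this model)
    v: Any       # value
    tx: int      # id of the transaction entity that produced the datom
    added: bool  # True for an assertion, False for a retraction


def is_empty_tx(tx_data: List[Datom]) -> bool:
    """True if a speculative tx changed nothing except its own transaction entity.

    Datomic drops redundant assertions, so a no-op transaction still yields
    one datom, [tx :db/txInstant <inst> tx true], whose entity is the tx itself.
    """
    return all(d.e == d.tx for d in tx_data)


tx = 13194139534312
noop = [Datom(tx, ":db/txInstant", "2015-11-02T13:11:52Z", tx, True)]
real = noop + [Datom(17592186045418, ":stats/count", 42, tx, True)]
print(is_empty_tx(noop))  # True
print(is_empty_tx(real))  # False
```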

pesterhazy 14:11:50

I'm wondering: if I configure my datomic peer library using a datomic: URI, how does it know which transactor to connect to?

jgdavey 14:11:52

@pesterhazy: the transactor URI is written to storage.

pesterhazy 14:11:42

@jgdavey: interesting. What happens if I have two transactors connect to the same storage URI?

jgdavey 14:11:35

One will become a fallback (HA) and won’t write its location unless the other one stops phoning home.
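The failover jgdavey describes hinges on a heartbeat that the active transactor keeps writing to storage. The Python below is a toy model of that idea, not Datomic's implementation: the Storage class, its fields and the HEARTBEAT_TIMEOUT value are all hypothetical. Whichever transactor holds the record publishes its location there, which is also where peers look it up.

```python
from dataclasses import dataclass
from typing import Optional

HEARTBEAT_TIMEOUT = 15.0  # seconds; hypothetical value, not a Datomic setting


@dataclass
class Storage:
    """Stands in for the shared storage (e.g. DynamoDB) both transactors can see."""
    active: Optional[str] = None  # location of the transactor currently serving writes
    last_heartbeat: float = 0.0   # when the active transactor last phoned home


def tick(storage: Storage, me: str, now: float) -> str:
    """Run one heartbeat cycle for transactor `me` and return its role afterwards."""
    if storage.active == me:
        storage.last_heartbeat = now  # active: keep phoning home
        return "active"
    if storage.active is None or now - storage.last_heartbeat > HEARTBEAT_TIMEOUT:
        # No active transactor, or it went quiet: take over and publish our location.
        storage.active = me
        storage.last_heartbeat = now
        return "active"
    return "standby"  # someone else is alive; stay out of the way


s = Storage()
print(tick(s, "tx-a", 0.0))   # active
print(tick(s, "tx-b", 1.0))   # standby
print(tick(s, "tx-a", 5.0))   # active
print(tick(s, "tx-b", 25.0))  # active (tx-a has been silent for 20 s)
```

A real implementation also needs an atomic conditional write on storage, so that two standbys cannot both claim the record at the same moment; this sketch ignores concurrency.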

pesterhazy 14:11:54

that's all automatic, then?

jgdavey 14:11:45

Well, HA technically only kicks in with the paid licenses. Anyone else care to expound?

pesterhazy 14:11:01

I just created a new Auto Scaling Group with a new license key. Based on what you said, if I disable the old ASG, the new one should kick in automagically

pesterhazy 14:11:09

with no client reconfiguration required

jgdavey 14:11:56

But yes, so long as peers and transactors share storage, the transactor location is “communicated” to the peer through storage.

pesterhazy 14:11:26

I guess that requires that the transactor has a sort-of public IP address

jgdavey 14:11:01

Well, it just needs to be accessible to the peer.

jgdavey 14:11:03

Transactors actually write two IPs to storage: host is normally the internal network address, and alt-host is usually the public IP

pesterhazy 14:11:04

a reassuring word about this in the docs would be great (though maybe I didn't look hard enough)

jgdavey 14:11:41

Peers try host first, then use alt-host if the first isn’t accessible.

jgdavey 14:11:59

I don’t want to misspeak here, though. Other thoughts, @bkamphaus ?

pesterhazy 14:11:30

In this case I'm actually fine with things working out of the box (as they seem to be)

Ben Kamphaus 14:11:45

@jgdavey: a slight correction: alt-host is not usually the public IP; it is only provided if a different public IP is needed. Of course, with Docker (or containerization in general) and more VM-in-the-cloud setups, this does show up more.
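The host/alt-host lookup can be sketched as a small resolver. Assumptions: the endpoint record a peer reads from storage is modelled as a dict with "host", optional "alt-host" and "port" keys (the real storage format is not shown in this thread), and reachability is a plain TCP connect. The addresses and port in the example are illustrative. Following Ben's correction, alt-host is optional, so the fallback is only tried when one was published.

```python
import socket
from typing import Callable, Dict, Optional


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def resolve_transactor(endpoint: Dict[str, object],
                       reachable: Callable[[str, int], bool] = tcp_reachable) -> Optional[str]:
    """Pick the address to use from the (hypothetical) endpoint record read from storage.

    Tries "host" first and falls back to "alt-host" only if one was published.
    """
    port = int(endpoint["port"])
    candidates = [endpoint["host"]]
    if endpoint.get("alt-host"):
        candidates.append(endpoint["alt-host"])
    for host in candidates:
        if reachable(str(host), port):
            return f"{host}:{port}"
    return None  # no published address is reachable from this peer


def only_public(host: str, port: int) -> bool:
    # Fake network for the example: only the public address answers.
    return host == "203.0.113.7"


endpoint = {"host": "10.0.0.12", "alt-host": "203.0.113.7", "port": 4334}
print(resolve_transactor(endpoint, only_public))                               # 203.0.113.7:4334
print(resolve_transactor({"host": "10.0.0.12", "port": 4334}, only_public))    # None
```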

Ben Kamphaus 14:11:42

High availability is documented in fairly high detail here: http://docs.datomic.com/ha.html

Ben Kamphaus 14:11:17

I do think there is an organizational deficiency in the docs at present around the heartbeat mechanism and how peers determine which transactor to connect to, including the alt-host mechanism (we’re transitioning this from an implementation detail to a public-facing transactor property). We’re considering how we want to address it.

pesterhazy 14:11:42

yeah, for me it wasn't clear how peers discover the transactor in the case of DynamoDB

pesterhazy 14:11:18

I considered the idea that the address is written to storage, but rejected it as unlikely :)

pesterhazy 15:11:44

@robert-stuttaford: have you had time to look into turning your datomic-backup script into a gist yet?

pesterhazy 15:11:13

the use case is to get a partial backup of a prod db for development, which doesn't include credit card information or db sessions

pesterhazy 15:11:33

sorry s/db sessions/session data/

pesterhazy 15:11:05

one way I can think of is to get the data on a test system, excise everything you don't want, and then do a backup. Is that what people do?

Ben Kamphaus 15:11:20

@pesterhazy: for a few different reasons, to build a dev db I would avoid anything that implicitly “forks” the db (excise on a backup) and do something like replay the log, filtering out datoms that should not go in the other copy.
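Ben's suggestion can be sketched as a filter over the log. In Datomic the source transactions would come from the log (for example via d/tx-range on (d/log conn)); here they are plain Python lists of datom tuples, and the no_sessions predicate and its :session/ namespace are illustrative assumptions.

```python
from typing import Any, Callable, Iterable, Iterator, List, NamedTuple


class Datom(NamedTuple):
    """Simplified datom: [e a v tx added]."""
    e: int
    a: str
    v: Any
    tx: int
    added: bool


Tx = List[Datom]


def replay_filtered(log: Iterable[Tx], keep: Callable[[Datom], bool]) -> Iterator[Tx]:
    """Yield the source transactions in order, minus the datoms `keep` rejects.

    A transaction left with only datoms about itself (e.g. :db/txInstant)
    is skipped, so the dev copy doesn't fill up with empty transactions.
    """
    for tx in log:
        kept = [d for d in tx if keep(d)]
        if any(d.e != d.tx for d in kept):
            yield kept


def no_sessions(d: Datom) -> bool:
    # Hypothetical rule: drop every attribute in the :session/ namespace.
    return not d.a.startswith(":session/")


log = [
    [Datom(1000, ":db/txInstant", "2015-11-02", 1000, True),
     Datom(1, ":user/email", "a@example.com", 1000, True),
     Datom(1, ":session/token", "secret", 1000, True)],
    [Datom(1001, ":db/txInstant", "2015-11-03", 1001, True),
     Datom(1, ":session/token", "rotated", 1001, True)],
]
for tx in replay_filtered(log, no_sessions):
    print([d.a for d in tx])  # [':db/txInstant', ':user/email']
```

Writing the surviving datoms into a fresh db is not shown: entity ids there will differ from the source, so real replay code also has to map source ids to new entities.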

Ben Kamphaus 15:11:38

at the connection and storage level, dbs are unique, and there’s no accommodation in Datomic for the concept of “two different versions of the same database” with forked or missing data, etc. Filtered dbs, as-of dbs, etc. (e.g. in queries via the API) are the ways of dealing with different views of db values.

pesterhazy 15:11:49

yeah I'm also inclined to think that excision is not the right tool for the job

pesterhazy 15:11:05

I've looked into filtering the tx log, but haven't found an obvious way to determine what kind of entity a :db/add refers to

pesterhazy 15:11:56

and that's what I want -- filter out certain kinds of entities (payment records, session data), not filter out a specific attribute

pesterhazy 15:11:11

plus an attribute (like :user) might belong to both payments (which I want to discard) and addresses (which I want to keep)

Ben Kamphaus 16:11:11

you can always pull the entity in question to see what’s associated with it. Does everything in the db (or at least everything that has refs to/from it) have some kind of UUID, i.e. any unique identifier other than the entity id?

Ben Kamphaus 16:11:21

you can also pull the entity as of the time immediately before/after a tx to inspect it (using the as-of filter on a db). Doing that a lot can get expensive performance-wise, but whether it really matters depends on the overall size of the db you’re filtering. Also, since it’s going to dev, it doesn’t impact a liveness window for prod.
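Pulling an entity as of a point in time lets you classify it by everything it has, rather than by a single shared attribute such as :user. The Python below models (d/pull (d/as-of db t) '[*] e) by replaying that entity's datoms up to t, and handles cardinality-one attributes only. Treating any attribute in a :payment/ or :session/ namespace as the marker of an unwanted entity is a hypothetical schema convention, not something Datomic provides.

```python
from typing import Any, Dict, List, NamedTuple


class Datom(NamedTuple):
    """Simplified datom: [e a v tx added]."""
    e: int
    a: str
    v: Any
    tx: int
    added: bool


# Hypothetical convention: an entity is unwanted if it has ANY attribute in these namespaces.
EXCLUDED_NAMESPACES = {"payment", "session"}


def attrs_as_of(history: List[Datom], e: int, t: int) -> Dict[str, Any]:
    """Attributes of entity e as of tx t, found by replaying its datoms in tx order."""
    state: Dict[str, Any] = {}
    for d in sorted(history, key=lambda x: x.tx):
        if d.e != e or d.tx > t:
            continue
        if d.added:
            state[d.a] = d.v
        elif d.a in state and state[d.a] == d.v:
            del state[d.a]  # retraction of the current value
    return state


def is_excluded(attrs: Dict[str, Any]) -> bool:
    """True if any attribute's namespace marks the entity as unwanted."""
    return any(a.lstrip(":").split("/", 1)[0] in EXCLUDED_NAMESPACES for a in attrs)


def should_drop(history: List[Datom], d: Datom) -> bool:
    # Look just after an assertion, but just before a retraction, since the
    # entity may have lost its identifying attributes by the end of that tx.
    t = d.tx if d.added else d.tx - 1
    return is_excluded(attrs_as_of(history, d.e, t))


history = [
    Datom(10, ":payment/amount", 25, 100, True),
    Datom(10, ":user", 1, 100, True),
    Datom(11, ":address/street", "Main St", 101, True),
    Datom(11, ":user", 1, 101, True),
]
print(should_drop(history, history[1]))  # True: entity 10 is a payment
print(should_drop(history, history[3]))  # False: entity 11 is an address
```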

pesterhazy 16:11:16

that's useful

pesterhazy 16:11:45

many things have a unique identifier, though maybe not all

robert-stuttaford 17:11:45

haven't had a chance, sorry, @pesterhazy!

pesterhazy 17:11:18

I'm trying my hand at a simple edn dumper for datomic

pesterhazy 17:11:29

that might get the job done as well

robert-stuttaford 17:11:01

that’s what I have, except it writes transit instead of edn
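A bare-bones dumper along these lines could print each datom as an EDN vector, one per line. The Python below is a sketch, not either of the scripts mentioned above: it covers only strings, integers, booleans and nil as values, and where the datoms come from is left out. Instants, UUIDs, floats and keyword values would need their own encodings (or a richer format such as transit).

```python
import io
import json
from typing import Any, Iterable, NamedTuple, TextIO


class Datom(NamedTuple):
    """Simplified datom: [e a v tx added]."""
    e: int
    a: str  # attribute ident, written out as a keyword
    v: Any
    tx: int
    added: bool


def edn_value(x: Any) -> str:
    """Encode a small subset of scalar values as EDN."""
    if x is None:
        return "nil"
    if isinstance(x, bool):  # must come before int: bool is a subclass of int
        return "true" if x else "false"
    if isinstance(x, int):
        return str(x)
    if isinstance(x, str):
        # JSON string escapes (\" \\ \n \t ...) are readable by the Clojure reader for common text.
        return json.dumps(x, ensure_ascii=False)
    raise TypeError(f"no EDN encoding for {type(x).__name__}")


def datom_to_edn(d: Datom) -> str:
    """One datom as an EDN vector: [e a v tx added]."""
    return "[" + " ".join([str(d.e), d.a, edn_value(d.v), str(d.tx), edn_value(d.added)]) + "]"


def dump(datoms: Iterable[Datom], out: TextIO) -> None:
    """Write one EDN vector per line."""
    for d in datoms:
        out.write(datom_to_edn(d) + "\n")


buf = io.StringIO()
dump([Datom(17592186045418, ":user/email", 'a "quoted" name', 13194139534312, True)], buf)
print(buf.getvalue(), end="")
# [17592186045418 :user/email "a \"quoted\" name" 13194139534312 true]
```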