#datomic
2017-05-18
val_waeselynck05:05:42

I'm about to start coding a library which reimplements the Entity and Pull APIs to support derived data (derived attributes & getters). Before I dive in, is anyone working on this already?

laujensen08:05:45

I'm running a Datomic/MySQL service which, in the course of 6 months, has produced an InnoDB file weighing in at 26 GB. That seems excessive to me. Is MySQL a bad fit, or is this to be expected?

val_waeselynck09:05:48

@laujensen do you run gcStorage on a regular basis?

laujensen09:05:40

@val_waeselynck Never. I understand it to remove history beyond a certain point.

val_waeselynck09:05:13

No, it doesn't delete data. It can only cause trouble for Peers that hold an old db value. But if you run gcStorage with a cutoff of 1 week ago you should be safe (unless you have some process that holds on to a db value for longer than a week, which seems unlikely 🙂)

laujensen09:05:53

@val_waeselynck Aha... I'll give it a try. Sounds like something you would run weekly, then?

val_waeselynck09:05:29

yes that seems like a sane default.
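The cutoff reasoning above can be sketched in Python, for illustration only (Datomic itself is driven from Clojure or Java). It assumes you can bound how long any process holds on to a db value; `gc_cutoff` and its parameters are hypothetical names, not part of the Datomic API. The returned instant is the "older than" argument you would pass to gcStorage (`d/gc-storage` in Clojure), which only reclaims storage garbage older than that instant.

```python
from datetime import datetime, timedelta, timezone

# Default safety window suggested in the chat: one week.
DEFAULT_WINDOW = timedelta(weeks=1)


def gc_cutoff(now, longest_held_db_value=timedelta(0), window=DEFAULT_WINDOW):
    """Return the 'older than' instant for a gcStorage run (hypothetical helper).

    Garbage newer than the cutoff is kept, so the cutoff is pushed back by
    whichever is larger: the default window or the age of the oldest db value
    any process is known to hold.
    """
    if now.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime")
    safe_age = max(window, longest_held_db_value)
    return now - safe_age


now = datetime(2017, 5, 18, 9, 0, tzinfo=timezone.utc)
print(gc_cutoff(now).isoformat())
print(gc_cutoff(now, longest_held_db_value=timedelta(days=10)).isoformat())
```

Taking the larger of the two ages means a long-lived db value moves the cutoff further back instead of being silently exposed to collection.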

val_waeselynck09:05:34

you should try that and see if your InnoDB file gets smaller. Having said that, one of the design choices of Datomic is to store a lot of things redundantly, the underlying assumption being that storage is cheap.

laujensen09:05:35

@val_waeselynck And that makes sense, but I still want to retain some control over storage consumption. Right now it looks like it's growing exponentially.

val_waeselynck09:05:06

maybe your business is too 🙂

val_waeselynck09:05:59

I don't know that there are any knobs for that. Maybe you store more things than you intend, or have a lot of unneeded updates for the same data?

val_waeselynck09:05:26

One thing you can do is avoid unnecessary indexes (see the :db/fulltext and :db/index options), but it's best to anticipate that ahead of time

laujensen09:05:20

@val_waeselynck Well, yeah, I guess the business is too. But dumping the entire DB without history is just 5% of the total data consumed now.

laujensen09:05:29

The GC is running now. Is there a way to monitor its progress?

val_waeselynck09:05:30

you can also set :db/noHistory on some high-churn attributes

val_waeselynck09:05:49

not expert enough for that, sorry

val_waeselynck09:05:24

what I would do is look at the Transactor and Storage metrics, the activity of gcStorage may be visible there

laujensen09:05:35

It's only run for a couple of minutes, but it's already consumed 1 GB of disk space 🙂

laujensen09:05:00

Oddly enough, checking the log the gc cycle completes in less than a minute

laujensen09:05:11

And instead of freeing up disk space, consumed it

val_waeselynck10:05:33

maybe you need to run additional MySQL-specific gc

val_waeselynck10:05:41

sadly I'm really no help in that regard

marshall13:05:13

@laujensen You can determine the “real” amount of space required for a Datomic DB by running a backup and calculating the size of the resulting backup dir on disk. The difference between that and the used storage space will be made up of recoverable storage garbage, unrecoverable storage garbage, and storage-specific overhead.
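A minimal Python sketch of that measurement, assuming the backup was written to a local directory (for example with `bin/datomic backup-db`) and that you read the storage-side figure, such as the InnoDB file size, separately. `dir_size_bytes` and `excess_bytes` are hypothetical helpers; from these two numbers alone you cannot split the excess into its three components.

```python
import os


def dir_size_bytes(path):
    """Total size of the regular files under `path`, e.g. a Datomic backup dir."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            # Skip symlinks so linked files are not counted twice.
            if os.path.isfile(full) and not os.path.islink(full):
                total += os.path.getsize(full)
    return total


def excess_bytes(storage_used_bytes, backup_bytes):
    """Storage used beyond the backup size: recoverable garbage,
    unrecoverable garbage, and storage-specific overhead, combined."""
    return max(0, storage_used_bytes - backup_bytes)
```

Usage would look like `excess_bytes(storage_used, dir_size_bytes("/path/to/backup"))`, where the path is a placeholder for wherever your backup lives.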

marshall13:05:25

the first of those can be resolved with gcStorage

marshall13:05:09

the storage-specific overhead has to be reclaimed via the storage itself, with something like PostgreSQL VACUUM or MySQL OPTIMIZE TABLE

marshall13:05:27

alternatively if you can tolerate the downtime you can backup your DB, restore into a NEW backend storage instance and switch over your system. This approach will remove all types of garbage

robert-stuttaford13:05:00

i turned my local from 30gb to 6gb by backing up, deleting, and restoring my local 🙂

robert-stuttaford13:05:17

that’s about 8 months’ accretion of garbage segments

marshall13:05:32

@robert-stuttaford do you run gcStorage regularly?

robert-stuttaford13:05:47

… we really should

marshall13:05:50

indeed 🙂

robert-stuttaford13:05:13

this is my local machine, with multiple successive restores. so all the previous restores’ garbage is now unreachable by a gcStorage

robert-stuttaford13:05:30

but it’d be worth doing on our production storage for sure

marshall13:05:36

ah. yeah it’s probably not worth gcStorage on a local restore

marshall13:05:46

just blow away the data dir (if you’re using dev)

marshall13:05:54

but, yes, you should run it in prod

jfntn13:05:03

Can a :db.type/uuid value somehow be used as a valid :db/id value, or do I need to create a new attribute?

matthavener14:05:36

@jfntn: new attribute. :db/id values are assigned by the transactor, so you can’t choose them

matthavener14:05:04

but you can use a stringized uuid as a tempid if you just need to generate a unique :db/id for adding facts

laujensen14:05:39

@marshall thanks for weighing in. I'm on the trail now, but I need to migrate to another host before I can kick it off.

jfntn14:05:51

@matthavener makes sense, thanks

Lone Ranger21:05:07

does this look right to anyone? trying to get a transactor/peer/etc running locally

Lone Ranger21:05:10

datomic:sql://<DB-NAME>?jdbc:

Lone Ranger21:05:26

I feel like it shouldn't say <DB-NAME> there 😅

favila21:05:25

@goomba that is the right pattern for a datomic.api/connect call

favila21:05:47

replace <DB-NAME> with the name of the datomic database you want

Lone Ranger21:05:06

okay, so that looks normal for the transactor to print as the URI when you run it?

Lone Ranger21:05:14

alright making some progress... so close... trying to run the peer and getting this

Lone Ranger21:05:18

Access denied for user 'datomic'@'localhost'

Lone Ranger21:05:47

running the following command

Lone Ranger21:05:52

bin/run -m datomic.peer-server \
	-h localhost \
	-p 8998 \
	-a myaccesskey,mysecret \
	-d datomic,datomic:

Lone Ranger21:05:51

I can connect just fine if I run mysql -udatomic -pdatomic datomic

Lone Ranger22:05:06

wait, is <DB-NAME> the MySQL database name, or should it be mysql?

shaun-mahood22:05:36

I've been going through the Datomic docs looking for specific requirements for running it on PostgreSQL, but haven't found any. Am I safe to use Datomic on any reasonable PostgreSQL installation, or are there specific flags or settings I should be using that I missed in the documentation? This is going to be for experimentation to start, but ideally it will move into a real project soon.

favila23:05:19

@goomba the mysql database name is "datomic"

favila23:05:24

that's at the end

favila23:05:28

that string is a template

favila23:05:24

replace <DB-NAME> with your datomic database name, not your mysql table name

favila23:05:42

all datomic databases store data in the same mysql table

favila23:05:58

mysql is being used as a key-value store for blobs

favila23:05:01

nothing more
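To make that distinction concrete, here is a small Python sketch that fills in the template and splits a `datomic:sql` URI into the Datomic database name and the JDBC URL. The JDBC URL in the example (`jdbc:mysql://localhost:3306/datomic?user=datomic&password=datomic`) is an assumption modeled on the `mysql -udatomic -pdatomic datomic` command above, `my-app` is a hypothetical Datomic database name, and both helper functions are hypothetical too.

```python
PREFIX = "datomic:sql://"


def fill_template(template, db_name):
    """Replace the <DB-NAME> placeholder with a Datomic database name."""
    return template.replace("<DB-NAME>", db_name, 1)


def split_datomic_sql_uri(uri):
    """Split a datomic:sql URI into (datomic_db_name, jdbc_url).

    The Datomic database name sits between the prefix and the first "?".
    Everything after that "?" is the JDBC URL, whose path names the SQL
    database that holds the table shared by all Datomic databases.
    """
    if not uri.startswith(PREFIX):
        raise ValueError("not a datomic:sql URI: " + uri)
    # partition splits on the FIRST "?" only; the JDBC URL has its own "?".
    db_name, sep, jdbc_url = uri[len(PREFIX):].partition("?")
    if not sep or not jdbc_url.startswith("jdbc:"):
        raise ValueError("missing JDBC URL after '?'")
    if not db_name or db_name == "<DB-NAME>":
        raise ValueError("fill in the Datomic database name")
    return db_name, jdbc_url


template = ("datomic:sql://<DB-NAME>?"
            "jdbc:mysql://localhost:3306/datomic?user=datomic&password=datomic")
print(split_datomic_sql_uri(fill_template(template, "my-app")))
```

Passing the unfilled template straight to `split_datomic_sql_uri` raises an error, which mirrors the point above: `<DB-NAME>` must be your Datomic database name, while `datomic` in the JDBC part is the MySQL database.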

Lone Ranger23:05:29

oh snap, how do I find out what I named my datomic database?

favila23:05:43

At some point you called (d/create-database "datomic:) or restored from a backup with a similar looking uri

favila23:05:21

to list them all

Lone Ranger23:05:42

well... 😕 seems like everything so far is correct then, must be some other error or SQL setting I'm missing

Lone Ranger23:05:01

appreciate it 🙂 @favila