#datomic
2015-07-30
erichmond03:07:26

is GPG broken on el capitan? I am trying to work with datomic-pro and am getting “gpg: gpg-agent is not available in this session” even though that daemon is indeed running

caskolkm06:07:47

@bkamphaus @bostonaholic: still the same error using: @(d/transact conn [[:db.fn/retractEntity (Long. company-id)]])

mitchelkuijpers12:07:20

@bkamphaus: @bostonaholic I found our problem with retracting an entity: somehow we saved the entities in db.part/tx, which is obviously wrong 😅

Ben Kamphaus14:07:31

@mitchelkuijpers: that would do it. Note that to have gotten this outcome using e.g. the map form, you’d have specified a tempid for :db.part/tx in the same map as those attributes. The attributes then become attributes on the transaction entity. To annotate a transaction in the map form you need to supply separate maps: one for the attributes intended as tx annotations and one for the attributes intended for a new or existing entity. Example (though in Java) at: http://docs.datomic.com/transactions.html#reified-transactions
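
A minimal sketch of the two cases in Clojure, assuming hypothetical :company/name (domain) and :audit/source (annotation) attributes:

;; wrong for domain data: a tempid in :db.part/tx puts these attrs on the tx entity
@(d/transact conn [{:db/id (d/tempid :db.part/tx)
                    :company/name "Acme"}])

;; annotating a tx: keep the annotation and the domain entity in separate maps
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :company/name "Acme"}
                   {:db/id (d/tempid :db.part/tx)
                    :audit/source "import-job"}])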

erichmond14:07:24

Follow-up: It was because I didn’t fully nuke the gpg installed by brew before installing the one recommended by leiningen.

max14:07:44

I just realized that postgres is using 30gb of storage for a small app

max14:07:13

I assume this is because I haven’t been garbage collecting, or am I messing something up badly?

erichmond15:07:24

What is the best tutorial for someone who wants to use datomic + clojure

erichmond15:07:28

these docs are a mess

Ben Kamphaus15:07:46

@erichmond: the day-of-datomic repo is at: https://github.com/Datomic/day-of-datomic — if you’re talking about the tutorial on the docs page, it’s available in clojure in the datomic directory as mentioned here: http://docs.datomic.com/tutorial.html#following-along — if you’re looking at query specifically: http://docs.datomic.com/query.html points to the clojure examples from day-of-datomic here: https://github.com/Datomic/day-of-datomic/blob/master/tutorial/query.clj

erichmond15:07:29

@bkamphaus: thanks, also, this datomic for 5 year olds is helping too

marshall15:07:08

@erichmond: We also have the full Day of Datomic training session as a series of videos here: http://www.datomic.com/training.html

bhagany15:07:23

I really got a lot out of those videos, fwiw

erichmond15:07:59

thanks, I’ll check out the videos too.

Ben Kamphaus15:07:01

@max: you should be doing some gc http://docs.datomic.com/capacity.html#garbage-collection — you may also have to take additional steps for postgres (and other storages) to reclaim space, e.g. VACUUM http://www.postgresql.org/docs/9.1/static/sql-vacuum.html
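
A rough sketch of both steps, assuming the Clojure peer API and the default datomic_kvs table created by the bundled postgres setup scripts:

;; from a peer: collect storage garbage older than the given instant
(require '[datomic.api :as d])
(d/gc-storage conn #inst "2015-07-01")

-- then in postgres, so the freed pages are actually reclaimed
VACUUM datomic_kvs;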

erichmond15:07:13

Actually, all the querying and whatnot is pretty straightforward to me

max15:07:32

so it looks like my vm ran out of space (I had a 40gb vm)

max15:07:36

I upped the disk space

max15:07:41

and my database size is only growing

max15:07:50

and the transactor is unavailable

erichmond15:07:51

I was looking more for “10 steps to firing up a mem based datomic connection” “10 steps to firing up a dev based datomic connection + datomic console”

max15:07:56

will this resolve itself?

erichmond15:07:08

I’m realizing now, if I want to run mem, I don’t even seem to need to download that datomic.zip, etc
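
A minimal sketch of the whole in-memory setup, assuming only the datomic peer library on the classpath and a made-up :person/name attribute:

(require '[datomic.api :as d])

(def uri "datomic:mem://scratch")
(d/create-database uri)
(def conn (d/connect uri))

;; install one attribute, then transact and query as usual
@(d/transact conn [{:db/id (d/tempid :db.part/db)
                    :db/ident :person/name
                    :db/valueType :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db.install/_attribute :db.part/db}])
@(d/transact conn [{:db/id (d/tempid :db.part/user)
                    :person/name "Rich"}])
(d/q '[:find ?n :where [_ :person/name ?n]] (d/db conn))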

Ben Kamphaus15:07:05

@max not enough info to tell. can you tail the logs to see if the txor is busy? e.g. indexing

max15:07:59

bkamphaus: debugging this I also found that my only log file is log/2015-06-26.log.

max15:07:07

I kept the default logback.xml

max15:07:11

so that’s another issue

max15:07:30

is there another place they could be?

Ben Kamphaus15:07:40

@max: does your transactor properties file specify a different log location?

max15:07:35

bkamphaus: ah thanks. Okay so it’s indexing

max15:07:37

I may have done a bad thing.

max15:07:37

I accidentally shoved some ~860kb strings into datoms

max15:07:50

am I hosed here?

Ben Kamphaus15:07:11

well it definitely can hurt performance, and how much will depend on how your system is provisioned. But yeah, you definitely want to avoid large blobby stuff in datoms. Options for recovery — do you have a recent backup? You can also excise that stuff.

Ben Kamphaus15:07:35

are those fields in :avet? i.e. indexed -- that’s when it will hurt the most by far.

max15:07:45

they are indexed

max15:07:54

bkamphaus: in the future, if i want to store this, doing noHistory and without index would be a bad idea still?

Ben Kamphaus15:07:55

less of a bad idea, but I’d still avoid it. Indexing it guarantees that it will be a huge perf drag. Your best option for blob/document type stuff is to put in storage directly and store the pointer/ref/key w/e for it in Datomic in the datom

Ben Kamphaus15:07:25

or a file store, e.g. s3
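
For example, the pointer attribute might look something like this (the :document/s3-key name is hypothetical):

;; store only the key/pointer in Datomic; the blob itself lives in s3 or another store
{:db/id (d/tempid :db.part/db)
 :db/ident :document/s3-key
 :db/valueType :db.type/string
 :db/cardinality :db.cardinality/one
 :db/doc "S3 object key for the full document body"
 :db.install/_attribute :db.part/db}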

Ben Kamphaus16:07:05

@arohner: Stu has replied re: your questions/issues on bytes reported here in slack and on group https://groups.google.com/forum/#!topic/datomic/JqXcURuse1M

arohner16:07:26

@bkamphaus: yeah I just saw, thanks

erichmond16:07:50

@bkamphaus: do you work on datomic for cognitect?

Ben Kamphaus16:07:39

@erichmond: yes, I’m on the Datomic team at Cognitect.

Ben Kamphaus16:07:14

I agree that it’s very cool to be on this team. simple_smile Also, typing is hard.

max17:07:03

bkamphaus: thanks for your help so far.

max17:07:17

I tried to run a garbage collect and an excision of one of the attributes

max17:07:32

my database size is still growing (33->46 gb in the past hour)

max17:07:38

and datomic is running at 100% cpu

max17:07:11

heres a tail of the log

max17:07:25

so it looks like im still indexing?

Ben Kamphaus17:07:29

@max where you’re at, you’re waiting on indexing to push through — it will have to complete before space can be reclaimed and it will probably take longer for excision, etc. (more indexing necessary) — gc also competes for transactor resources — cpu/mem.

Ben Kamphaus17:07:39

from the log tail, seems that way

max17:07:57

how long can I expect to wait and is there anything I can do to speed it up

max17:07:12

my hd is now 160gb, can I be reasonably sure I won’t hit that?

Ben Kamphaus17:07:16

how many attr val pairs were targeted by the excision?

max17:07:37

I was just doing a test on one datom.

Ben Kamphaus17:07:59

@max can you grep for successfully completed indexing jobs (e.g. :CreateEntireIndexMsec metrics), for index-specific completion messages (grep ":[ea][aev][ve]t, :phase :end" *.log), and for possible failures (just grep for AlarmIndexingFailed)?
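
Roughly, assuming the default log/ directory next to the transactor:

grep "CreateEntireIndexMsec" log/*.log        # completed full index jobs
grep ":phase :end" log/*.log                  # per-index merge/completion messages
grep "AlarmIndexingFailed" log/*.log          # indexing failures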

max17:07:54

the last index specific completion was 3 hours ago

2015-07-30 11:58:23.370 INFO  default    datomic.index - {:tid 150, :I 5265000.0, :index :eavt, :phase :end, :TI 8465930.540997842, :pid 1480, :event :index/merge-mid, :count 2110, :msec 14400.0, :S -283878.0409978423, :as-of-t 961005}

max17:07:52

I have an AlarmIndexingFailed once a minute

2015-07-30 13:15:47.572 INFO  default    datomic.process-monitor - {:tid 13, :AlarmIndexingFailed {:lo 1, :hi 1, :sum 4, :count 4}, :CreateEntireIndexMsec {:lo 16500, :hi 18600, :sum 70500, :count 4}, :MemoryIndexMB {:lo 0, :hi 0, :sum 0, :count 1}, :StoragePutMsec {:lo 1, :hi 239, :sum 11097, :count 381}, :AvailableMB 2640.0, :IndexWriteMsec {:lo 1, :hi 659, :sum 35259, :count 381}, :RemotePeers {:lo 1, :hi 1, :sum 1, :count 1}, :HeartbeatMsec {:lo 5000, :hi 5346, :sum 60427, :count 12}, :Alarm {:lo 1, :hi 1, :sum 4, :count 4}, :StorageGetMsec {:lo 0, :hi 124, :sum 2204, :count 305}, :pid 1480, :event :metrics, :StoragePutBytes {:lo 103, :hi 4568692, :sum 128385966, :count 382}, :ObjectCache {:lo 0, :hi 1, :sum 231, :count 536}, :MetricsReport {:lo 1, :hi 1, :sum 1, :count 1}, :StorageGetBytes {:lo 1853, :hi 4568435, :sum 95278692, :count 305}}
2015-07-30 13:16:47.573 INFO  default    datomic.process-monitor - {:tid 13, :TransactionDatoms {:lo 3, :hi 3, :sum 3, :count 1}, :AlarmIndexingFailed {:lo 1, :hi 1, :sum 3, :count 3}, :GarbageSegments {:lo 2, :hi 2, :sum 4, :count 2}, :CreateEntireIndexMsec {:lo 15800, :hi 17400, :sum 50500, :count 3}, :MemoryIndexMB {:lo 0, :hi 0, :sum 0, :count 1}, :StoragePutMsec {:lo 1, :hi 291, :sum 11173, :count 474}, :TransactionBatch {:lo 1, :hi 1, :sum 1, :count 1}, :TransactionBytes {:lo 102, :hi 102, :sum 102, :count 1}, :AvailableMB 2460.0, :IndexWriteMsec {:lo 2, :hi 350, :sum 36373, :count 471}, :RemotePeers {:lo 1, :hi 1, :sum 1, :count 1}, :HeartbeatMsec {:lo 5000, :hi 5003, :sum 60006, :count 12}, :Alarm {:lo 1, :hi 1, :sum 3, :count 3}, :StorageGetMsec {:lo 0, :hi 100, :sum 2151, :count 351}, :TransactionMsec {:lo 19, :hi 19, :sum 19, :count 1}, :pid 1480, :event :metrics, :StoragePutBytes {:lo 86, :hi 4568692, :sum 146567666, :count 473}, :LogWriteMsec {:lo 8, :hi 8, :sum 8, :count 1}, :ObjectCache {:lo 0, :hi 1, :sum 247, :count 598}, :MetricsReport {:lo 1, :hi 1, :sum 1, :count 1}, :PodUpdateMsec {:lo 2, :hi 7, :sum 9, :count 2}, :StorageGetBytes {:lo 86, :hi 4568435, :sum 94665879, :count 351}}

Ben Kamphaus17:07:09

@max which version of Datomic are you running?

max17:07:51

datomic-pro-0.9.5173

Ben Kamphaus17:07:14

can you do a failover or start/restart to upgrade to 0.9.5201 (or latest 0.9.5206) to see if the indexing job is then able to run to completion?

max17:07:52

any reason to go with 5201 vs 5206?

Ben Kamphaus17:07:48

I’d just drop into latest 5206 if no preference; 5201 is just the minimal version that includes a fix for a related issue. 0.9.5206 only adds error handling/explicit limits for byte attributes

max17:07:24

bkamphaus: I updated, am getting some out of memory errors

2015-07-30 13:31:43.668 WARN  default    datomic.update - {:tid 77, :pid 10386, :message "Index creation failed", :db-id "canary-f3e9a40e-2036-4ad9-aae7-52919cced434"}
java.lang.OutOfMemoryError: Java heap space

max17:07:03

I’m using

# Recommended settings for -Xmx4g production usage.
 memory-index-threshold=32m
 memory-index-max=512m
 object-cache-max=1g

Ben Kamphaus17:07:28

@max some follow-up q’s then — can you verify you’re using GC defaults? Either only setting -Xms/-Xmx as transactor args, or if using JAVA_OPTS, adding -XX:+UseG1GC -XX:MaxGCPauseMillis=50 to keep the GC defaults? Also, would it be possible to up -Xmx (what’s current + available on the machine)?

max17:07:07

exec /var/lib/datomic/runtime/bin/transactor -Xms4g -Xmx4g /var/lib/datomic/transactor.properties 2>&1 >> /var/log/datomic/datomic.log

max17:07:17

that’s my datomic command

max17:07:30

I could up the memory, should I change transactor props also?

Ben Kamphaus17:07:02

@max I would up memory, double it if you can — maybe up object-cache-max only slightly (i.e. to 25% of heap or so, not up to 1/2 for sure). I.e. something like -Xmx8g, object-cache-max=2g, rest the same
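
Concretely, applied to the command and properties quoted above, that might look something like this (a sketch, not a tested config):

# transactor.properties: only object-cache-max changes, rest stays the same
memory-index-threshold=32m
memory-index-max=512m
object-cache-max=2g

# launch the transactor with the larger heap
/var/lib/datomic/runtime/bin/transactor -Xms8g -Xmx8g /var/lib/datomic/transactor.properties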

max17:07:01

bkamphaus: the excision finished!

Ben Kamphaus17:07:29

@max awesome — make sure and spread out the excision the way you’d normally pipeline txes on an import

Ben Kamphaus17:07:51

assuming you’re following up by removing more of the blobby string vals
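
A hedged sketch of pipelining the remaining excisions one entity per transaction (the attribute name and the eid collection are placeholders):

;; excise only the blobby attribute, one target entity per tx
(doseq [eid blob-entity-ids]
  @(d/transact conn [{:db/id (d/tempid :db.part/user)
                      :db/excise eid
                      :db.excise/attrs [:blob/content]}]))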

max17:07:54

there are only 35 attrs to excise

max17:07:12

…my postgres db size is at 51gbs though

Ben Kamphaus17:07:29

ah, cool, so less of an issue then. as stuff pushes through, you’ll be able to run gc (or maybe it’s already running?)

max17:07:29

I ran a datomic garbage collect and it didn’t seem to do much, I assume I should run it again and vacuum

Ben Kamphaus17:07:10

but yes you should do it after excision, more segments will need to be gc’d after that

Ben Kamphaus17:07:25

the gc-storage call when finished will log something like: 2014-08-08 03:24:14.174 INFO default datomic.garbage - {:tid 129, :pid 2325, :event :garbage/collected, :count 10558}

max17:07:35

so, how did this happen? I had 35 blobs some of which were like a meg at most. And the rest of my data is pretty small. How did my db grow to 51gigs?

max17:07:48

And how do I make sure it doesn’t happen again, garbage collect daily?

Ben Kamphaus17:07:55

I don’t know how much segment churn you go through, but it does build up over time from indexing. The blobs can be particularly bad with :avet on.

Ben Kamphaus17:07:21

Nightly may not be necessary, but you can set up a gc-storage call to run at w/e period you determine is necessary

tcrayford17:07:20

(as a side reference, for my [relatively normal] webapp, I run it at application bootup, because only the webservers are datomic peers and they're deployed together)
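
One possible way to schedule it from a peer (a sketch; the period is arbitrary):

(import '(java.util.concurrent Executors TimeUnit))

(def gc-pool (Executors/newSingleThreadScheduledExecutor))

;; run gc-storage once a day, collecting garbage older than "now" at each run
(.scheduleAtFixedRate gc-pool
                      (fn [] (d/gc-storage conn (java.util.Date.)))
                      0 24 TimeUnit/HOURS)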

Ben Kamphaus17:07:48

and then periodically I’m assuming you’ll need to VACUUM in postgres before space is reclaimed since the deletion in Datomic will be handled/deferred by table logic in the storage

Ben Kamphaus17:07:59

i.e. Cassandra via tombstone, Oracle space reclamation is deferred by High-water Mark stuff, etc.

max17:07:19

cool, thanks so much for your help @bkamphaus

micah18:07:34

Weird datomic error throwing me for a loop:

micah18:07:37

airworthy.repl=> @(api/transact @db/connection [{:segue/time #inst "2015-04-09T05:32:48.000-00:00", :segue/way :out, :segue/airport 277076930200614, :segue/user 277076930200554, :db/id 277076930200690}]) IllegalArgumentExceptionInfo :db.error/not-an-entity Unable to resolve entity: Thu Apr 09 00:32:48 CDT 2015 in datom [277076930200690 :segue/user #inst "2015-04-09T05:32:48.000-00:00"] datomic.error/arg (error.clj:57)

micah18:07:04

Why does it think the date is an entity?

shaunxcode18:07:07

what is schema for :segue/time ?

shaunxcode18:07:50

and :segue/user ?

mitchelkuijpers18:07:19

Thank you for your help @bkamphaus

max19:07:20

@bkamphaus: I ran a garbage collect and a vacuum, but pg still says the datomic database size is 51gb. Any suggestions?

Ben Kamphaus19:07:18

@max have you backed the db up recently so that you have a reference for how large the backup is?

Ben Kamphaus20:07:19

@micah: as a sanity check, I would verify that all entities in the transaction exist and that all attr keywords are specified correctly (e.g. spelled correctly) and exist, including the (assumed enum) :out entity — it may be that something else wrong in the transaction is causing it to resolve incorrectly into the datom that transacts the date as the value for the :segue/user attr (the cause of the exception)
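
A few quick REPL checks along those lines, using the ids from the failing transaction (a sketch):

(def db (d/db @db/connection))

(d/pull db '[*] :segue/user)       ;; does the attribute exist, and is :db/valueType a ref?
(d/pull db '[*] :segue/way)
(d/pull db '[*] 277076930200554)   ;; is anything asserted about the referenced user entity?
(d/entid db :out)                  ;; nil here means the :out enum ident doesn't exist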

Ben Kamphaus20:07:42

@max that datomic db is the only one in that instance? you haven’t e.g. run tests that generate and delete dbs (note that dbs when deleted have to be [garbage collected](http://docs.datomic.com/capacity.html#garbage-collection-deleted), also). And definitely nothing else you’re storing in postgres?

max20:07:39

I only use memory dbs for testing, and I don’t run tests on the production instance

max20:07:34

There is another database on the pg instance, but it’s tiny:

datomic=> SELECT pg_database.datname,
       pg_size_pretty(pg_database_size(pg_database.datname)) AS size
  FROM pg_database;
  datname   |  size
------------+---------
 template1  | 6314 kB
 template0  | 6201 kB
 postgres   | 6314 kB
 canary-web | 7002 kB
 datomic    | 51 GB
(5 rows)

Ben Kamphaus20:07:55

@max have you restored versions of the same database when the restore has diverged? The incompatible restore is one thing I’m aware of which can potentially orphan segments so that they never get gc’d.

max20:07:53

I don’t think so. This is the production db, so it was initially restored from a seed db, and then only backed up

max20:07:44

it looks like a lot of the growth (20gbs worth!) happened after I ran out of disk space last night and was trying to do excisions.

Ben Kamphaus20:07:14

DBs do pick up small amounts of cruft from operational churn, but this is well out of line with my expectation for the size of it. Depending on what kind of outage you could tolerate, you could do a test restore from backup to a clean postgres in a dev/staging environment and see what the resulting table size is.

Ben Kamphaus20:07:58

The failure to index could be contributing then, maybe leaving orphaned segments somehow. There’s always the possibility of clobbering the table and starting from a clean restore, obviously you want to backup and test a restore as I mentioned above first before considering going down that path.

Ben Kamphaus20:07:32

Do you know what the table size was prior to running into the indexing failure?

max20:07:12

I’m not sure, I ran out of disk space at ~30gb

max20:07:33

I’m assuming it’s going to affect performance to keep this 51gb database around

max21:07:30

So I did a restore on my dev system, and the pg database is 142mb after restore. I can do a restore in prod again, but I’m worried about this happening again. Any suggestions as to what to do at this point?

max21:07:40

is it possible I hit a bug in datomic?

Ben Kamphaus21:07:57

@max hard to speculate about a possible bug without knowing more specifics. I’m wondering how much of this can be attributed to the failures to index w/the blob-ish strings. My general advice would be to make sure to take regular backups, and to configure some kind of monitoring for Alarm* events, so that you can jump in more quickly (i.e. reacting to AlarmIndexingFailed rather than to running out of space).

max21:07:34

bkamphaus: that makes sense, and it’s definitely my next plan of action

max21:07:03

we ran out of space at db size 30 gb, so there must have been some failure before that that caused that 30gb to be written

max21:07:24

but I guess that could have been cascading indexing failures?

Ben Kamphaus21:07:29

I think it’s fairly typical for dbs in production to accumulate a little bit of cruft over time, but nothing like the difference in size from your backup to the postgres table, which is why I think it must be linked to that indexing failure. I haven’t seen another report of that much excess size; usually when I’ve looked into concerns about size differences it’s still less than 2-3x the expected size (after accounting for e.g. storages with replication factors, etc.) on dbs that have been running for a long time — nothing orders of magnitude larger than the expected size like this, except with whole dbs not gc’d, or gc never having been run, etc.

max21:07:24

ok. I’ll set up better monitoring and see if it happens again

max21:07:40

one more question: we’re not using aws, and I am using datadog for this data. Do you generally recommend to use the built in cloudwatch stuff and push that data to other services, or is integrating with a non-AWS monitoring service pretty easy?

Ben Kamphaus21:07:42

@max we definitely have users doing both. Cloudwatch is what we use at Cognitect and test the most, but lots of people on premise just configure their own callback ( http://docs.datomic.com/monitoring.html#sec-2 ) stuff or point it at various other logging/metric tools.
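
For the non-CloudWatch route, the callback is roughly a one-function namespace (a sketch; see the monitoring docs above for the exact contract and property name):

;; enable with e.g. -Ddatomic.metricsCallback=myapp.metrics/report on the process
(ns myapp.metrics)

(defn report
  "Called periodically with a map of metric name -> {:lo .. :hi .. :sum .. :count ..}.
   Forward whatever matters (e.g. :AlarmIndexingFailed) to Datadog or another service."
  [metrics]
  (when-let [alarm (:AlarmIndexingFailed metrics)]
    ;; placeholder: replace with a real Datadog client call
    (println "indexing failure!" alarm)))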

micah22:07:02

@bkamphaus: Thanks for the tip. I’ll verify everything is correctly spelled and schema-fied.