2016-08-16
it would be nice if datomic provided a reader literal for squuids #squuid "etc"
aren't they just uuids when you're reading them?
if you already have them, what part needs to know whether they were generated sequentially or not?
gosh. you're right. i'm a dork. i guess what i meant is it would be nice to generate squuids via a tag in edn
kinda like temp ids
no, i thought that's what you meant, just got confused by the "etc" i think
yeah, that was incorrect
I was curious, looks like these are the data readers defined by datomic {db/id datomic.db/id-literal db/fn datomic.function/construct base64 datomic.codec/base-64-literal}
#squuid might be handy too, I don't see why not
if we could do the same as we do with temp ids, e.g. #squuid[1], so that you could create relationships with squuids the same way you can with db ids
that would be awesome
Anyone done a query/pull-exp cheat sheet? Because I would use that thing every day...
http://docs.datomic.com/query.html and http://docs.datomic.com/pull.html are pretty comprehensive. i found that good old practice embedded the concepts quickly
Yeah, those are my go-tos. Still, it'd be nice to have a one-pager to quickly refer back to, especially if I haven't been doing it for a while & I'm forgetting particular details of syntax.
@robert-stuttaford: the trouble with a squuid tag is that the generated uuids would not be deterministic... at this point I'd say the content of the EDN file has stopped being 'just data',
sounds like a good opportunity to contribute 🙂
@val_waeselynck: true, but this is already the case with #db/id
IF you supported determinism with e.g. #squuid -1
so that multiple uses of the token resolve to the same value
using a basic cache
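A sketch of such a reader literal with a token cache, in the spirit of the suggestion above (the tag, the cache, and the registration are all hypothetical; only d/squuid is real Datomic API):

```clojure
(require '[datomic.api :as d])

;; Hypothetical #my/squuid reader: with no token, generate a fresh
;; squuid; with a token (e.g. #my/squuid [-1]), repeated uses of the
;; same token resolve to the same value via a basic cache.
(def squuid-cache (atom {}))

(defn squuid-literal [arg]
  (if-let [token (first arg)]
    (get (swap! squuid-cache
                (fn [m]
                  (if (contains? m token)
                    m
                    (assoc m token (d/squuid)))))
         token)
    (d/squuid)))

;; Registered in data_readers.clj, e.g. {my/squuid my.ns/squuid-literal}
```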
I know, I feel there's a difference with tempids though, not sure how to express it
at least with tempids it's always the same datoms that end up in storage, not so with random uuids
so it's kinda more deterministic
which is probably why we don't have a reader tag 🙂
@robert-stuttaford: I guess so. Even the #db/id tag felt weird to me in the beginning TBH
@robert-stuttaford: btw, I recently stumbled on your podcast about Datomic and Onyx, I really liked it
thanks! which one? on defn.audio?
yeah that's the one
that was a fun chat. Vijay and Ray are a blast
I'm looking for solutions to make my analytics faster and more scalable, so definitely looking into tools like Onyx
this may be of service to you http://www.stuttaford.me/2016/01/15/how-cognician-uses-onyx/
@robert-stuttaford: read it too 🙂
@iwillig: enjoying your episode 🙂
you mentioned about how you're having to think differently about historical data
have you started to realise the difficulty of technical debt in your data ? 🙂
i'm busy working on an epic to rebuild our database, transaction by transaction
initial analysis of the first 2mil txes yields ±120k txes i want to keep. the rest is either schema, data we no longer want, or bad programming
the bad programming and old data are about equal!
I'm worried about the ever-increasing complexity of historical queries that have to deal with schema changes
you mean having to query across all the versions of the schema?
yeah. we've handled that in a couple ways. small data sets, we just re-transact and lose the time information. larger ones, we've continued to query across
It may not even become a problem in practice, I'm not yet sure how far back we'll need to go. But here I am worrying about it 🙂
i'm looking forward to unifying all that in the rebuild
the primary driver for doing this is to be prepared to shard in future, by building good tooling now
10 billion datoms is the theoretical upper limit for a db. we're at around 100mil, which means we have 99 copies to go. that's what's worrying me 🙂
also, i get to re-partition the data according to the read patterns we've since discovered we have
that kind of thing keeps me from worrying about partitioning too much - I just don't know how it'll be. For some reason, that reasoning works on me for partitions, but not for future schema changes.
@bhagany: curious about your specific problem. Is it that you are querying on asOf dbs and need to compensate for "future schema change" in your queries that go too far in the past ?
I don't actually have that problem yet. But I am trying to anticipate future schema needs now (and at the same time trying to not try, because that kind of thing can get you in trouble too)
@bhagany: my take on this was to actually stop using asOf in application code
history is not programmable
I can see your point there. I may come around to endorsing it, depending on how this goes.
@bhagany: That's a very interesting problem actually. I think what you could do in a technology like Apache Samza is derive a new Log of facts from an old Log of facts, adding the migration, and using the new Log as the data source in the application code.
That'd be an indirection between facts-recording and querying which Datomic does not have (yet)
@robert-stuttaford: have you found in testing how much re-partitioning could possibly speed up query patterns that are currently problematic for you? Just curious.
@bkamphaus: not yet. i haven't managed to actually rebuild the db yet. it's a big task -- 58mil txes, ~4 years worth. the first 2mil txes yielded over 100 transaction shapes to reason through alone
i'll be certain to share any findings, though. this gon' be fun!
yeah, one of the big value props for Datomic for me is being able to support arbitrary query without over-engineering any particular aspect of the schema/model for particular query patterns. Obviously you never quite hit that point 100% with any database, but I’m curious if in the wild people end up needing to solve pain points with partitioning, or whether something like a reasonable set of partitions across a few logically grouped domains is usually sufficient.
@val_waeselynck im not sure that's true.. #db/id[:db.part/user] isn't deterministic, it's using a counter behind the scenes which is increased on each transaction
#db/id[:db.part/user -1] would be deterministic
or it depends on the basis-t of the db-after the transaction, im not sure it uses a counter or not
@danielstockton: yes but I would argue that it's the same datoms [eav] that end up in storage, so it's more deterministic in a way
e.g you can rely on transacting your edn file being idempotent
but the tempid determines the e in a datom, which can be different?
it depends when you transact and against what db
but if you're importing one edn file on a fresh database, then i guess it is...
the idempotent aspect of the schema comes from upsert for a :db/ident att/val pair (which is a unique identity) and special rules for tempid resolution in that case.
which is dependent on the fact that an entity id is not an attr/val pair but its own thing, and which is not true of e.g. a uuid attribute.
it’s less about what #db/id[:db.part/user -1] resolves to across all invocations than the fact that it will resolve to the same tempid within a particular transaction, meaning that it will result in the implied link/relation/join for tempids being fulfilled by the resulting entity id generation.
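For example (attribute names invented for illustration), reusing the same negative tempid within one transaction wires up the implied relation:

```clojure
;; Both occurrences of #db/id[:db.part/user -1] resolve to the same
;; tempid within this single transaction, so :post/author ends up
;; pointing at whatever entity id "Ada" is assigned.
[{:db/id       #db/id[:db.part/user -1]
  :person/name "Ada"}
 {:db/id       #db/id[:db.part/user -2]
  :post/title  "Hello"
  :post/author #db/id[:db.part/user -1]}]
```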
is it possible to have a function to get-or-create an entity? I wanted to write a function that does a lookup, if it finds the criteria, it returns the ent-id of the match otherwise it creates the entity and returns the ent-id of the newly created entity. I can do the lookup in a regular datalog query but I believe that would not be thread-safe, e.g. if I'm importing big datasets and the query is run on multiple machines, it will rapidly create multiple duplicate entities. My function looks like this: https://gist.github.com/jdkealy/42bf630ceba6385914a43d5645d31d55
my function returns tx-info like so {:db-before datomic.db.Db@2f39cc32, :db-after datomic.db.Db@4be3c61a, :tx-data #object[java.util.ArrayList 0x787bfc5c [datomic.db.Datum@1953ce9d]], :tempids {}}.... but i didn't actually transact anything... do i access the returned query via tx-data ?
also... im calling the function like so... @(d/transact @db/conn [[:person/namer oid name]])... so i guess i am transacting... i'm a bit confused obviously on this subject
@jdkealy: you’re crossing a couple of concerns that are decoupled in Datomic. I would split the logic somewhat.
Do the query to see if what you’re looking for exists yet, if not, go through either a transaction function to create it or rely on assigning entity a unique identity so you can rely on Datomic’s upsert behavior
eventually, to figure out the outcome of the transaction and get the entity that was created, you’ll want: http://docs.datomic.com/clojure/#datomic.api/resolve-tempid
right.. but datomic's uniqueness constraint is only on a single attribute as far as i know
If something has a unique identity in Datomic, it will handle that race for you, i.e. it will resolve the transaction to the existing entity ( http://docs.datomic.com/identity.html#unique-identities )
composite uniqueness isn’t a thing in Datomic at this point in time, yeah.
yes, it is, though there’s an advantage to taking opportunities to rely on predefined behaviors rather than explicitly program your own with transaction functions.
but composite uniqueness would preclude being able to rely on the default behavior for this case.
like... return the entity id or else create it in a single-threaded way? i'm worried about creating dozens of dupes as i'm going to be running this code on like 4 servers
@stuartsierra: hi 🙂 in the latest Cognicast, Craig mentioned your predilection for "decanting databases". it sounds like you've done this a couple times. i'm embarking on a rather large decanting of my own soon, and i wonder if you have any tips, or perhaps even generalised code that may be useful?
i.e. can the datomic function return the result of a query or does it only return data related to a transaction
if this is basically a big import and you can provide a unique identifier from the domain or by pre-generating uuids for everything prior to import, the default unique identity upsert behavior gets you there for free.
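A sketch of that upsert route (the attribute names are invented for illustration): mark the domain identifier :db.unique/identity, and concurrent imports of the same identifier resolve to one entity.

```clojure
;; Schema: an externally supplied id marked as a unique identity.
[{:db/id                 #db/id[:db.part/db]
  :db/ident              :item/external-id
  :db/valueType          :db.type/string
  :db/cardinality        :db.cardinality/one
  :db/unique             :db.unique/identity
  :db.install/_attribute :db.part/db}]

;; Transacting this twice -- even from peers racing on different
;; machines -- upserts onto the same entity instead of duplicating it.
@(d/transact conn [{:db/id            #db/id[:db.part/user]
                    :item/external-id "abc-123"
                    :item/name        "widget"}])
```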
a transaction function (note this isn’t the only kind of database function but the typical one) returns transaction data that are then transacted on the transactor (provided it doesn’t throw an exception), but the results are standard transaction result maps. I.e. you can’t change the behavior of what happens on the other side.
but you could define things like, for example: attempt to create this thing if it doesn’t exist, otherwise throw an exception, and rely on that exception on the peer to know that if you get/sync a database value after your attempted transaction you can get the entity via query.
ok... so perhaps instead of returning the entity id and then transacting with the id i should focus on doing the full transaction in the function ?
or... another way would be ... if i do call the thread-safe transact function, i can do a lookup directly after and it's guaranteed to be unique right ?
yes though that implies a blocking deref on the transaction and inspecting the :db-after, which is fine but may slow down import logic considerably if you’re doing this e.g. on every typical transaction.
My first pass (knowing nothing else of the domain) would probably be the transaction function that tries to transact the thing and if it already exists, aborts the transaction via exception, and then either uses A. tempid resolution for a successful transaction result and a query to find it if the transaction aborts, or possibly B. just query to find it on a database value after the transaction attempt (successful or not) since it should be there either way.
basically: find, or try: create-via-tx-fn, catch: find
(or (d/q ...) (try (d/transact ... [[:your-make-fn-which-first-also-does-the-d/q-thing ...]]) (catch ... (d/q ...))))
you'd move the query bit to a function of its own to keep things DRY of course
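Fleshed out, the find / create-via-tx-fn / catch: find pattern might look like this (the function and attribute names are assumptions, not the gist's actual code):

```clojure
;; Hypothetical transaction function: the existence check and the
;; assertion run atomically on the transactor, so concurrent peers
;; can't both create the entity.
(def create-if-absent
  (d/function
   '{:lang   :clojure
     :params [db pname]
     :code   (if (seq (datomic.api/q '[:find ?e :in $ ?n
                                       :where [?e :person/name ?n]]
                                     db pname))
               (throw (ex-info "already exists" {:name pname}))
               [{:db/id       (datomic.api/tempid :db.part/user)
                 :person/name pname}])}))

;; Peer side: find, or try to create, and find again either way.
(defn find-or-create [conn pname]
  (let [find #(d/q '[:find ?e . :in $ ?n :where [?e :person/name ?n]]
                   (d/db conn) pname)]
    (or (find)
        (try @(d/transact conn [[:person/create-if-absent pname]])
             (find)
             (catch Exception _ (find))))))
```

(create-if-absent would first be installed in the db under the :person/create-if-absent ident.)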
distributed systems are hard 🙂
if i have multiple (pull) expressions inside of the :find clause in a query, is it possible for Datomic to not return nil if one of the pull queries doesn’t return anything?
no. nil is not a thing that datalog does at all
sounds like a good candidate for breaking your code apart
i may not fully understand how you're getting an empty vec though
if one of those pulls doesn’t return any data, even if the other one does, the query will return [[nil]]
i'd put the pulls outside of d/q in a separate fn call
and deal just with ids in d/q
i don't know the answer to your actual question, though
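The suggested split, only ids in d/q and pulls in a separate step, might look like this (attribute names invented for illustration):

```clojure
;; Query for entity ids only; pull each pattern afterwards, so one
;; missing pull doesn't turn a whole result row into nil.
(let [db  (d/db conn)
      ids (d/q '[:find [?e ...]
                 :where [?e :order/status :order.status/open]]
               db)]
  (for [e ids]
    {:order    (d/pull db [:order/total :order/status] e)
     :customer (d/pull db [{:order/customer [:customer/name]}] e)}))
```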
what happens if you explicitly include :db/id in your pull expressions?
yeah, the underlying problem is a complex query for various data and metadata associated with certain entities, some of which can be potentially missing, and i still want to get all the data back, instead of constraining the result set
nothing wrong with separate queries
it's all in local memory anyway 🙂
it's a non issue; datalog is working with sorted sets of datoms in local memory, always
very often it's better to decouple things!
good luck!
thanks for your help @bkamphaus... i went with solution B... it appears to work on https://gist.github.com/jdkealy/4d8da9c5bbb37df19978c45256ea1856
What is the S3 backup-uri format? I tried http://bucket.s3-aws-region.amazonaws.com and s3p://bucket-name.
for reference, which is probably what you just found 🙂
Hmm.. Is it possible to use backup/restore to copy one DB to another? I tried, however, I got this exception:
java.lang.IllegalArgumentException: :restore/collision The database already exists under the name '...'
Can’t copy one db to two different names in the underlying storage. You can overwrite a db by restoring to the same name, or restore that db to new name on a different storage.
[:find (min ?e) (max ?e)
 :in $
 :where [?e :some/attr "some-value"]]
does it make sense to interpret the two values returned above as the earliest entity associated with that value vs. the latest entity associated with that value, assuming that there are multiple entities that share that same attribute-value pair?
@pheuter: leading part of entity id is from partition, so multiple partitions can break that strategy.
I would bind the 4th position (tx) and use that if it’s what you mean specifically. Also note that unless the parameter $ is a history db, you won’t find the earliest association if it has since been retracted.
so my question is then how does it resolve aggregating multiple entities, and then for each entity multiple tx-entities?
If you do a (max ?tx) on [?e :some/attr "some-value" ?tx] it will be the most recent tx in which the datom matching the leading portion [?e :some/attr "some-value" ...] was asserted (and is still true as of the most recent database value). If you pass a history db, it will be the most recent tx to touch it (even a retraction) unless you also bind the 5th position to true, i.e. [?e :some/attr "some-value" ?tx true].
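Put together, the two cases described above look like this (a sketch; :some/attr is the placeholder attribute from the conversation):

```clojure
;; Most recent tx in which the datom was asserted, against the current db:
(d/q '[:find (max ?tx)
       :where [?e :some/attr "some-value" ?tx]]
     (d/db conn))

;; Against a history db, bind the 5th (added?) position to true so
;; retractions don't count as the "latest touch":
(d/q '[:find (max ?tx)
       :where [?e :some/attr "some-value" ?tx true]]
     (d/history (d/db conn)))
```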
in my particular case i’m looking to use ?e in a subsequent :where clause to get a related entity, how can i know that i’m getting the entity associated with the latest tx?
basically, there are two entities, a and b. b has an attribute that’s a ref to a, and it’s possible to have multiple b entities that ref to the same a
I’m not sure I follow what you’re asking as it looks like your concern is covered. The where clause limits the results, so you only get the relation from entity to transaction constrained by the presence of that attribute and value, for the most recent transaction.
:where [?b :some/ref ?a ?tx] aggregated on the max value of ?tx returns that datom. I guess you could get a set of matches in the event that there are multiple b entities which assert :some/ref a-id in one transaction.
Oh, grouping behavior.
right, it seems like a workaround now is to manually build a map of tx-ids to entity-ids, find the max, then get the entity-id associated with it
if the grouping behavior runs afoul of what you need, as is the case here (just tested it), I would just return the ?e ?tx tuple and apply max-key second on the result.
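That suggestion as a sketch: return the raw tuples and pick the winner on the peer.

```clojure
;; Return [?e ?tx] tuples, then take the entity id from the tuple
;; with the highest tx using ordinary sequence manipulation.
(let [tuples (d/q '[:find ?e ?tx
                    :where [?e :some/attr "some-value" ?tx]]
                  (d/db conn))]
  (when (seq tuples)
    (first (apply max-key second tuples))))
```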
the aggregation in query always realizes the intermediate set in memory on the peer anyways, so it doesn’t save you any performance cost to avoid the seq manipulation, really.
sorry for the initial detour, forgot that ?e (max ?tx) only shows you max ?tx grouped by ?e, not what you wanted in this case. It’s also possible to use a subquery if you’re stuck with the REST API or don’t have clojure manipulations and don’t want to realize the whole thing in a query, but if you’re in clojure I’d stick with a single query and a sequence manipulation.
hello! how do i recover from a ConnectException? I wanted to create the db if there's none, so I made the following:
(try
  (def conn (d/connect uri))
  (catch ConnectException e (d/create-database uri)))
(:import (java.net ConnectException))
3. Unhandled java.util.concurrent.ExecutionException
2. Caused by org.h2.jdbc.JdbcSQLException
1. Caused by java.net.ConnectException
Connection refused
@bkamphaus: thanks for the patience and help, makes a lot of sense now 🙂
oh I guess I need to start the transactor
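For reference, d/create-database returns false when the database already exists, so the usual idiom needs no exception handling at all (it still requires a running transactor):

```clojure
(require '[datomic.api :as d])

;; Idempotent: returns true if the db was created, false if it
;; already existed. Safe to call unconditionally, then connect.
(d/create-database uri)
(def conn (d/connect uri))
```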
I am having trouble connecting a third peer. We currently have a license for 10 peers. Two of the peers are being used by a staging and production server. I want to query the Datomic instance running in the cloud from the REPL. However, when I try and connect to my transactor running in the cloud from the REPL, I get clojure.lang.ExceptionInfo: Error communicating with HOST ... or ALT_HOST ... on PORT 4334. Both the staging and production servers are able to connect to the Datomic instance. Do I need to set a username and password locally somewhere or change a local license key?
in cases like these, network configuration is always my first stop. have you checked that the transactor is reachable, ports are open, etc?
I mean, obviously you can't connect with the peer library. But can you, say, telnet to it?
alright, to be honest, that just about exhausts my advice. it's always the network for me 🙂
It would be nice if a different exception were thrown depending on whether it was a peer problem or a network problem
Hmm.. Will someone from the Datomic team see these messages or should I email them directly?