This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-08-28
With the client API (on Ions), I’ve made a basic pagination function:
(defn paginate [offset limit results]
  (take limit (drop offset (sort-by second results))))

(d/q {:query '{:find  [(fully.qualified/paginate 0 10 ?tuple)]
               :in    [$]
               :where [[?id :article/id ?uuid]
                       [?id :article/title ?title]
                       [(vector ?uuid ?title) ?tuple]]}
      :args [(get-db)]})
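At the REPL, the pure function behaves like an OFFSET/LIMIT over results sorted by the second tuple element. A small usage sketch (editorial, not from the thread):

```clojure
;; Same paginate as above: sort by the second tuple element,
;; then drop `offset` rows and keep `limit` rows.
(defn paginate [offset limit results]
  (take limit (drop offset (sort-by second results))))

(paginate 1 2 [[3 "c"] [1 "a"] [2 "b"]])
;; => ([2 "b"] [3 "c"])
```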
The above works fine, but it goes nuts when I try to parameterize the query:

(d/q {:query '{:find  [(fully.qualified/paginate offset limit ?tuple)]
               :in    [$ offset limit]
               :where [[?id :article/id ?uuid]
                       [?id :article/title ?title]
                       [(vector ?uuid ?title) ?tuple]]}
      :args [(get-db) 0 10]})
ExceptionInfo Datomic Client Exception clojure.core/ex-info (core.clj:4739)
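A likely cause, offered as an editorial sketch rather than something confirmed in the thread: Datalog inputs in `:in` (and the symbols passed to the query function) must be variables beginning with `?`, so the bare symbols `offset` and `limit` are never bound. A hedged version of the parameterized query (`fully.qualified/paginate` and `get-db` are from the original message):

```clojure
;; Sketch: ?-prefix the inputs so they are bound as Datalog variables.
(d/q {:query '{:find  [(fully.qualified/paginate ?offset ?limit ?tuple)]
               :in    [$ ?offset ?limit]
               :where [[?id :article/id ?uuid]
                       [?id :article/title ?title]
                       [(vector ?uuid ?title) ?tuple]]}
      :args [(get-db) 0 10]})
```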
As I've been learning Datomic (and DataScript), I've come across a practice that seems to make a lot of sense, but I don't see it in the examples much, so I wanted to ask whether it's considered good form, bad form, or "just another way to model the data." The practice is to make heavy use of refs, sometimes to the point where the data model consists of a number of atomic values/entities, and the majority of the entities are aggregations of references to those values. For example, without the practice I'm describing, you might model a movie like so (let's assume each field has a schema that defines the field and its value type, e.g. :movie/title is a string):
{:movie/title  "Ben Hur"
 :movie/year   1959
 ;; cardinality-many on this one
 :movie/actors ["Charlton Heston" "Stephen Boyd"]}
However, you might recognize that the movie title, year, and actor names are all other values in the model. Instead, you might do this:
{:movie/title  {:title/string "Ben Hur"}
 :movie/year   {:year/value 1959}
 :movie/actors [{:actor/name "Charlton Heston"}
                {:actor/name "Stephen Boyd"}]}
In this case, every field is a ref out to another entity. The movie entities are defined logically and have no actual primitive value fields themselves. These referenced values can then be used to construct other movie (or other domain) entities in which they are used. For example, you could reference other movies or books with the same title or other events that happened in that year.
Is this considered good practice? Does it have any sort of negative implications for the size of your indexes?

@markbastian I've been looking at this for some attributes, but not all: specifically those I want to enforce as unique throughout the DB, like email and URL.
alternatively, if you want to use entities with value-ish semantics (so they are shared-by-value) then they should have a unique attribute or some kind of hash-derived id
we use this technique as a kind of compression and to get around datomic not having custom value types
As for titles and years, I would think these things do have identity.
The number 1959 wouldn't be particularly special; there are effectively infinitely many numbers. But movie release years are limited: fewer than 150 of them.
And as a title, there are a limited number of works related to "Ben Hur" (one book, several movies, etc.).
In the year example, all of the references would have schemas along the lines of
{:db/ident       :year/value
 :db/valueType   :db.type/long
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}

in which they would exist uniquely in the domain.

In the above case I am presenting an extreme, but the idea is that you may have a relatively finite number of values from which all other entities are built. Some things, such as movie revenue, would definitely not fall into this category, as they could be effectively anything.
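Because :year/value is declared :db.unique/identity, transacting the same year twice upserts to a single entity, which is what makes the shared-by-value modeling work. A minimal sketch (editorial; assumes a Datomic client connection `conn` and the :year/value schema shown above):

```clojure
;; Upsert semantics: both transactions resolve to the same year entity.
(d/transact conn {:tx-data [{:year/value 1959}]})
(d/transact conn {:tx-data [{:year/value 1959}]})

;; A movie can then point at the shared year entity via a lookup ref.
(d/transact conn {:tx-data [{:movie/title "Ben Hur"
                             :movie/year  [:year/value 1959]}]})
```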
A value like 42 is its own identity, you don't need a second layer of identity on top of it
Hmmm, that makes a lot of sense. One thing I like about what I was doing was that I could do very fast queries along the lines of:
[:find [?e ...]
 :in $
 :where
 [?t :title/string "Ben Hur"]
 [?y :year/value 1942]
 [?e :movie/title ?t]
 [?e :movie/year ?y]]
As long as the set of titles and years were relatively small this will be quite fast. It should just be a set operation on the backreferences to the domain values. Essentially the domain values provide a gateway into the entities.
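The "set operation on the backreferences" can be made concrete with the client API's d/datoms, which reads the :vaet (reverse-reference) covering index directly. A sketch (editorial; assumes a db value `db` and an already-resolved title entity id `title-eid`):

```clojure
;; Entities that reference the title entity via :movie/title,
;; pulled straight from the :vaet index without a full query.
(map :e (d/datoms db {:index      :vaet
                      :components [title-eid :movie/title]}))
```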
If, on the other hand, I did something like this
{:movie/title  "Ben Hur"
 :movie/year   1959
 :movie/actors [{:actor/name "Charlton Heston"}
                {:actor/name "Stephen Boyd"}]}
I would query with something like this:
[:find [?e ...]
 :in $
 :where
 [?e :movie/title "Ben Hur"]
 [?e :movie/year 1992]]
Wouldn't this option be dramatically slower for a large data set? It seems like I don't have a fast path to my movie entity. I don't really have a strong concept of identity; the best definition is probably "title + year". Any thoughts as to a better way to think about this?

"Wouldn't this option be dramatically slower?" No, quite the opposite. Your first option has twice as many joins in it.
are you aware of this? http://tonsky.me/blog/unofficial-guide-to-datomic-internals/
Yeah, I've read Tonsky's post several times. I actually get great performance with the first query and worse (but not bad) performance with the second when using Datomic. Not the case, though, with Datascript since it doesn't seem to have backreferences built in to the indexes. I do want to emphasize, though, that in the first model the title/string and year/value are references to unique identities. There is no concept of index or identity in the data model of the second query.
:movie/year 1959, if not indexed, will require a scan over :movie/year to get the matching value. If there is only one entity for any given :movie/year value, obviously that will be a faster scan, but there's still a second lookup in the :vaet index to go from the movie-year entity to the movies which reference it. Asserting :movie/year on the movie entity directly, when the attr is indexed, removes this extra lookup.
BTW, I appreciate everyone's help on this. I've been trying to achieve "Datomic Enlightenment" for a while now and a few things, like establishing identity when there is no obvious primary key and a database function won't do, are still elusive for me. This was just something that I thought of that seemed to solve the problem of "weak identity". In other words, you know facts about something that, taken together, tell you exactly what you want, but the thing you want doesn't have a natural single ID.
Perhaps setting :db/index true on :movie/title and :movie/year would accomplish what I am going for without adding any additional concept of identity to what are otherwise primitive values.
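For the flat model, the schema might then look like the sketch below (editorial; assumes the client API and a connection `conn`). Note that on Datomic Cloud every attribute is indexed automatically, so :db/index is only meaningful for On-Prem:

```clojure
;; Flat-model schema with plain indexed value attributes,
;; no extra identity entities.
(d/transact conn
  {:tx-data [{:db/ident       :movie/title
              :db/valueType   :db.type/string
              :db/cardinality :db.cardinality/one
              :db/index       true}
             {:db/ident       :movie/year
              :db/valueType   :db.type/long
              :db/cardinality :db.cardinality/one
              :db/index       true}]})
```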
I'm trying to enumerate tradeoffs on Datomic Ion placement. Our main production east1 account (A) is not the same as the Datomic Cloud east1 account (B). Assuming I need to consume a Kinesis stream with an Ion-backed lambda, do I:
- place the stream in account A and the Lambda/Ion in account B, or
- place both the stream and the Ion in account B, and produce to the stream remotely from account A?