#datomic
2018-08-28
henrik05:08:55

With the client API (on Ions), I’ve made a basic pagination function:

(defn paginate [offset limit results]
  (take limit (drop offset (sort-by second results))))


(d/q {:query '{:find [(fully.qualified/paginate 0 10 ?tuple)]
                     :in [$]
                     :where [[?id :article/id ?uuid]
                             [?id :article/title ?title]
                             [(vector ?uuid ?title) ?tuple]]}
      :args [(get-db)]})
The above works fine, but it throws an exception when I try to parameterize the query:
(d/q {:query '{:find [(fully.qualified/paginate offset limit ?tuple)]
                     :in [$ offset limit]
                     :where [[?id :article/id ?uuid]
                             [?id :article/title ?title]
                             [(vector ?uuid ?title) ?tuple]]}
      :args [(get-db) 0 10]})
ExceptionInfo Datomic Client Exception  clojure.core/ex-info (core.clj:4739)

henrik05:08:48

What’s the correct way to pass in parameters for the pagination function?

henrik05:08:59

Or more generally, what’s the correct way to paginate sorted results?
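
One way to sidestep the question of passing parameters into an aggregate is to let the query return plain tuples and do the sorting and slicing in ordinary Clojure afterwards. The sketch below is not an answer given in this thread; it reuses the `paginate` function from the message above, and `sample-rows` is made-up data standing in for real query results. Note that it sorts the full result set in memory, which may not suit large results.

```clojure
(defn paginate
  "Returns one page of `results`, sorted by the second element of each tuple."
  [offset limit results]
  (->> results
       (sort-by second)
       (drop offset)
       (take limit)))

;; Made-up tuples standing in for the [?uuid ?title] rows a query would return.
(def sample-rows
  [[:id-3 "Cherry"] [:id-1 "Apple"] [:id-4 "Date"] [:id-2 "Banana"]])

(paginate 1 2 sample-rows)
;; => ([:id-2 "Banana"] [:id-3 "Cherry"])

;; With a live database the call might look like this (not run here;
;; `get-db` is the helper from the original message):
;; (paginate 0 10
;;           (d/q {:query '{:find  [?uuid ?title]
;;                          :in    [$]
;;                          :where [[?id :article/id ?uuid]
;;                                  [?id :article/title ?title]]}
;;                 :args  [(get-db)]}))
```

Here `offset` and `limit` are ordinary function arguments, so nothing has to be bound inside the query itself.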

markbastian16:08:24

As I've been learning Datomic (and Datascript), I've come across a practice that seems to make a lot of sense, but I don't see it much in the examples, so I wanted to ask whether it is considered good form, bad form, or "just another way to model the data." The practice is to make heavy use of refs, sometimes to the point where the data model consists of a number of atomic values/entities and the majority of the entities are aggregations of references to those values. For example, without the practice I'm describing, you might model a movie like so (let's assume each attribute has a schema entry giving its name and value type, e.g. :movie/title is a string):

{:movie/title "Ben Hur"
 :movie/year 1959
 ;Cardinality many on this one
 :movie/actors ["Charlton Heston" "Stephen Boyd"]}
However, you might recognize that the movie title, year, and actor names are all other values in the model. Instead, you might do this:
{:movie/title {:title/string "Ben Hur"}
 :movie/year {:year/value 1959}
 :movie/actors [{:actor/name "Charlton Heston"}
                {:actor/name "Stephen Boyd"}]}
In this case, every field is a ref out to another entity. The movie entities are defined logically and have no actual primitive value fields themselves. The referenced values can then be reused to build other movie (or other domain) entities. For example, you could reference other movies or books with the same title, or other events that happened in that year. Is this considered good practice? Does it have any negative implications for the size of your indexes?

henrik16:08:47

@markbastian I've been looking at this for some attributes, but not all. Specifically those I want to enforce as unique throughout the DB, like email and URL.

favila16:08:14

yeah, this generally makes no sense unless the value has some kind of identity

favila16:08:20

(in your data model)

favila16:08:46

e.g. actors have identity independent of anything asserted of them

favila16:08:01

but the number "1959"?

favila16:08:12

or the string "Ben Hur"?

favila16:08:29

depends on the domain but I think usually not

favila16:08:41

alternatively, if you want to use entities with value-ish semantics (so they are shared-by-value) then they should have a unique attribute or some kind of hash-derived id

favila16:08:27

we use this technique as a kind of compression and to get around datomic not having custom value types
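
A minimal sketch of what a "hash-derived id" for a shared-by-value entity could look like. This is an illustration, not favila's actual code: the attribute names `:value/id` and `:value/edn` are hypothetical, and hashing the printed form of the value is just one assumed way to get a deterministic id.

```clojure
(import 'java.util.UUID)

(defn value-id
  "Derives a deterministic, name-based UUID from the printed form of `v`.
   Assumes `v` prints the same way every time; maps and sets with a different
   element order would need a canonical form first."
  [v]
  (UUID/nameUUIDFromBytes (.getBytes ^String (pr-str v) "UTF-8")))

(defn value-entity
  "Builds transaction data for a shared-by-value entity.
   :value/id and :value/edn are hypothetical attributes; :value/id would need
   :db/unique :db.unique/identity in the schema so that repeated assertions
   of the same value resolve to the same entity."
  [v]
  {:value/id  (value-id v)
   :value/edn (pr-str v)})

;; The same value always maps to the same id, so two transactions that
;; mention ["Ben Hur" 1959] would refer to one entity.
(= (value-id ["Ben Hur" 1959]) (value-id ["Ben Hur" 1959]))
;; => true
```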

markbastian17:08:45

As a title or year, I would think these things do have identity.

markbastian17:08:31

The number 1959 wouldn't be particularly special. There are an effectively infinite number of them. But movie release years are limited. Less than 150.

markbastian17:08:22

And as a title, there are a limited number of works related to "Ben Hur" (one book, several movies, etc.)

markbastian17:08:23

In the year example, all of the references would have schemas along the lines of

{:db/ident       :year/value
 :db/valueType   :db.type/long
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}
in which they would exist uniquely in the domain.

markbastian17:08:38

In the above case I am presenting an extreme, but the idea is that you may have a relatively finite number of values from which all other entities are built. Some things, such as movie revenue, would definitely not fall into this category as they could be effectively anything.

Dustin Getz18:08:11

A value like 42 is its own identity, you don't need a second layer of identity on top of it

markbastian18:08:58

Hmmm, that makes a lot of sense. One thing I like about what I was doing was that I could do very fast queries along the lines of:

[:find [?e ...]
 :in $
 :where
 [?t :title/string "Ben Hur"]
 [?y :year/value 1959]
 [?e :movie/title ?t]
 [?e :movie/year ?y]]
As long as the set of titles and years is relatively small, this should be quite fast. It should just be a set operation on the backreferences to the domain values. Essentially, the domain values provide a gateway into the entities. If, on the other hand, I did something like this:
{:movie/title "Ben Hur"
 :movie/year 1959
 :movie/actors [{:actor/name "Charlton Heston"}
                {:actor/name "Stephen Boyd"}]}
I would query with something like this:
[:find [?e ...]
 :in $
 :where
 [?e :movie/title "Ben Hur"]
 [?e :movie/year 1959]]
Wouldn't this option be dramatically slower for a large data set? It seems like I don't have a fast path to my movie entity. I don't really have a strong concept of identity. The best definition is probably "title+year". Any thoughts as to a better way to think about this?

favila20:08:06

"Wouldn't this option be dramatically slower?" No, quite the opposite. Your first option has twice as many joins in it

favila20:08:01

It feels like you are trying to optimize row size in a relational database

favila20:08:56

doing the same thing in datomic is usually going to increase storage and lookup times

favila20:08:06

(unless done carefully)

markbastian21:08:38

Yeah, I've read Tonsky's post several times. I actually get great performance with the first query and worse (but not bad) performance with the second when using Datomic. Not the case, though, with Datascript since it doesn't seem to have backreferences built in to the indexes. I do want to emphasize, though, that in the first model the title/string and year/value are references to unique identities. There is no concept of index or identity in the data model of the second query.

favila21:08:20

these attrs should all be indexed in either scenario

favila21:08:33

(indexed by value)

favila21:08:03

if the second one is not indexed at all then that is why it is slower

favila21:08:13

not because they share entities for their values

favila21:08:02

:movie/year 1959, if not indexed, will require a scan over :movie/year to get the matching value

favila21:08:35

if there is only one entity for any given :movie/year value obviously that will be a faster scan

favila21:08:03

but there's still a second lookup in the :vaet index to go from the movie-year entity to movies which reference it

favila21:08:28

(you are still indexing by value--the automatic entity backref index)

favila21:08:24

asserting :movie/year on the movie entity directly when the attr is indexed removes this extra lookup

favila21:08:45

now it is simply an index-range scan over :avet where value = 1959

favila21:08:57

and attr = :movie/year

favila21:08:11

the e of the movie will be known without an extra index lookup
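
The point above can be illustrated with a toy model in plain Clojure. This is only a sketch: a map from `[attribute value]` to entity ids stands in for a value index, which is not how Datomic actually stores its sorted datom indexes, and the datoms are made-up sample data.

```clojure
(require '[clojure.set :as set])

;; Made-up datoms as [entity attribute value] triples, with the
;; values asserted directly on the movie entities.
(def datoms
  [[1 :movie/title "Ben Hur"]   [1 :movie/year 1959]
   [2 :movie/title "Ben Hur"]   [2 :movie/year 1925]
   [3 :movie/title "Spartacus"] [3 :movie/year 1960]])

;; Toy stand-in for an attribute+value index: [a v] -> set of entity ids.
(def toy-avet
  (reduce (fn [idx [e a v]]
            (update idx [a v] (fnil conj #{}) e))
          {}
          datoms))

(defn entities-with
  "Looks up the entity ids that have attribute `a` with value `v`."
  [a v]
  (get toy-avet [a v] #{}))

;; One lookup per clause yields movie ids directly; no intermediate
;; title or year entity has to be resolved first.
(set/intersection (entities-with :movie/title "Ben Hur")
                  (entities-with :movie/year 1959))
;; => #{1}
```

In the ref-based model, each clause would first find the single title or year entity and then need a second lookup to reach the movies that reference it.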

markbastian18:08:34

BTW, I appreciate everyone's help on this. I've been trying to achieve "Datomic Enlightenment" for a while now and a few things, like establishing identity when there is no obvious primary key and a database function won't do, are still elusive for me. This was just something that I thought of that seemed to solve the problem of "weak identity". In other words, you know facts about something that, taken together, tell you exactly what you want, but the thing you want doesn't have a natural single ID.

markbastian19:08:42

Perhaps setting :db/index true on :movie/title and :movie/year would accomplish what I am going for without adding any additional concept of identity to what are otherwise primitive values.
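
A sketch of what that schema might look like, written as transaction data. The value types follow the assumptions stated earlier in the conversation (title as a string, year as a long); cardinality one is an additional assumption here. Whether an explicit `:db/index` flag is needed depends on which Datomic product is in use, so treat that as something to check against its documentation.

```clojure
;; Schema sketch: primitive values asserted directly on the movie,
;; with value indexing requested via :db/index.
[{:db/ident       :movie/title
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one
  :db/index       true}
 {:db/ident       :movie/year
  :db/valueType   :db.type/long
  :db/cardinality :db.cardinality/one
  :db/index       true}]

;; The matching query then needs no intermediate title or year entities:
;; [:find [?e ...]
;;  :in $
;;  :where
;;  [?e :movie/title "Ben Hur"]
;;  [?e :movie/year 1959]]
```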

ghadi19:08:28

I'm trying to enumerate tradeoffs on Datomic Ion placement. Our main production east1 account (A) is not the same as the Datomic Cloud east1 account (B). Assuming I need to consume a Kinesis Stream with an Ion-backed lambda, do I:
1. Place the Stream in account A and the Lambda/Ion in acct B
2. Place both Stream and Ion in account B, and produce to the stream remotely from acct A

ghadi19:08:57

account A and B can never be the same because we have a legacy EC2 VPC