#datomic
2016-01-02
jamesnvc01:01:43

Hello, I’m having an issue trying to add a transaction function; when it tries to add, I get “Can’t embed object in code, maybe print-dup not defined: clojure.lang.Delay”

jamesnvc01:01:53

fn looks like

{:db/ident :add-user
 :db/id #db/id [:db.part/user]
 :db/fn #db/fn {:lang "clojure"
                :params [db params]
                :code (pr-str
                        '(if-let [e (datomic.api/entity db [:user/email (:user/email params)])]
                           (throw (Exception. "User already exists with email"))
                           [params]))}}

jamesnvc01:01:57

Alternatively to fixing that, is there a way to tell Datomic that I don’t want to upsert? I’m using this function to ensure that users have unique email addresses, but if I insert a new user with a tempid, a duplicate email updates the existing user instead of throwing an error

jamesnvc01:01:20

oh, never mind the fn question; apparently d/function works instead of the reader macro

jamesnvc01:01:10

I would still be interested to know if there’s a better way to do this and let datomic’s unique checking do the job for me
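For reference, the d/function route mentioned above can be sketched like this. A minimal sketch only: it assumes datomic.api on the classpath, a live connection bound to conn, and mirrors the fn body from the earlier snippet.

```clojure
(require '[datomic.api :as d])

;; d/function builds the :db/fn value at runtime, avoiding the #db/fn
;; reader macro (and the print-dup error it triggered above).
(def add-user-fn
  (d/function
    {:lang   "clojure"
     :params '[db params]
     :code   '(if-let [e (datomic.api/entity db [:user/email (:user/email params)])]
                (throw (Exception. "User already exists with email"))
                [params])}))

;; conn is assumed to be a live connection.
@(d/transact conn [{:db/id    (d/tempid :db.part/user)
                    :db/ident :add-user
                    :db/fn    add-user-fn}])
```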

bkamphaus02:01:47

@jamesnvc: use :db.unique/value instead of :db.unique/identity. unique/value throws on a conflicting insert; unique/identity upserts: http://docs.datomic.com/identity.html#unique-values
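In schema terms the difference is just the value of :db/unique. A sketch, with the :user/email attribute name assumed from the earlier snippet:

```clojure
;; Transacting a second entity with an existing email fails the transaction.
(def email-throws-on-conflict
  {:db/ident       :user/email
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/unique      :db.unique/value})

;; Same attribute with upsert semantics: a tempid plus an existing email
;; resolves to the existing entity instead of throwing.
(def email-upserts
  (assoc email-throws-on-conflict :db/unique :db.unique/identity))
```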

jamesnvc03:01:30

@bkamphaus: oh, thanks! I think I was misunderstanding what unique/value meant

sparkofreason17:01:44

Anybody know of a converter from JSON schema to Datomic schema? Grasping at straws here, Google turned up nothing.

curtosis18:01:28

@dave.dixon: haven't seen one, but I'd be interested in a JSON-to-Datomic data converter if you've seen one. 😉

curtosis18:01:23

(which i suppose is really json-edn...)

sparkofreason18:01:36

Is there some reason data.json doesn't work? https://github.com/clojure/data.json

sparkofreason18:01:06

That's what I was planning to use to read the JSON schema into Clojure maps etc.

curtosis18:01:29

haven't tried it... didn't get that far yet. lol

alandipert18:01:56

i dabbled in putting arbitrary graphs into datomic with https://github.com/tailrecursion/monocopy, concept was flawed though

curtosis18:01:15

I'm just in the process of rethinking my planned architecture... I wanted datomic on AWS but I think it'll just be too expensive.

curtosis18:01:55

Instead now I'm considering dumping the raw data into DynamoDB using Lambda, then extracting it into Datomic for the real work.

alandipert18:01:29

if you want to query maps in datomic datalog (not store them) you can circumvent the structural identity problem with an approach like https://twitter.com/alandipert/status/682597011141558273

curtosis18:01:04

(run-time is mostly data collection with some lookups; "real" work happens periodically -- as in monthly/annually and is offline)

alandipert18:01:11

@curtosis: what kind of data? would you characterize your work as "analytic"?

curtosis18:01:06

The data is essentially scores submitted by judges. The core entity is an event: {:ballots [{:judge judgeid :category score}]} (simplifying greatly .. it's actually 4-5 layers deep)

curtosis18:01:55

it's less "analytic" than just "tabulation". There are a bunch of rules for how scores get averaged/dropped/qualified etc.

curtosis18:01:26

and it seems dramatically easier to do it in datalog than SQL.

curtosis19:01:22

I can't really justify the ~$300 to keep a t2.medium instance running all year.

curtosis19:01:17

(thus the workaround... I'd prefer to stay in Datomic from the outset. Audit trail is, as you might expect, rather important here.)

alandipert19:01:53

do you already have a SQL db where you store the collated results?

curtosis19:01:48

hmmm... I could also flip it around and use DynamoDB directly for the live lookup stuff (name regularization etc) and put the scores on SQS, fire up Datomic once a day or so to consume the queue and update the names list....

curtosis19:01:01

I have a SQL db currently with the raw scores

curtosis19:01:20

(and clojure code now to map it into Datomic)

curtosis19:01:21

the grand impetus for all of this is that a) the inputs look more like documents than tables, b) Rails stack consistency over time is a tire fire, and c) I really like Datomic. 🙂

alandipert19:01:15

i'm in the ad business and we store "events" like clicks and impressions in S3, then EMR them periodically and put the aggregates in a combination of Dynamo and Redshift

alandipert19:01:50

seems like in your case, if your real-time query requirements are light, the most economical thing would be to aggregate somewhere cheap and "wake up" periodically to process

curtosis19:01:59

are they more like flat structures when they go into S3?

alandipert19:01:13

newline-delimited json maps

curtosis19:01:31

yeah, that's what I'm thinking. And the free tier of Dynamo is probably plenty sufficient for that "cheap" aggregation.

alandipert19:01:33

mostly flat but decorated with geo and other info, maybe 3-4 things deep in spots

curtosis19:01:34

S3 would be simpler (no Dynamo schema to care about) but there's no long-term free tier 🙂

curtosis19:01:46

waking up once a day is more than sufficient, and would run $20/year. way better.

curtosis19:01:31

OTOH, I'm also being silly... S3 is like $0.50/year for this use case.

curtosis19:01:45

the only gotcha is the name-lookup service backing... easy to do in Dynamo, harder to do in S3.

curtosis19:01:50

but regardless, I think this discussion is super helpful... I think I've reduced the problem now to a simple recurring load process, with a feedback to the name lookup stuff.

curtosis19:01:11

hmm.. for that matter, just dumping the records onto SQS for the instance to pick up when it wakes up might work too.

curtosis19:01:47

how do you keep track of which ones you've processed out of S3?

alandipert19:01:59

well, there are a few "stages"

alandipert19:01:20

the first stage is gathering up files from S3 into batches and EMRing them... that's where we use dynamo, to keep track of the files/batches

alandipert19:01:50

when EMR is done it puts a batch id on an SQS queue... where a thing that specializes in loading aggregated data into Redshift picks it up

curtosis19:01:49

and so the first stage just looks at S3 for everything newer than its oldest batch?

alandipert19:01:20

that's one way to do it... another is to attach a lambda function to S3 events

alandipert19:01:34

but we do neither; we have a naming convention for storing in S3 and encode a date segment in the key

alandipert19:01:55

e.g. 2015/10/2/3/0 for "the 0-30 minutes of 3am on 10-2-2015"

alandipert19:01:16

so when the EMR wakes up it figures out what the previous segment path was, and scoops the files up there
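The key convention above can be sketched in a few lines. The segment layout (year/month/day/hour/half-hour) is assumed from the example:

```clojure
(require '[clojure.string :as str])

;; Build an S3 key prefix of year/month/day/hour/half-hour segments,
;; matching the 2015/10/2/3/0 example above.
(defn segment-key [year month day hour half-hour]
  (str/join "/" [year month day hour half-hour]))

(segment-key 2015 10 2 3 0) ;; => "2015/10/2/3/0"
```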

curtosis19:01:00

fair enough. I kind of like the idea of a lambda on S3 events putting something on SQS for my Datomic "core" to pick up.

curtosis19:01:44

but that may be more complicated than necessary... I can just get my last-pulled from Datomic and go from there.

curtosis19:01:28

and, since I control the front-end, I can just dump EDN into S3 for Datomic to load.

curtosis19:01:38

skipping the JSON stage entirely

curtosis19:01:33

and I think if it's just EDN I can use #db/id[:db.part/user] to defer tempid generation until it gets picked up

alandipert19:01:48

sounds pretty sweet

curtosis19:01:54

the only hard part (for some values of hard) is the "I don't know this name; create a new one".

curtosis19:01:14

I very much like that there's no actual server I have to write.

alandipert19:01:53

oh you mean like make a new name entity?

curtosis19:01:37

yeah... I need to create it with a tempid on the client, process it into Datomic, and then update the front-end lookup service.

curtosis20:01:04

hmm... I guess I need to decide whether to build my own all-in-one transactor + processor AMI, or use the default Datomic AWS deploy template plus my application-code AMI.

curtosis20:01:51

apart from the obvious cost difference, are there any advantages to running one way or the other?

casperc21:01:27

So I am trying to programmatically generate a query, or at least some of one, but I am coming up short. I am wondering if anyone can help me.

casperc21:01:00

I want to make a function to which the db/ident to join on is passed, so the caller can choose the entity being joined. I have something like this:

casperc21:01:03

(defn get-all-entities-with-tag [tag-title attr]
  (let [db (d/db @conn)
        eids (map first (d/q '[:find ?e ?log-title
                              :in $ ?tag-title
                              :where 
                              [?tag-e :tag/title ?tag-title]
                              [?e ~attr ?tag-e]
                              [?e :log/title ?log-title]]
                            db
                            tag-title))]
    eids))

casperc21:01:03

This results in an exception though: IllegalArgumentExceptionInfo :db.error/not-an-entity Unable to resolve entity: clojure.core/unquote

casperc21:01:07

So the unquote isn’t doing the trick and it doesn’t work without it either. Any idea what will do the trick?

alandipert21:01:08

the problem is you're using ~ inside a regular quote ('), not a syntax quote (`)

casperc21:01:26

Hmm ok, so should I use a syntax quote? I am trying to figure out the right way to generate a query like this programmatically

alandipert21:01:40

unfortunately clojure's native syntax quote isn't a great fit either because it will try to resolve in namespaces things like ?log-title

alandipert21:01:02

i recommend checking out the template macro in https://github.com/brandonbloom/backtick

alandipert21:01:34

user=> (let [a 1 b `(~a 2 ?log-title)] b)
(1 2 user/?log-title)

casperc21:01:21

Ok thanks I will.
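Backtick's template macro quotes like a syntax quote but leaves plain symbols (?e, ?attr) unresolved, so ~ splices as intended. A sketch, assuming backtick is on the classpath and the attribute is passed in:

```clojure
(require '[backtick :refer [template]])

;; template quotes its body without namespace-resolving symbols,
;; while ~ still unquotes, so ?log-title stays ?log-title.
(defn tag-query [attr]
  (template
    [:find ?e ?log-title
     :in $ ?tag-title
     :where
     [?tag-e :tag/title ?tag-title]
     [?e ~attr ?tag-e]
     [?e :log/title ?log-title]]))
```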

alandipert21:01:28

your other option is to leverage the fact that the query is a vector... so you can use update on it

alandipert21:01:50

but that would be kind of brittle, as you'd need to keep track of the index of the thing you want to change

alandipert21:01:08

user=> (assoc [1 2 3] 1 "hi")
[1 "hi" 3]
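A sketch of that vector-surgery approach, with the brittleness called out; the query and index here are illustrative, not from the original code:

```clojure
;; A quoted query is plain data, so assoc can swap the join clause in place.
(def base-query
  '[:find ?e ?log-title
    :in $ ?tag-title
    :where
    [?tag-e :tag/title ?tag-title]
    [?e :log/tags ?tag-e]
    [?e :log/title ?log-title]])

;; Brittle: depends on the join clause staying at index 8 of the vector.
(defn with-attr [q attr]
  (assoc q 8 ['?e attr '?tag-e]))
```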

casperc21:01:32

Ah yeah, but normal list operations might be the way to go, e.g. concat

alandipert21:01:06

true, that's maybe the best

alandipert21:01:17

iirc q doesn't care if you use vectors

casperc21:01:51

It’s just a bit bulky tbh, just trying it now

alandipert21:01:57

personally i prefer the template route, i find it the data analog to string interpolation

casperc21:01:00

Yeah it seems better. Just weird that there is no good way to do it built into Clojure or Datomic

casperc21:01:10

especially since datalog is supposed to be easier to generate programmatically than string based queries like SQL

alandipert21:01:04

yeah, i don't really buy that 😉

alandipert21:01:41

i dunno tho, sql is filled with weird syntax that needs to be satisfied

alandipert21:01:51

fortunately datalog requirements are few in comparison

casperc21:01:23

I did, until I tried it just now and got bitten by having to quote the vector to avoid symbols being resolved, after which I can’t really do anything with it 😬

casperc21:01:38

But thanks for the hint, I’ll give Backtick a go and stop complaining 🙂

alandipert21:01:19

you can also always use strings again if you want

alandipert21:01:12

(let [x 1] (read-string (str "[" x " 2 3]"))) 🍷

casperc21:01:35

Yeah, that is sort of what they suggest in the docs (http://docs.datomic.com/data-structure-literals.html), but I just feel like there should be a better way in Clojure

alandipert21:01:30

oh wait, can you not just pass attr in as a parameter? like the way you pass in tag-title

casperc21:01:04

Hmm, how do you mean?

alandipert21:01:23

(d/q '[:find ?e ?log-title
       :in $ ?attr ?tag-title
       :where
       [?tag-e :tag/title ?tag-title]
       [?e ?attr ?tag-e]
       [?e :log/title ?log-title]]
     db
     attr
     tag-title)

casperc21:01:24

It is already an input param for the function

casperc21:01:40

ah i get it

casperc21:01:51

doh, that might just be it 😄

casperc21:01:02

Yup, that’s the ticket 🙂

casperc21:01:16

I am officially a dummy 😄

pesterhazy21:01:06

yup, that's datalog 🙂

casperc21:01:54

So maybe I am getting hung up on pointless little things, but I also want it to look for any ident via _ (underscore) when the attribute ident isn’t passed to the function

casperc21:01:02

(defn get-all-entities-with-tag [tag-title & [attr]]
  (let [db (d/db @conn)
        join-attr (or attr '_)
        eids (map first (d/q '[:find ?e ?log-title
                              :in $ ?tag-title ?attr
                              :where 
                              [?tag-e :tag/title ?tag-title]
                              [?e ?attr ?tag-e]
                              [?e :log/title ?log-title]]
                            db
                            tag-title
                            join-attr))]
    eids))

casperc21:01:24

But that way is not working

casperc21:01:41

Giving this error: IllegalArgumentExceptionInfo :db.error/not-an-entity Unable to resolve entity: _

alandipert21:01:47

if you don't have an attr... you could omit the whole [?e ?attr ?tag-e] clause, right?

casperc21:01:17

So I guess it is trying to resolve the entity

casperc21:01:17

True, it isn’t needed in that case. But I still don’t know how to operate on the query programmatically, so I don’t know how to remove it 🙂

alandipert21:01:40

yeah - i think template comes back
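A dependency-free way to drop the clause: build the :where clauses as plain data and only include the join when attr is present. A sketch, with attribute names assumed from the earlier snippets:

```clojure
;; Conditionally include the join clause; the whole query is plain data.
(defn tag-query [attr]
  (into '[:find ?e ?log-title
          :in $ ?tag-title
          :where
          [?tag-e :tag/title ?tag-title]]
        (concat
          (when attr [['?e attr '?tag-e]])
          '[[?e :log/title ?log-title]])))
```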

alandipert22:01:04

btw you may consider keeping your queries in functions that take db as an argument

alandipert22:01:56

this gives you more control as the code evolves, since you don't have to coordinate conn access

casperc22:01:56

Yeah, thanks. I will. This is just me messing around in the REPL at the moment.

casperc22:01:32

I guess you would generally grab a db value at the beginning of a route (if exposing a service) and operate on that throughout the call

alandipert22:01:52

yeah, i think anywhere you want to do a bunch of queries and get consistent results
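The pattern above, sketched; a minimal sketch assuming datomic.api on the classpath, a conn var, and the attribute/query names from earlier in the thread:

```clojure
(require '[datomic.api :as d])

;; Queries take a db value, not the conn, so the caller controls the basis.
(defn entities-with-tag [db tag-title attr]
  (map first
       (d/q '[:find ?e ?log-title
              :in $ ?tag-title ?attr
              :where
              [?tag-e :tag/title ?tag-title]
              [?e ?attr ?tag-e]
              [?e :log/title ?log-title]]
            db tag-title attr)))

;; Grab one db per request so every query in the request sees the same basis.
(let [db (d/db conn)]
  (entities-with-tag db "clojure" :log/tags))
```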