datomic 2018-07-12 | Slack Archive

euccastro04:07:01

I'm trying to make a ring app as an ion. I pushed and deployed an app that uses com.cemerick/friend (admittedly a bit of a stress test). I got the following error when curl -iing the gateway API endpoint:

HTTP/1.1 500 Internal Server Error
Date: Thu, 12 Jul 2018 04:22:05 GMT
Content-Type: application/json
Content-Length: 157
Connection: keep-alive
x-amzn-RequestId: 1f30cc2e-858b-11e8-ac8f-b9d295c18321
x-amz-apigw-id: J5aYVFrQliAFs7w=
X-Amzn-Trace-Id: Root=1-5b46d768-43f8e546292ea7321adbb5a0;Sampled=0

java.io.FileNotFoundException: Could not locate slingshot/slingshot__init.class or slingshot/slingshot.clj on classpath., compiling:(cemerick/friend.clj:1:1)

I have no such problem locally, and slingshot appears in the list of downloaded libraries that got printed when I first deployed:

 ... {:s3-zip "datomic/libs/mvn/slingshot/slingshot/0.10.2.zip", :local-dir "/home/es/.m2/repository/slingshot/slingshot/0.10.2", :local-zip "/home/es/.cognitect-s3-libs/.m2/repository/slingshot/slingshot/0.10.2.zip"} ...

euccastro04:07:48

this is my deps.edn, FWIW

{:paths ["src/clj" "resources"]
 :deps {com.datomic/ion {:mvn/version "0.9.7"}
        org.clojure/data.json {:mvn/version "0.2.6"}
        org.clojure/clojure {:mvn/version "1.9.0"}
        com.cemerick/friend {:mvn/version "0.2.3"}
        ring/ring-defaults {:mvn/version "0.3.2"}}
 :mvn/repos {"datomic-cloud" {:url ""}}
 :aliases
 {:dev {:extra-deps {com.datomic/client-cloud {:mvn/version "0.8.54"}
                     com.datomic/ion-dev {:mvn/version "0.9.160"}}}}}

henrik05:07:50

Is this section in the Datomic tutorial (https://docs.datomic.com/cloud/tutorial/assertion.html#sec-3) missing (d/transact conn {:tx-data (make-idents colors)})?

euccastro06:07:13

trying to use the session ring middleware with cookie storage seems to break the proxy integration:

Thu Jul 12 06:04:17 UTC 2018 : Endpoint response body before transformations: {"statusCode":200,"headers":{"Content-Type":"text\/plain","Set-Cookie":["ring-session=ECSI%2FAxqP4g3%2F6Lsf6j2gw6iTCd2jVL9CB2n8D%2BsBIY%3D--FweWg7tIHsIfkhtzoKxqC9YvJNtKEjzU%2BQtbF1Qzk20%3D;Path=\/;HttpOnly"]},"body":"T2zDoSAwIQ==","isBase64Encoded":true}
Thu Jul 12 06:04:17 UTC 2018 : Endpoint response headers: {X-Amz-Executed-Version=$LATEST, x-amzn-Remapped-Content-Length=0, Connection=keep-alive, x-amzn-RequestId=68d54df4-8599-11e8-a98d-17a42203bec1, Content-Length=254, Date=Thu, 12 Jul 2018 06:04:17 GMT, X-Amzn-Trace-Id=root=1-5b46ef61-1d46065e28b013d3ba616863;sampled=0, Content-Type=application/json}
Thu Jul 12 06:04:17 UTC 2018 : Execution failed due to configuration error: Malformed Lambda proxy response
Thu Jul 12 06:04:17 UTC 2018 : Method completed with status: 502

euccastro06:07:07

so just in case this bites someone: it seems like the AWS API gateway doesn't accept a list as a header value, and in general it doesn't accept multiple headers with the same name. a workaround, if you really need multiple headers with the same name, is to return the headers in different upper/lower case combinations (e.g., "Set-Cookie" and "sEt-cOOkiE" will work). you could write a ring middleware that does just that
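A minimal sketch of such a middleware, assuming (as described above) that API Gateway treats differently cased spellings of a name as distinct headers. The names `case-variant`, `expand-headers` and `wrap-case-variants` are made up for this illustration; this is not necessarily how any published library does it.

```clojure
(defn case-variant
  "Returns header name s with its k-th letter upper-cased when bit k of n is
  set and lower-cased otherwise. Distinct n give distinct spellings as long
  as n < 2^(number of letters) and the name has fewer than 64 letters."
  [s n]
  (loop [cs (seq s), k 0, out []]
    (if-let [ch (first cs)]
      (if (Character/isLetter (char ch))
        (recur (rest cs)
               (inc k)
               (conj out (if (bit-test n k)
                           (Character/toUpperCase (char ch))
                           (Character/toLowerCase (char ch)))))
        ;; Non-letters (such as "-") are copied through unchanged.
        (recur (rest cs) k (conj out ch)))
      (apply str out))))

(defn expand-headers
  "Replaces every sequential header value with one single-valued header per
  element, each under a differently cased spelling of the original name."
  [headers]
  (reduce-kv
   (fn [m k v]
     (if (sequential? v)
       (into m (map-indexed (fn [i x] [(case-variant k i) x]) v))
       (assoc m k v)))
   {}
   headers))

(defn wrap-case-variants
  "Ring middleware (hypothetical name) applying expand-headers to responses."
  [handler]
  (fn [req]
    (update (handler req) :headers expand-headers)))
```

With this sketch, `{"Set-Cookie" ["a=1" "b=2"]}` becomes `{"set-cookie" "a=1", "Set-cookie" "b=2"}`. It does not guard against a generated spelling colliding with a header that is already present under that exact spelling.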

👍 4

fmnoise10:07:11

any thoughts about excision for datomic cloud - is it even planned to implement?

stuarthalloway10:07:38

hi @U4BEW7F61. It is definitely on our radar. https://forum.datomic.com/t/support-for-excision-or-similar/323

henrik11:07:33

I want to model a taxonomy, like this one:

Biology --> Medicine -> Internal
        |-> Genetics
        |-> Morphology

Eventually, I want to tag stuff with this taxonomy, such as an article entity tagged with genetics for example. What would be a good way to model the taxonomy?

henrik11:07:00

This is my current attempt:

[{:db/ident :taxonomy/name
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/unique :db.unique/identity
  :db/doc "The title of a taxonomy node"}

 {:db/ident :taxonomy/children
  :db/valueType :db.type/ref
  :db/isComponent true
  :db/cardinality :db.cardinality/many
  :db/doc "Children of a taxonomy node"}]

henrik11:07:52

It works. I’m just not sure if it’s an intelligent way to do it.

chrisblom13:07:18

@henrik that looks reasonable to me

val_waeselynck13:07:06

@henrik if the taxonomy graph is tree-like, :taxonomy/parent instead of :taxonomy/children is probably safer

henrik13:07:29

@val_waeselynck Interesting! How is that safer?

val_waeselynck13:07:34

Well, by having a cardinality-one attribute, you're being more explicit about the model ("a taxonomy has at most one parent")

👍 8

val_waeselynck13:07:16

Also seems more reasonable to me that the parents, being more general, don't "know" about their children
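A sketch of what the `:taxonomy/parent` variant of the schema could look like, assuming the same `:taxonomy/name` attribute as in the attempt above. `:db/isComponent` is dropped here on the assumption that a parent should not be treated as a component of its child. The sample data covers only the Biology, Medicine, Internal chain and uses string tempids so it fits in one transaction.

```clojure
(def taxonomy-schema
  [{:db/ident       :taxonomy/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity
    :db/doc         "The title of a taxonomy node"}

   ;; Cardinality one: a node has at most one parent; the root has none.
   {:db/ident       :taxonomy/parent
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one
    :db/doc         "The parent of a taxonomy node"}])

(def taxonomy-data
  [{:db/id "biology"  :taxonomy/name "Biology"}
   {:db/id "medicine" :taxonomy/name "Medicine" :taxonomy/parent "biology"}
   {:db/id "internal" :taxonomy/name "Internal" :taxonomy/parent "medicine"}])
```

Both vectors are plain transaction data, so with the client API they would be transacted one after the other, schema first.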

jonahbenton13:07:34

@henrik Relatedly, do you need to be able to navigate up the tree from child to parent? And is it possible for the taxonomy to be rich enough for there to be the same or similar names in different parts of the tree?

henrik14:07:26

@jonahbenton Every node, regardless of level, should be entirely unique. Or, if it’s named the same, it is the same. And yes, navigation would have to be bidirectional. But as I understand Datomic, all references are bidirectional, right?

val_waeselynck14:07:18

they are, in the sense that you can easily navigate in both directions, whatever the query API you're using
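To illustrate navigating in both directions, here is a sketch assuming the `:taxonomy/parent` attribute suggested above. In pull patterns, an underscore before the attribute's local name walks the reference backwards; in Datalog, the direction just depends on which side of the clause is bound. The var names are made up for the example.

```clojure
;; Pull: from a node up to its parent.
(def node-with-parent
  '[:taxonomy/name {:taxonomy/parent [:taxonomy/name]}])

;; Pull: from a node down to its children, via the reverse reference.
(def node-with-children
  '[:taxonomy/name {:taxonomy/_parent [:taxonomy/name]}])

;; Datalog: names of the children of the node with the given name.
(def children-of
  '[:find ?child-name
    :in $ ?parent-name
    :where
    [?parent :taxonomy/name ?parent-name]
    [?child :taxonomy/parent ?parent]
    [?child :taxonomy/name ?child-name]])

;; Datalog: name of the parent of the node with the given name.
(def parent-of
  '[:find ?parent-name
    :in $ ?child-name
    :where
    [?child :taxonomy/name ?child-name]
    [?child :taxonomy/parent ?parent]
    [?parent :taxonomy/name ?parent-name]])
```

The pull patterns would be used against a lookup ref such as `[:taxonomy/name "Biology"]`, which works because `:taxonomy/name` is a unique identity attribute.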

👍 4

jonahbenton14:07:36

So a given node may have multiple parents?

henrik14:07:57

Oh, I see. No, one parent I think.

henrik14:07:11

This is for categorising science into fields and subfields. Though now you got me thinking about whether modeling it as a network of subjects would be more powerful.

jonahbenton14:07:54

Yeah, probably would, though seems like it might depend on the size and the dataset feeding the categorization. Tags may be a useful modeling tool to capture commonalities (like computational-ness of the subfield). Perhaps also include a description attribute.

👍 4

henrik14:07:17

Right now, I’m looking at basing it on a standard way of categorising (CWTS Leiden, about 250 categories and subcategories). But just because that particular model is hierarchical doesn’t mean that there isn’t a more powerful way to do it. The point with this particular taxonomy is to try to keep it small(-ish), using it to create rather large, but interconnected groups of material.

henrik14:07:56

I could essentially model a freer graph in the same way, right? Renaming parent to something like relation.

jonahbenton14:07:00

Ah, that sounds neat. It sounds like datomic as a metadata store- this taxonomy applied to source material that lives outside datomic- which I'm thinking about for a project as well.

jonahbenton14:07:11

Yes, I believe so, I have seen some "node" "edge" terminology in schemas

henrik14:07:27

Yeah, the source material would come from scientific publishers, in the form of articles, journals, books etc. And we have to find a way (many ways, actually), to tie all that disparate information together into a cohesive, consumable collection.

henrik14:07:37

What type of material will you be working with?

henrik14:07:24

Actually, with edges/nodes, I’m back to a list of relatives, though. Just not necessarily parents.
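A sketch of the freer graph variant, assuming a cardinality-many attribute named `:taxonomy/relation` (the name is only a guess at what "relation" would become). Since a ref is stored in one direction, a query for "everything related to X" has to look both ways, which an `or` clause can do.

```clojure
(def relation-schema
  [{:db/ident       :taxonomy/relation
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/many
    :db/doc         "Other taxonomy nodes this node is related to"}])

;; Names of all nodes related to the named node, whichever side holds the ref.
(def related-to
  '[:find ?other-name
    :in $ ?name
    :where
    [?node :taxonomy/name ?name]
    (or [?node :taxonomy/relation ?other]
        [?other :taxonomy/relation ?node])
    [?other :taxonomy/name ?other-name]])
```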

jonahbenton15:07:20

That sounds neat! Lots of interesting problems there. For me, as a side project, I'm looking at reimplementing a container artifact metadata api. The api is from a project called Grafeas: https://grafeas.io/ which acts as a metadata repository around container usage, vulnerabilities, deployment history, stuff of that nature. The basic technical idea is that grafeas is one of many projects in the container ecosystem that are glorified packagings of go code generated from protobufs. I like go, but when it comes to code generation, it's an awkward workflow, and the go people argue about checking code into the repo, doing it at build time, yadda yadda. It seems to me that in the clj space, you should have a pretty clean workflow of generating schema and data models from protobuf for the different layers -> spec, apis, datomic schema- and that should be sufficient to yield something of a working system. I don't see any of that tooling right now, so that's what I'm looking at.

henrik15:07:39

Could you summarize the problem and the value proposition for me? I don’t think I’m familiar enough with the problem to fully understand the solution.

jonahbenton16:07:18

Kind of you to ask, it's niche, so the explanation is a little long: Companies/orgs that run applications- api-type services and scheduled/batch jobs- have been "containerizing" their applications. Once you have containerized, there are a whole set of questions you'd like to be able to ask about your fleet, some operational, some security related, etc. Do any of the jvm applications I'm running use the vulnerable version of struts? If so, where are they in my network and for how long have they been running? How many of my applications have had vulnerabilities reported against their dependencies? What third party libraries are my service applications consuming, and are any of those licenses GPLV3? In even a small plant you wind up wanting to have a metadata repository into which that sort of operational and security data can be pushed, and against which one can run queries. Beyond that, you want to be able to plug other consumers and providers into that repository. You want to be able to use vulnerability scanner X and build tool Y and signing tool Z, and Google has succeeded in getting commitments for adoption of this particular metadata API by various players in this ecosystem.

jonahbenton16:07:01

I'm curious about this as a side project, as I do some work in security and have been enthralled with containers and kubernetes. From a product standpoint, it seems like Datomic should be a good fit for this sort of metadata, both for storage and for query. Having a fundamentally immutable store that knows-when-you-knew-something is useful for security, and datalog is more capable than many other languages from a query perspective. On a technical level, I'm curious about the ergonomics of going from protobuf->spec, protobuf->api, protobuf->datomic schema, and am curious about data-driven systems in general. There is a project called "vase" from the Cognitect folks which was an experiment in building a fully data-driven api + database. Write as little code as possible, describe the system entirely using data, how far can you go with that? So on a technical level I'm basically curious whether protobuf is a feasible "front end" with vase as a "back end".

henrik22:07:43

Thank you for the description, that does sound like an interesting (and hard) problem. I can see how managing tons of containers quickly takes on qualities of cat herding. I remember the Vase introduction from a Cognicast way back. “Because it sits on top of Pedestal.” In the more abstract, it’s interesting to try to imagine how to keep some of the ergonomics of Clojure once you pass the border of the application. Philosophically, a function and a container have sort of morphological similarities, but the environment is as different as that of a one-cell organism to that of an animal.

jonahbenton18:07:23

Agree! Very interesting. Working in clj on applications that will get deployed into k8s, one can't avoid engaging in thought experiments about a repl that directly creates and interacts with k8s resources in a first class manner. The repl and kubectl are equivalent levels of abstraction. One can imagine having a way to produce a pseudo clj namespace from a container image + a swagger spec, so loading that namespace under the hood spins up a container, and calling functions turns into (cross-language) service calls. Certainly we've seen movies like this before; when abstractions are similar but not equivalent the pain is often greater than the benefit. But still interesting to think about.

rhansen14:07:15

Hmm... I have a list of references, and I want to check if those references all belong to a certain entity. What would be the best way to construct such a query?

val_waeselynck14:07:26

what does it mean for a reference to belong to an entity?

rhansen14:07:08

[?entity :person/friends ?some-ref]

val_waeselynck14:07:59

@rhansen I would use a Datalog query to list or count those that don't
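One way to sketch this, assuming the candidates are entity ids and using the `:person/friends` attribute from the message above: query for the candidates that do belong, then subtract them from the full list. The names `friends-among` and `refs-not-belonging` are invented for the example.

```clojure
(require '[clojure.set :as set])

;; Finds which of the candidate refs are friends of ?entity.
;; [?ref ...] binds the collection of candidates passed as an input.
(def friends-among
  '[:find ?ref
    :in $ ?entity [?ref ...]
    :where [?entity :person/friends ?ref]])

(defn refs-not-belonging
  "Given the candidate entity ids and the result of running friends-among
  (a collection of one-element tuples), returns the candidates that are not
  friends of the entity. An empty set means they all belong."
  [candidate-refs query-result]
  (set/difference (set candidate-refs)
                  (into #{} (map first) query-result)))
```

For example, `(refs-not-belonging [1 2 3] #{[1] [2]})` returns `#{3}`. Counting the result, or testing it with `empty?`, answers the original question.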

rhansen14:07:34

Interesting. Thanks.

euccastro14:07:45

I've done the ring wrapper I mentioned above. it's only tested in the REPL (and by deploying to ions, of course) so far, but I hope it's useful if you're tinkering with hosting a ring web app in ions: https://github.com/euccastro/expand-headers

👍 4

val_waeselynck14:07:34

@euccastro sorry, I don't follow what problem you are addressing?

euccastro14:07:47

@val_waeselynck are you talking about my response to you or about the github repo I mention above?

val_waeselynck15:07:36

@euccastro your response to me

euccastro15:07:20

oh sorry I think I misunderstood your question to @rhansen

euccastro15:07:02

I've deleted my responses since they only add noise

val_waeselynck15:07:25

ah ok 🙂

val_waeselynck15:07:57

debugging human conversations

euccastro17:07:00

FWIW, the problem I mention here (https://clojurians.slack.com/archives/C03RZMDSH/p1531369681000042) persists if I manually add to my own deps.edn a dependency on the same slingshot version as cemerick.friend does (0.10.2), but for whatever reason it doesn't manifest if I upgrade the slingshot dependency to the current version, 0.12.2
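For reference, the override described above would look roughly like this in the `:deps` map of deps.edn. Only the slingshot line is new; the versions are the ones mentioned in these messages, and whether this is the right fix for a given dependency conflict is not guaranteed.

```clojure
{:deps {com.cemerick/friend {:mvn/version "0.2.3"}
        ;; Declared explicitly so this version is used instead of the 0.10.2
        ;; that friend pulls in transitively.
        slingshot/slingshot {:mvn/version "0.12.2"}}}
```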

oscar17:07:34

@euccastro Upgrade to the newest Ions. It sounds like you have dependency conflicts. https://docs.datomic.com/cloud/ions/ions-reference.html#dependency-conflicts

euccastro19:07:06

thanks @oscar!

stuarthalloway19:07:35

Hi @euccastro! If that does not work you should be able to spot an error in the logs, per https://docs.datomic.com/cloud/operation/monitoring.html#searching-cloudwatch-logs

euccastro19:07:13

thanks @stuarthalloway! I just noticed I'd missed that whole "Operation" section of the docs 😛

😀 4

sho03:07:40

#Also sent to the channel

Hi @euccastro, have you managed to create a ring app as an ion? Does it work just fine with your hack for the headers problem? I'm just trying to do the same exercise and curious what to expect.

euccastro04:07:19

so far it works fine. as you may have seen in the #datomic channel, I have stumbled into some dependency problems too, but so far I'm managing by paying attention the first time I push a version that introduces a dependency and manually declaring any conflicting dependencies

euccastro04:07:39

see this (not ions specific) for how to associate a domain name to your API Gateway app: https://docs.aws.amazon.com/apigateway/latest/developerguide/how-to-custom-domains.html

euccastro04:07:14

also, if you want to be able to serve the root (/) directory, you need an additional ANY method in the root (/) resource of your API Gateway. the ions tutorial doesn't get into that. you shouldn't remove the /{proxy+} resource, though. AFAICT both are needed

euccastro04:07:01

all that said, I haven't tested much functionality yet, only that basic ring handlers work

euccastro04:07:41

google "keep aws lambda warm" for another important consideration if your app is user-facing or otherwise latency sensitive

euccastro04:07:27

the good thing about these hoops is that you only need to jump through them once I think. I haven't touched my API Gateway configuration at all since I initially set it up, and I don't expect to have to worry much about it

euccastro05:07:24

https://datomique.icbink.org where I'm testing these things. that is backed by ions (solo deployment). the counter (refresh the page) is kept in the cookies, and the list of accessed paths is kept in a local atom (note that any process-local state gets lost on deployments though)

euccastro05:07:41

this is my ring handler ATM FWIW:

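;; Requires for the Ring middleware used below. wrap-expand-headers comes from
;; the expand-headers repo linked earlier and has to be required separately.
(require '[ring.middleware.session :refer [wrap-session]]
         '[ring.middleware.session.cookie :refer [cookie-store]]
         '[ring.middleware.keyword-params :refer [wrap-keyword-params]]
         '[ring.middleware.params :refer [wrap-params]])

(def log (atom []))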

(defn ring-handler
  [{:keys [headers body uri params session]}]
  (if (= uri "/favicon.ico")
    {:status 404
     :body "Not found!"}
    (do
      (swap! log conj uri)
      (let [count (get session :counter 0)]
        {:status 200
         :headers {"Content-Type" "text/plain"
                   "p-ro-va-heaDers" ["a" "b" "c" "d" "e"]}
         :body (str "Olá " count "-" (pr-str @log) "!")
         :session (assoc session :counter (inc count))}))))

(defn dup [xs]
  (conj xs (first xs)))

(defn wrap-add-cookie [handler]
  (fn [req]
    (update-in (handler req) [:headers "Set-Cookie"] dup)))

(def ring-app
  (-> ring-handler
      (wrap-session {:store (cookie-store {:key "a 16-byte secret"})})
      wrap-keyword-params
      wrap-params
      wrap-add-cookie
      wrap-expand-headers))

euccastro05:07:32

as you see I've been mostly tinkering with multiple header values and ring handlers, not doing anything fancy yet

euccastro05:07:10

I'm pushing my experiments here if you're interested (ignore the /old folder): https://github.com/euccastro/semente

sho06:07:52

Sorry I've been offline for lunch. All of your information is very helpful, especially because I haven't found anyone else doing the same stuff yet.

sho06:07:22

I'm still not 100% convinced whether the approach of building a ring handler behind API Gateway is the best decision for me, but the alternative would be doing auth with AWS Cognito, which means throwing away a good chunk of Clojure code and moving away from the Clojure ecosystem.

sho06:07:57

So I want to first try my server-side code with Buddy auth as a ring ion.

sho07:07:15

About java cold start, I'm thinking about dispatching an event to knock the ion app right at the moment users visit my static site on CloudFront and having one ion handle all of my api requests that requires both authentication and authorization. Not sure if this is a good strategy, but I plan to try it and examine the latency problem with my eyes.

sho07:07:16

I'll be out for a few days, but if I happen to find anything valuable, I'll ping you and share the info. Cheers.

euccastro21:07:16

thanks!

euccastro19:07:56

(btw it did work)

eggsyntax21:07:35

Is anyone aware of any writing or documentation out there about guarding against malicious datomic queries, especially preventing queries with too great a performance impact? I don't think it makes sense to naively expose queries entirely to the public (or semi-public in my case, ie logged-in users, with only signups vetted). But I'm interested in seeing what's been written on the subject. Didn't find anything relevant on a quick review of the datomic docs.

timgilbert21:07:59

I've thought about this a lot, but never found much in the way of writing on the subject. In general the problems are similar to problems that other graph databases also face. But there's not tons of general literature available for those either.

timgilbert21:07:28

At my company we did go through an exercise of parsing pull queries and then limiting specific queries to a certain depth and doing other validations on them
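A minimal sketch of the depth-limiting idea, not the code from that exercise. It assumes pull patterns arrive as plain Clojure data (vectors of attribute specs, with nested maps for joins) and measures how deeply the joins nest. Recursion limits such as `...` or a number in place of a sub-pattern are not handled and would need their own check.

```clojure
(defn pull-depth
  "Returns the nesting depth of a pull pattern: 1 for a flat vector of
  attributes, plus 1 for every level of nested map joins."
  [pattern]
  (if (vector? pattern)
    (inc (reduce max 0
                 (for [spec pattern
                       :when (map? spec)
                       sub (vals spec)]
                   (pull-depth sub))))
    0))

(defn depth-ok?
  "True when the pattern nests no deeper than max-depth."
  [pattern max-depth]
  (<= (pull-depth pattern) max-depth))
```

A server could call `depth-ok?` on each incoming pattern and reject the request before it reaches the database.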

eggsyntax21:07:05

Thanks, Tim! Any particular tips/gotchas on that process?

timgilbert21:07:27

But we eventually moved to keeping all the queries on the server where we could control them, and then moving to a GraphQL interface which has its own set of issues

eggsyntax21:07:57

Heh. We've been doing some exploration on a new project, and I had put off making DB decisions. I added GraphQL so I could support client-side "pull"-specification. Now that I've decided to go with datomic, I'm dropping GQL like a hot potato 😉

timgilbert21:07:05

One thing that we ran into a bunch was trying to figure out how to guard against attacks where a user is able to escape her own company and start getting data about another person's company by backref-linking through a shared entity

eggsyntax21:07:47

It doesn't seem like GQL really provides any inherent support for limiting query specification impact either, seems like you're left facing the same problem.

eggsyntax21:07:56

But not the backref aspect I guess, huh?

timgilbert21:07:59

If you decide to expose some of your datomic stuff via lacinia, we open-sourced a library that does some of the grunt work for you: https://github.com/workframers/stillsuit

eggsyntax21:07:08

Hmm, seems like one option (for datomic) would be to parse the pull and look for backrefs, and then just reject any calls that had them.
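A sketch of that check, assuming the pull pattern is plain Clojure data. Reverse references in pull syntax are keywords whose local name starts with an underscore, so walking the whole pattern and collecting such keywords is enough to spot them. The function names are made up for the example.

```clojure
(require '[clojure.string :as str])

(defn reverse-attr?
  "True for keywords like :person/_project, i.e. reverse references."
  [x]
  (and (keyword? x) (str/starts-with? (name x) "_")))

(defn reverse-attrs
  "Returns every reverse-reference keyword found anywhere in the pattern."
  [pattern]
  (filter reverse-attr? (tree-seq coll? seq pattern)))

(defn allowed-pattern?
  "True when the pattern contains no reverse references at all."
  [pattern]
  (empty? (reverse-attrs pattern)))
```

Here `(reverse-attrs [:project/name {:person/_project [:person/email]}])` yields `(:person/_project)`, so the pattern would be rejected. `:person/email` is a hypothetical attribute used only for illustration.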

timgilbert21:07:56

Yeah, except in cases where you actually need them, like you have a project and are looking for all users with :person/project ?p

timgilbert21:07:37

Anyhow, we thought about it for a while and eventually decided keeping the queries on the front-end was going to be a black hole of engineering time

timgilbert21:07:36

I think there are ways you could work around it, like have a "dev mode" where the client sends them over and a "prod mode" where they are replaced by keywords or something
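A rough sketch of the keyword idea, under the assumption that the server keeps a registry of named queries and the client only sends a name plus arguments. The registry contents, the `:person/email` attribute and the function name are invented for the example.

```clojure
;; Server-side registry: the only queries a client can ever run.
(def allowed-queries
  {:project/members
   {:arity 1
    :query '[:find ?email
             :in $ ?project
             :where
             [?p :person/project ?project]
             [?p :person/email ?email]]}})

(defn resolve-query
  "Looks up a client-supplied query name and checks the argument count.
  Returns a map with the vetted query and its args, or throws."
  [query-name args]
  (let [{:keys [query arity]} (get allowed-queries query-name)]
    (cond
      (nil? query)
      (throw (ex-info "Unknown query" {:name query-name}))

      (not= arity (count args))
      (throw (ex-info "Wrong number of arguments"
                      {:name query-name :expected arity :got (count args)}))

      :else
      {:query query :args (vec args)})))
```

In a "dev mode" the client could still send full queries, while production accepts only names that `resolve-query` recognizes.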

eggsyntax21:07:43

Yeah, I can definitely see the possibility of it becoming a terrible timesuck. The keyword approach seemed promising to me too. This is a bunch of really useful info for me. May save me from going down some wrong roads. I really appreciate it 🙇

timgilbert21:07:03

We were also thinking about moving to a multi-tenant setup where user data from different orgs was stored in entirely separate databases, which would have been easier to do on day 1 than day 638 or whatever

eggsyntax21:07:26

Ah, yeah, no doubt.

timgilbert21:07:56

No prob. I'd say definitely give it some thought, you might stumble on something we didn't, and I'll look forward to reading your blog post about it 😉

eggsyntax21:07:16

Seems like maybe writing your schema explicitly to avoid the need for backrefs in client requests might work, although I'm not at all sure of that. Or maybe you could take an approach like disallowing certain things like backrefs, but being able to pass keywords that tell the server to include datomic rules that provide just the backrefs that you need.

eggsyntax21:07:07

ie hide the potentially dangerous stuff behind keywords and disallow it in client requests, but then expose the full range of non-disallowed stuff for the client, for the sake of power.

timgilbert21:07:34

It's possible, yeah. Starting with a subset seems like a promising approach, or maybe a query DSL that you could validate and then translate back into pull syntax on the server side

👍 4

eggsyntax21:07:25

(this is re: on-prem / peer, btw)

jonahbenton22:07:30

For reads, in terms of constraining cpu/ram resource utilization- in the peer architecture, the query processing is happening wholly in your app, so this is under your control. You can give inbound requests as much or as little time as you want on a thread, then cancel; or retrieve only a limited number of results, or whatever...
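A generic sketch of the "give it some time on a thread, then cancel" approach, using only core Clojure. `with-timeout` and the `:query/timed-out` marker are made-up names, and whether a running query actually stops when its thread is interrupted is an assumption that would need checking. As far as I know the peer API's `datomic.api/query` also accepts a `:timeout` in its query map, which may be the better fit when it applies.

```clojure
(defn with-timeout
  "Runs the zero-argument function f on another thread and waits at most
  timeout-ms for its result. On timeout, attempts to cancel the work and
  returns :query/timed-out instead."
  [timeout-ms f]
  (let [fut    (future (f))
        result (deref fut timeout-ms :query/timed-out)]
    (when (= result :query/timed-out)
      ;; Interrupts the worker thread; the work must cooperate to stop early.
      (future-cancel fut))
    result))
```

A request handler would wrap the query call, for example `(with-timeout 2000 #(run-user-query db q))`, where `run-user-query` stands for whatever function executes the query.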

eggsyntax22:07:52

For sure! I'm just wondering if there are some examples of approaches that people have taken to that, that may bring up datomic-specific considerations that I haven't thought of.

