This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-04-24
Channels
- # aws-lambda (1)
- # beginners (99)
- # boot (46)
- # cider (8)
- # cljs-dev (20)
- # cljsrn (37)
- # clojure (189)
- # clojure-dev (22)
- # clojure-dusseldorf (28)
- # clojure-italy (1)
- # clojure-russia (28)
- # clojure-spec (10)
- # clojure-uk (33)
- # clojurebridge (1)
- # clojurescript (64)
- # core-matrix (2)
- # css (3)
- # cursive (3)
- # datascript (34)
- # datomic (101)
- # defnpodcast (2)
- # dirac (5)
- # events (1)
- # funcool (3)
- # ldnclj (1)
- # lumo (11)
- # mount (1)
- # off-topic (95)
- # pedestal (2)
- # perun (10)
- # re-frame (3)
- # reagent (6)
- # ring-swagger (4)
- # specter (102)
- # test-check (1)
- # untangled (1)
- # vim (8)
- # yada (17)
OK, so I have a weird one... (I am new to Datomic and may just be making a fool of myself, but hey-ho). I created an in-memory Datomic DB from a vector of 23 maps, having defined the following schema:
(def meteorological-observation-stations-schema
  [{:db/ident :station/id
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "The Unique Identifier for the monitoring station"}
   {:db/ident :station/name
    :db/valueType :db.type/string
    :db/cardinality :db.cardinality/one
    :db/doc "Long name for the monitoring station"}
   {:db/ident :station/elevation
    :db/valueType :db.type/double
    :db/cardinality :db.cardinality/one
    :db/doc "The Elevation at which the monitoring station is placed in metres."}
   {:db/ident :station/latitude
    :db/valueType :db.type/double
    :db/cardinality :db.cardinality/one
    :db/doc "Latitude for the monitoring station"}
   {:db/ident :station/longitude
    :db/valueType :db.type/double
    :db/cardinality :db.cardinality/one
    :db/doc "Longitude for the monitoring station"}])
but, as far as I can tell, the 23 maps in that ^^ format have created 161 nodes. I wrote a simple Datalog query to check that the nodes had been created correctly, and it comes back with 161 results, almost none of them correct 😞 I've visually checked my input (CIDER + Emacs, C-x C-e) so I know that I am putting the correct data into the d/transact...
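For context, a minimal sketch of how a schema and data set like this might be loaded into an in-memory DB, assuming the Datomic peer API; the URI and the sample station below are hypothetical:

(require '[datomic.api :as d])

(def uri "datomic:mem://stations")
(d/create-database uri)
(def conn (d/connect uri))

;; Transact the schema first, then the station maps.
@(d/transact conn meteorological-observation-stations-schema)
@(d/transact conn [{:station/id        "ABC123"   ; hypothetical sample station
                    :station/name      "Example Station"
                    :station/elevation 250.0
                    :station/latitude  51.5
                    :station/longitude -0.1}])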
@maleghast when you say "nodes" do you mean "entities"? What query did you use?
Hold on, will paste:
(def stations-query
  '[:find ?station-name ?elevation
    :where [_ :station/name ?station-name]
           [_ :station/elevation ?elevation]
           [(> ?elevation 200)]])
I did wonder if the schema was creating an entity for each definition, but that would be 115, not 161 - 161 is (* 23 7)
What I am getting back is 7 "answers" per station, and 6 of the values for elevation are wrong.
I am fairly certain that I am simply missing something about how to write the query, tbh, but after a day's worth of banging my head against it I thought that I ought to just ask... 😉
Here's the data I am pushing in after I've added / created the schema: https://pastebin.com/0QCLBj9b (Side-note, when did refheap.com stop working..?)
:where [_ :station/name ?station-name]
       [_ :station/elevation ?elevation]
you want names for stations with elevations > 200, is this correct?
I think you should join them like this
:where [?id :station/name ?station-name]
       [?id :station/elevation ?elevation]
@kirill.salykin - OK, thanks, I will try that.
@maleghast To expand on what @kirill.salykin said: I think you want to make sure that the elevation and the station name come from the same entity. If you don't, you will get station² results (minus a few where the elevations aren't high enough).
@kirill.salykin - That works, but I need to understand how to write a query that does the above but also only returns the entities with an elevation greater than 200
That's the query that @kirill.salykin suggested I think
Nope, if I do include it I get an error about not being able to resolve the symbol ?elevation
Nope, I was being an idiot - I forgot to put the s-expression for the predicate inside a vector. *facepalm*
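For reference, a sketch of the full corrected query, assuming the conn from the earlier sketch: binding ?id in both data clauses joins name and elevation to the same entity, and the predicate s-expression sits inside its own clause vector:

(def stations-query
  '[:find ?station-name ?elevation
    :where [?id :station/name ?station-name]
           [?id :station/elevation ?elevation]
           [(> ?elevation 200)]])

;; Returns name/elevation pairs only for stations whose own elevation
;; exceeds 200; the shared ?id prevents the 23 x 7 cross product.
(d/q stations-query (d/db conn))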
Do either of you have a strong recommendation for learning Datalog? I found the Cognitect docs / tutorial to be less than optimal... 😉
this is pretty good
@kirill.salykin - Thanks very much; will have a look now(ish)
http://docs.datomic.com/best-practices.html#sec-14 I think the pre-processor got confused here. Not sure who to ping about it? 😄
Can anyone from Cognitect comment on the following line: "Note that large caches can cause GC issues, so there is a tradeoff here."
(from http://docs.datomic.com/capacity.html)
we’re thinking about trying a big Object Cache to keep most of our data locally on our peers
we can profile this ourselves, I’m just wondering if there’s anything obvious that I should be thinking about here
@kschrader how big is big?
One thing to consider is you could also use some proportion of the box memory for a local memcached instance
well, I definitely know folks running 16G in prod. You just want to make sure to keep an eye on your system and ensure you're not getting into GC hell
(d/transact conn [[:db.fn/retractEntity (d/tempid :db.part/user)]])
doesn't throw an error... Can I use it?
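A sketch of the behaviour in question, assuming the conn and :station/id attribute from the earlier sketch: :db.fn/retractEntity expects the id of an existing entity, and a fresh tempid presumably resolves to a brand-new entity with no datoms, which would explain why the transaction above succeeds but retracts nothing. Retracting a real entity looks roughly like:

;; Look up an existing entity id, then retract it. "ABC123" is the
;; hypothetical station transacted in the earlier sketch.
(let [eid (d/q '[:find ?e .
                 :in $ ?id
                 :where [?e :station/id ?id]]
               (d/db conn)
               "ABC123")]
  @(d/transact conn [[:db.fn/retractEntity eid]]))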
@kschrader I meant a 16G heap, half of it object cache; you can run memcached on the same instance
for example, I’m running a test right now on an m4.2xlarge, which has 32G of memory (IIRC); my JVM has an 8G heap and I’ve got a 22GB memcached instance running on it
then i configure the memcached endpoint in both the peer and transactor to that instance address & the memcached port
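For reference, a sketch of that wiring, assuming Datomic's standard memcached settings (the transactor's memcached property and the peer's datomic.memcachedServers system property); the host and port are hypothetical:

# transactor.properties (hypothetical endpoint)
memcached=10.0.0.5:11211

# on each peer JVM, the same endpoint as a system property
-Ddatomic.memcachedServers=10.0.0.5:11211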
and I don’t think that there’s a dynamic way to update the memcached config on the transactor
you just wouldn’t get the txor pushing new segments to your peer local memcached instance
when I profile locally in a memory constrained environment I see a lot of time spent in org.apache.http.protocol.HttpRequestExecutor.doReceiveResponse
and com.fasterxml.jackson.core.json.UTF8StreamJsonParser.getText
and once the cache is warmed the response time is about 40x faster for the queries that I’m profiling with
(which obviously isn’t the same load as our production infrastructure, but it’s clearly something)
we also see a bunch of time spent in java.io.BufferedInputStream.read, java.io.DataInputStream.readFully, and java.io.DataInputStream.readInt
@marshall @jaret any way we can configure a longer timeout for S3 restores?
Copied 0 segments, skipped 128128 segments.
Copied 0 segments, skipped 128128 segments.
Copied 0 segments, skipped 128128 segments.
java.util.concurrent.ExecutionException: java.net.SocketTimeoutException: Read timed out
these are becoming tiresome to retry-a-thon our way through
this is when we restore to a dev machine
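For context, the restores in question would look roughly like this, assuming the standard restore-db command; the bucket and target URI here are hypothetical:

# restore a backup from S3 into a local dev transactor
bin/datomic restore-db s3://my-backup-bucket/my-db datomic:dev://localhost:4334/my-db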
@kschrader that may or may not be cache ‘churn’ - it is potentially just the cost of reading a ton of data
@robert-stuttaford and @kschrader I don’t believe that is configurable currently - I’d suggest adding it as a feature request
unlikely to be churn then. Is it possible you’re under memory pressure from your app?
I think that we’ll try to bring up another cluster with 16GB heaps and see what happens
and have reasonably scaled compute with it - you don’t want to have a huge heap with, e.g., a 1- or 2-core processor
@marshall i hope you give substantial weight to each vote on http://Receptive.io because each one counts for an organisation which represents many people 🙂
@robert-stuttaford We are absolutely considering organizational weight and power users when looking at our feature request feedback