This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-12-10
Channels
- # adventofcode (76)
- # announcements (7)
- # aws (3)
- # babashka (75)
- # beginners (25)
- # calva (37)
- # cider (9)
- # clara (4)
- # clj-kondo (17)
- # cljsrn (1)
- # clojure (106)
- # clojure-europe (4)
- # clojure-india (2)
- # clojure-italy (12)
- # clojure-nl (27)
- # clojure-spec (33)
- # clojure-uk (20)
- # clojurescript (103)
- # clojutre (3)
- # core-async (1)
- # cryogen (10)
- # cursive (24)
- # datomic (113)
- # dirac (5)
- # emacs (12)
- # events (4)
- # fulcro (64)
- # garden (5)
- # jobs (1)
- # kaocha (5)
- # luminus (2)
- # malli (14)
- # off-topic (53)
- # planck (11)
- # re-frame (9)
- # reagent (16)
- # reitit (26)
- # remote-jobs (2)
- # shadow-cljs (137)
- # spacemacs (34)
Hello channel :wave: How many here use Datomic? What are its pros and cons compared to other DBs?
Cons: it's closed, it's not cheap. Pro: it's awesome, so you think twice before caring about 1 and 2
@thegobinath I think the pros are pretty well laid out on the Datomic website. Not sure about the cons. When I've talked to Cognitect about suitability for where I work, they raised a couple of flags which I'd say were cons: we have some data sets that are very high write throughput and with Datomic's single transactor and replication architecture that isn't a great fit; we also have some extremely large data sets and Datomic has some upper limits on the amount of data it can work with (I don't know whether those are hard limits or just effective performance limits).
There's a #datomic channel where you can find people who are using Datomic in production who'll probably have some real world feedback on pros and cons.
Cloud does not have a single transactor btw
Depending on your level of seriousness, you can also contact <mailto:[email protected]|[email protected]> for a conversation
I think I encountered this bug while playing with generators. https://clojure.atlassian.net/browse/CLJ-2311 😞
@alexmiller Ah, good to know. Thank you for the correction! I haven't really looked at Datomic much since we established it wasn't a good fit for us -- and that was a while back (before Datomic Cloud even launched).
only one transactor can write to a database, you can't have multiple transactors talking to the same database simultaneously.
We are using promises and futures for concurrency
Is that smart?
Like this:
(defmacro async
"Evaluates body asynchronously.
Returns a promise with the result of the evaluation."
[& forms]
`(let [p# (promise)]
(future (deliver p# (do ~@forms)))
p#))
(time
(->> [(async (Thread/sleep 2000) (prn "hallo1") 1)
(async (Thread/sleep 2000) (prn "hallo2") 2)
(async (Thread/sleep 2000) (prn "hallo3") 3)]
(mapv deref)))
;; => [1 2 3]
;; "hhaallllo3"o"1"hallo2"
;; "Elapsed time: 2001.748042 msecs"
Where thread/sleep is actually doing a query
Or is it less resource intensive to use core.async?
What are the tradeoffs?
We do need the results
boot.user=> (time
#_=> (->> [(future (Thread/sleep 2000) (prn "hallo1") 1)
#_=> (future (Thread/sleep 2000) (prn "hallo2") 2)
#_=> (future (Thread/sleep 2000) (prn "hallo3") 3)]
#_=> (mapv deref)))
"""hallo3hallo1""
hallo2"
"Elapsed time: 2004.803619 msecs"
[1 2 3]
Hmmmm I see 😄
I didn’t know future returned a value you can deref
I recommend https://github.com/TheClimateCorporation/claypoole for this kind of stuff, removes a lot of boiler plate and has more flexibility in terms of managing threadpools
Thanks
I'm also using https://github.com/mpenet/knit
What is the advantage of that over using plain Clojure future
?
Some more features I see, and what are the benefits of being able to specify thread pools?
something like pmap
that is closer to what people actually think pmap should be (clojure.core pmap is weird), better error handling (if code in a future throws an exception, you don't see it unless / until you deref the future itself)
and yeah, the threadpools too but to me that's the least interesting claypoole feature
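That error-handling difference is easy to demonstrate with nothing beyond clojure.core (a minimal sketch):

```clojure
;; A future that throws: nothing is printed or reported when it is
;; created, the exception only surfaces once you deref it.
(def f (future (throw (ex-info "boom" {}))))

;; Some time later... deref rethrows, wrapped in ExecutionException.
(try
  @f
  (catch java.util.concurrent.ExecutionException e
    (println "only now do we see:" (.getMessage (.getCause e)))))
;; prints: only now do we see: boom
```

If a future's result is never deref'd, the exception is silently lost, which is exactly the trap described above.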
Hi. How can I efficiently filter [ and ] symbols from a regex string ie in (re-find #"some string [97333] etc" mystring)
So you have the regex #"some string [97333]"
What do you want as a return value? The regex #"some string 97333"?
str can take a regex and return a regular string, which in this example would be "some string [97333]".
(-> (str #"some string [97333]") (clojure.string/replace "[" "") (clojure.string/replace "]" ""))
returns the string "some string 97333". Not sure if that is what you are looking for.
There are ways to use a single call to clojure.string/replace rather than two, if for some reason that would be extremely important to increase the efficiency somewhat, at the cost of code readability.
Thank you for getting back @U0CMVHBL2 I want to filter a sequence of strings if a substring exists. E.g. (filter #(re-find substr %) coll)
But it fails to find anything if the substring contains [ or ] (i guess other regex special chars). I ended up escaping it
(-> substr (st/replace ,, "[" "\\[") (st/replace ,, "]" "\\]"))
which seems a bit verbose.
clojure.string/includes? sounds like it might do exactly what you want. Don't use regex matching if you do not need regex matching -- it will only trip you up.
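A small sketch of both approaches; `substr` and `coll` below are stand-ins for the user's own values:

```clojure
(require '[clojure.string :as str])

(def coll ["some string [97333] etc" "something else"])
(def substr "[97333]")

;; Plain substring search: no regex semantics, no escaping needed.
(filter #(str/includes? % substr) coll)
;; => ("some string [97333] etc")

;; If a regex really is needed, java.util.regex.Pattern/quote escapes
;; every special character in one call, instead of replacing [ and ]
;; one by one.
(filter #(re-find (re-pattern (java.util.regex.Pattern/quote substr)) %) coll)
;; => ("some string [97333] etc")
```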
@erwinrooijakkers it only depends on your use case, I'd say using futures is ok, as long as you're fine with your whole application sharing the same thread pool for all futures (IIRC). If you need more control, that's when claypoole comes into place and you can manage the thread pool sizes yourself. Applications I work on use both, with pmap/`future` being the default choice
Thank you!
Querying the database 🙂
"querying a data base" is also very broad 😉 (what sort of DB, do you have a connection pool, are read paths separate from write paths etc etc etc)
postgresql db with a com.mchange.v2.c3p0.ComboPooledDataSource
(it makes 10 connections on startup) and read and writes both use HugSQL from same app
in a web request context 🙂
Interesting
most interesting things in Clojure-land aren't about the code, but about exploring the problem space. Performance numbers (request latency, db latency, connection acquisition latency, throughput) need to inform the maintenance costs of concurrency
Yes I understand
if the system doesn't expose those numbers, maybe exposing those is a higher priority than deciding on a code mechanism
We have a load test so we can get raw numbers from outside the system under load. The database is managed so we can probably find some information on latency there. I don’t know about request latency and connection acquisition latency.
> running a future per handler call on an unbounded threadpool is asking for trouble
Why exactly is that?
Too many resources used?
thread per call can lead to a lot of context switches
you will see your cpu power disappear into those
and if you allocate a bunch of resources per thread too then those obviously multiply up as well
but if you run against a database then your true limitation will arise there, most relational databases start to underperform heavily if you connect hundreds of clients to them in parallel (with actual query/io work being done in parallel)
so you may need connection pooling just to keep the db itself alive 🙂
We now have a connection pool already
c3p0.ComboPooledDataSource
So I guess that will work correctly
But the futures can cause a problem
well if there's a chance that you spawn too many of them, then yes, it's looking for trouble
if you're afraid of pulling in too many libraries, just use the Executor from java itself
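A minimal sketch of that, using java.util.concurrent directly; the pool size of 10 is just an example matching the 10 db connections discussed above:

```clojure
(import '(java.util.concurrent Executors ExecutorService Future Callable))

;; A fixed pool caps concurrency at 10 threads.
(def ^ExecutorService pool (Executors/newFixedThreadPool 10))

;; Clojure fns implement both Callable and Runnable, so the ^Callable
;; hint picks the .submit overload that returns the fn's value.
(def tasks (mapv #(.submit pool ^Callable (fn [] (* % %))) (range 5)))

;; .get blocks until each result is ready; order is preserved
;; because we deref the Futures in submission order.
(mapv #(.get ^Future %) tasks)
;; => [0 1 4 9 16]

(.shutdown pool)
```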
I don’t mind libraries
If I understand why they are used 🙂
We might spawn too many futures, so we can use a thread pool for that, I see. I’ll dive into it. As for core.async, I don’t see how we can use it; it does need to work in a web request context.
if you look under the hood then many of them are just handy wrappers around executor 🙂
Thanks!
to make it feel more clojure like 🙂
Yes for example the claypoole one
(require '[com.climate.claypoole :as cp])
;; A threadpool with 2 threads.
(def pool (cp/threadpool 2))
;; Future
(def fut (cp/future pool (myfn myinput)))
;; Ordered pmap
(def intermediates (cp/pmap pool myfn1 myinput-a myinput-b))
;; We can feed the streaming sequence right into another parallel function.
(def output (cp/pmap pool myfn2 intermediates))
;; We can read the output from the stream as it is available.
(doseq [o output] (prn o))
;; NOTE: The JVM doesn't automatically clean up threads for us.
(cp/shutdown pool)
From the docs
Is it reasonable to start a (cp/threadpool 100)
Can be as "simple" as: (cp/pmap 2 [(db/query db-pool) (db/query2 db-pool)]), for a pool size of 10, these two queries might be executed in parallel (assuming nothing else is going on)
Thanks a lot
I might be getting the syntax wrong, but you get the idea - you can create thread pools per request, allocating the size only for the number of queries you need to run. That said, threads have an overhead, and you're working with many variables here (your web servers thread pool, connection pool's thread pool, your own) so it needs a lot of testing and tuning.
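For reference, a hedged sketch of what that might look like with cp/pmap's (pool-or-size, fn, coll) shape; `query-1` and `query-2` are stand-ins for the real db queries, and the thunks delay the work so the pool, not the calling thread, runs it:

```clojure
(require '[com.climate.claypoole :as cp])

;; Hypothetical stand-ins for the real queries.
(defn query-1 [] (Thread/sleep 200) :result-1)
(defn query-2 [] (Thread/sleep 200) :result-2)

;; Passing an integer makes a temporary pool of that size, which is
;; shut down for you when the pmap finishes -- here sized to the
;; number of queries we want in flight.
(cp/pmap 2 (fn [run-query] (run-query)) [query-1 query-2])
;; => (:result-1 :result-2)
```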
if you have 100 threads ... but 10 db connections .. what do you suppose will happen ? 🙂
i don't know of course, maybe you have lots of logic outside db too that takes time. then it may make sense
So 10 threads max for 10 db conns
Is reasonable
But we should test to make sure
Queries are the main bottleneck
well then don't go way over that
For what we saw so far
otherwise the other threads will just be waiting there
and your machine will start to context switch to see if any of the waiting threads can do work now
So this is for one instance of the app
The database can handle more connections than 10
Managed instance and it says up to 50 for the smallest type we are using now
you probably want to keep 5-10 connections for maintenance if you need to perform those. but in general don't overthink, if you don't have exact load numbers at your hand yet for the expected load / latencies then pre-emptive optimizations will make your code harder to understand
if you are convinced that you are heading for a great load of users then try to keep your design just in a shape that you'd be able to shard out the storage in the future or be able to use read-only replicas for slow heavy scans. but don't implement it all on day one 🙂 just keep your design compatible with either of the principles, depending on your use case 🙂
Perfect thanks
Hi guys! Could anyone help me understand why the load method cannot be called inside reload?
(defn cacheload []
  (proxy [CacheLoader] []
    (load [key]
      (println "load" "key" (get key :tenant) "version" (get key :version))
      (rand-int 500))
    (reload [key oldValue]
      (.submit executorService
               (proxy [Callable] []
                 (call []
                   (try
                     (load key)
                     (catch Exception e
                       (identity oldValue)))))))))