This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2019-12-10
Channels
- # adventofcode (76)
- # announcements (7)
- # aws (3)
- # babashka (75)
- # beginners (25)
- # calva (37)
- # cider (9)
- # clara (4)
- # clj-kondo (17)
- # cljsrn (1)
- # clojure (106)
- # clojure-europe (4)
- # clojure-india (2)
- # clojure-italy (12)
- # clojure-nl (27)
- # clojure-spec (33)
- # clojure-uk (20)
- # clojurescript (103)
- # clojutre (3)
- # core-async (1)
- # cryogen (10)
- # cursive (24)
- # datomic (113)
- # dirac (5)
- # emacs (12)
- # events (4)
- # fulcro (64)
- # garden (5)
- # jobs (1)
- # kaocha (5)
- # luminus (2)
- # malli (14)
- # off-topic (53)
- # planck (11)
- # re-frame (9)
- # reagent (16)
- # reitit (26)
- # remote-jobs (2)
- # shadow-cljs (137)
- # spacemacs (34)
Hello channel :wave: How many here use Datomic? What are its pros and cons compared to other DBs?
Cons: it's closed, it's not cheap. Pro: it's awesome, so you think twice before caring about 1 and 2
@thegobinath I think the pros are pretty well laid out on the Datomic website. Not sure about the cons. When I've talked to Cognitect about suitability for where I work, they raised a couple of flags which I'd say were cons: we have some data sets that are very high write throughput and with Datomic's single transactor and replication architecture that isn't a great fit; we also have some extremely large data sets and Datomic has some upper limits on the amount of data it can work with (I don't know whether those are hard limits or just effective performance limits).
There's a #datomic channel where you can find people who are using Datomic in production who'll probably have some real world feedback on pros and cons.
Cloud does not have a single transactor btw
Depending on your level of seriousness, you can also contact <mailto:[email protected]|[email protected]> for a conversation
I think I encountered this bug while playing with generators. https://clojure.atlassian.net/browse/CLJ-2311 😞
@alexmiller Ah, good to know. Thank you for the correction! I haven't really looked at Datomic much since we established it wasn't a good fit for us -- and that was a while back (before Datomic Cloud even launched).
only one transactor can write to a database, you can't have multiple transactors talking to the same database simultaneously.
We are using promises and futures for concurrency
Is that smart?
Like this:
(defmacro async
"Evaluates body asynchronously.
Returns a promise with the result of the evaluation."
[& forms]
`(let [p# (promise)]
(future (deliver p# (do ~@forms)))
p#))
(time
(->> [(async (Thread/sleep 2000) (prn "hallo1") 1)
(async (Thread/sleep 2000) (prn "hallo2") 2)
(async (Thread/sleep 2000) (prn "hallo3") 3)]
(mapv deref)))
;; => [1 2 3]
;; "hhaallllo3"o"1"hallo2"
;; "Elapsed time: 2001.748042 msecs"
Where thread/sleep is actually doing a query
Or is it less resource intensive to use core.async?
What are the tradeoffs?
We do need the results
boot.user=> (time
#_=> (->> [(future (Thread/sleep 2000) (prn "hallo1") 1)
#_=> (future (Thread/sleep 2000) (prn "hallo2") 2)
#_=> (future (Thread/sleep 2000) (prn "hallo3") 3)]
#_=> (mapv deref)))
"""hallo3hallo1""
hallo2"
"Elapsed time: 2004.803619 msecs"
[1 2 3]
Hmmmm I see 😄
I didn’t know future returned a value you can deref
I recommend https://github.com/TheClimateCorporation/claypoole for this kind of stuff, removes a lot of boiler plate and has more flexibility in terms of managing threadpools
Thanks
I'm also using https://github.com/mpenet/knit
What is the advantage of that over using plain Clojure future
?
Some more features I see, and what are the benefits of being able to specify thread pools?
something like pmap
that is closer to what people actually think pmap should be (clojure.core pmap is weird), better error handling (if code in a future throws an exception, you don't see it unless / until you deref the future itself)
and yeah, the threadpools too but to me that's the least interesting claypoole feature
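That error-handling difference is easy to demonstrate with nothing beyond clojure.core (a minimal sketch):

```clojure
;; A future that throws: nothing is printed or reported when it is
;; created, the exception only surfaces once you deref it.
(def f (future (throw (ex-info "boom" {}))))

;; Some time later... deref rethrows, wrapped in ExecutionException.
(try
  @f
  (catch java.util.concurrent.ExecutionException e
    (println "only now do we see:" (.getMessage (.getCause e)))))
;; prints: only now do we see: boom
```

If a future's result is never deref'd, the exception is silently lost, which is exactly the trap described above.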
Hi. How can I efficiently filter [ and ] symbols from a regex string ie in (re-find #"some string [97333] etc" mystring)
So you have the regex #"some string [97333]"
What do you want as a return value? The regex #"some string 97333"?
str can take a regex and return a regular string, which in this example would be "some string [97333]".
(-> (str #"some string [97333]") (clojure.string/replace "[" "") (clojure.string/replace "]" ""))
returns the string "some string 97333". Not sure if that is what you are looking for.
There are ways to use a single call to clojure.string/replace rather than two, if for some reason that would be extremely important to increase the efficiency somewhat, at the cost of code readability.
Thank you for getting back @U0CMVHBL2 I want to filter a sequence of strings if a substring exists. E.g. (filter #(re-find substr %) coll)
But it fails to find anything if the substring contains [ or ] (i guess other regex special chars). I ended up escaping it
(-> substr (st/replace ,, "[" "\\[") (st/replace ,, "]" "\\]"))
which seems a bit verbose.
clojure.string/includes? sounds like it might do exactly what you want. Don't use regex matching if you do not need regex matching -- it will only trip you up.
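A small sketch of both approaches; `substr` and `coll` below are stand-ins for the user's own values:

```clojure
(require '[clojure.string :as str])

(def coll ["some string [97333] etc" "something else"])
(def substr "[97333]")

;; Plain substring search: no regex semantics, no escaping needed.
(filter #(str/includes? % substr) coll)
;; => ("some string [97333] etc")

;; If a regex really is needed, java.util.regex.Pattern/quote escapes
;; every special character in one call, instead of replacing [ and ]
;; one by one.
(filter #(re-find (re-pattern (java.util.regex.Pattern/quote substr)) %) coll)
;; => ("some string [97333] etc")
```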
@erwinrooijakkers it only depends on your use case, I'd say using futures is ok, as long as you're fine with your whole application sharing the same thread pool for all futures (IIRC). If you need more control, that's when claypoole comes into place and you can manage the thread pool sizes yourself. Applications I work on use both, with pmap/`future` being the default choice
Thank you!
Querying the database 🙂
"querying a data base" is also very broad 😉 (what sort of DB, do you have a connection pool, are read paths separate from write paths etc etc etc)
postgresql db with a com.mchange.v2.c3p0.ComboPooledDataSource
(it makes 10 connections on startup) and read and writes both use HugSQL from same app
in a web request context 🙂
Interesting
most interesting things in Clojure-land aren't about the code, but about exploring the problem space. Performance numbers (request latency, db latency, connection acquisition latency, throughput) need to inform the maintenance costs of concurrency
Yes I understand
if the system doesn't expose those numbers, maybe exposing those is a higher priority than deciding on a code mechanism
We have a load test so we can get raw numbers from outside the system under load. The database is managed so we can probably find some information on latency there. I don’t know about request latency and connection acquisition latency.
> running a future per handler call on an unbounded threadpool is asking for trouble
Why exactly is that?
Too many resources used?
thread per call can lead to a lot of context switches
you will see your cpu power disappear into those
and if you allocate a bunch of resources per thread too then those obviously multiply up as well
but if you run against a database then your true limitation will arise there, most relational databases start to underperform heavily if you connect hundreds of clients to them in parallel (with actual query/io work being done in parallel)
so you may need connection pooling just to keep the db itself alive 🙂
We now have a connection pool already
c3p0.ComboPooledDataSource
So I guess that will work correctly
But the futures can cause a problem
well if there's a chance that you spawn too many of them, then yes, it's looking for trouble
if you're afraid of pulling in too many libraries, just use the Executor from java itself
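A minimal sketch of that, using java.util.concurrent directly; the pool size of 10 is just an example matching the 10 db connections discussed above:

```clojure
(import '(java.util.concurrent Executors ExecutorService Future Callable))

;; A fixed pool caps concurrency at 10 threads.
(def ^ExecutorService pool (Executors/newFixedThreadPool 10))

;; Clojure fns implement both Callable and Runnable, so the ^Callable
;; hint picks the .submit overload that returns the fn's value.
(def tasks (mapv #(.submit pool ^Callable (fn [] (* % %))) (range 5)))

;; .get blocks until each result is ready; order is preserved
;; because we deref the Futures in submission order.
(mapv #(.get ^Future %) tasks)
;; => [0 1 4 9 16]

(.shutdown pool)
```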
I don’t mind libraries
If I understand why they are used 🙂
We might spawn too many futures, so we can use a thread pool for that, I see. I’ll dive into it. As for core.async, I don’t see how we can use it; it does need to work in a web request context.
if you look under the hood then many of them are just handy wrappers around executor 🙂
Thanks!
to make it feel more clojure like 🙂
Yes for example the claypoole one
(require '[com.climate.claypoole :as cp])
;; A threadpool with 2 threads.
(def pool (cp/threadpool 2))
;; Future
(def fut (cp/future pool (myfn myinput)))
;; Ordered pmap
(def intermediates (cp/pmap pool myfn1 myinput-a myinput-b))
;; We can feed the streaming sequence right into another parallel function.
(def output (cp/pmap pool myfn2 intermediates))
;; We can read the output from the stream as it is available.
(doseq [o output] (prn o))
;; NOTE: The JVM doesn't automatically clean up threads for us.
(cp/shutdown pool)
From the docs
Is it reasonable to start a (cp/threadpool 100)
Can be as "simple" as: (cp/pmap 2 [(db/query db-pool) (db/query2 db-pool)]), for a pool size of 10, these two queries might be executed in parallel (assuming nothing else is going on)
Thanks a lot
I might be getting the syntax wrong, but you get the idea - you can create thread pools per request, allocating the size only for the number of queries you need to run. That said, threads have an overhead, and you're working with many variables here (your web servers thread pool, connection pool's thread pool, your own) so it needs a lot of testing and tuning.
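For reference, a hedged sketch of what that might look like with cp/pmap's (pool-or-size, fn, coll) shape; `query-1` and `query-2` are stand-ins for the real db queries, and the thunks delay the work so the pool, not the calling thread, runs it:

```clojure
(require '[com.climate.claypoole :as cp])

;; Hypothetical stand-ins for the real queries.
(defn query-1 [] (Thread/sleep 200) :result-1)
(defn query-2 [] (Thread/sleep 200) :result-2)

;; Passing an integer makes a temporary pool of that size, which is
;; shut down for you when the pmap finishes -- here sized to the
;; number of queries we want in flight.
(cp/pmap 2 (fn [run-query] (run-query)) [query-1 query-2])
;; => (:result-1 :result-2)
```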
if you have 100 threads ... but 10 db connections .. what do you suppose will happen ? 🙂
i don't know of course, maybe you have lots of logic outside db too that takes time. then it may make sense
So 10 threads max for 10 db conns
Is reasonable
But we should test to make sure
Queries are the main bottleneck
well then don't go way over that
For what we saw so far
otherwise the other threads will just be waiting there
and your machine will start to context switch to see if any of the waiting threads can do work now
So this is for one instance of the app
The database can handle more connections than 10
Managed instance and it says up to 50 for the smallest type we are using now
you probably want to keep 5-10 connections for maintenance if you need to perform those. but in general don't overthink, if you don't have exact load numbers at your hand yet for the expected load / latencies then pre-emptive optimizations will make your code harder to understand
if you are convinced that you are heading for a great load of users then try to keep your design just in a shape that you'd be able to shard out the storage in the future or be able to use read-only replicas for slow heavy scans. but don't implement it all on day one 🙂 just keep your design compatible with either of the principles, depending on your use case 🙂
Perfect thanks
Hi guys! Could anyone help me understand why the load method cannot be called inside reload?
(defn cacheload []
  (proxy [CacheLoader] []
    (load [key]
      (println "load" "key" (get key :tenant) "version" (get key :version))
      (rand-int 500))
    (reload [key oldValue]
      (.submit executorService
               (proxy [Callable] []
                 (call []
                   (try
                     (load key)
                     (catch Exception e
                       (identity oldValue)))))))))