Fork me on GitHub
#beginners
<
2020-06-14
>
sova14:06:37

Hi, I want to turn the entries of a set into maps. It's the result of a datomic-esque query using datahike..

; ;; successfully returns all chats from room
(println  "haxorrr : "
  (d/q '[:find  ?authorid ?content ?timestamp ?messageid
       :in $ ?kind ?roomname
       :where
       [?m :message/kind ?kind]
       [?m :message/roomname ?roomname]
       [?m :message/authorid ?authorid]
       [?m :message/content ?content]
       [?m :message/timestamp ?timestamp]
       [?m :message/messageid ?messageid]]
  @conn "chat" "Beginners"))
returns
haxorrr :  #{[a.tester Welcome to the beginner's channel. Please post any questions you have here.  Feel free to discuss! 1592143964 aed7504f-b1a7] [c.tester Quite a hypothetical, indeed. 1592143964 cf91e354-abab] [b.tester That is a hypothetical. 1592143964 7c96b2a2-ef2b] [a.tester Say I had a question, you'd want to help me figure it out. 1592143964 7c22e8fd-6391] [a.tester Real mature, gems. 1592143964 ad536ace-5a3f]}
So I'd like a vector of maps instead, that way I can sort it by timestamp and generate a page from it. Is there a better way to sort by timestamp ? open to suggestions.

dpsutton15:06:45

does datahike support the pull syntax?

sova15:06:11

Neat. I have not used pull before... I recall it being a formative part of om.next but I still don't really get it

sova16:06:16

That's a nice page but I need more examples to make use of it How can I use pull syntax to get database results?

(println "pull? " (d/pull @conn '[*] :message/messageid))
gives me the schema def / ident...
pull?  #:db{:id 1, :cardinality :db.cardinality/one, :ident :message/messageid, :unique :db.unique/value, :valueType :db.type/string}
but how would I change that pull line to get actual db results? It says the entity identifier (3rd and last arg) must have uniqueness... which is groovy, but then how do I get a meaningful result?

sova16:06:16

That's a nice page but I need more examples to make use of it How can I use pull syntax to get database results?

(println "pull? " (d/pull @conn '[*] :message/messageid))
gives me the schema def / ident...
pull?  #:db{:id 1, :cardinality :db.cardinality/one, :ident :message/messageid, :unique :db.unique/value, :valueType :db.type/string}
but how would I change that pull line to get actual db results? It says the entity identifier (3rd and last arg) must have uniqueness... which is groovy, but then how do I get a meaningful result?

pkoivisto16:06:02

You're pulling everything (the '[*] pattern) for the entity corresponding to the :message/messageid ident; instead you'd likely want to pass in the third position the entity id (or lookup ref) corresponding to whatever entity it is you're interested in.

pkoivisto16:06:16

Or, more likely, you'll want to have a look at the section Pull and Query combined of the document linked above. Then the pull pattern specified will be applied for all the entities returned by the query.

sova18:06:05

woots. makin' some progress aw yeah B-) thanks y'all

lukas.block19:06:11

👋 Hi, I'm trying to spec a tuple where I only care about the first value. The second can be every arbitrary type how can I achieve that?

lukas.block19:06:36

Okay nvm I guess any? will work for me 😇

skinner8920:06:42

I'm looking for suggestions for ways to speed this up. It inserts 100000 simpe records into a postgresql table. Right now the best time I've observed is 25 seconds. It uses HikariCP for connection pooling.

(def datasource-options {:auto-commit        true
                         :read-only          false
                         :connection-timeout 30000
                         :validation-timeout 5000
                         :idle-timeout       600000
                         :max-lifetime       1800000
                         :minimum-idle       10
                         :maximum-pool-size  10
                         :pool-name          "db-pool"
                         :adapter            "postgresql"
                         :username           "*******"
                         :password           "*******"
                         :database-name      "*******"
                         :server-name        "localhost"
                         :port-number        5432
                         :register-mbeans    false})

(defonce datasource
         (delay (make-datasource datasource-options)))

(defn do-insert [n]
  (jdbc/with-db-connection
      [conn {:datasource @datasource}]
      (jdbc/execute! conn 
                     ["insert into speedy values ( ? , ? )" n "some textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome text"])
      ))

(defn run [& args]
  (count (pmap do-insert (range 100000))))

skinner8920:06:42

I'm looking for suggestions for ways to speed this up. It inserts 100000 simpe records into a postgresql table. Right now the best time I've observed is 25 seconds. It uses HikariCP for connection pooling.

(def datasource-options {:auto-commit        true
                         :read-only          false
                         :connection-timeout 30000
                         :validation-timeout 5000
                         :idle-timeout       600000
                         :max-lifetime       1800000
                         :minimum-idle       10
                         :maximum-pool-size  10
                         :pool-name          "db-pool"
                         :adapter            "postgresql"
                         :username           "*******"
                         :password           "*******"
                         :database-name      "*******"
                         :server-name        "localhost"
                         :port-number        5432
                         :register-mbeans    false})

(defonce datasource
         (delay (make-datasource datasource-options)))

(defn do-insert [n]
  (jdbc/with-db-connection
      [conn {:datasource @datasource}]
      (jdbc/execute! conn 
                     ["insert into speedy values ( ? , ? )" n "some textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome textsome text"])
      ))

(defn run [& args]
  (count (pmap do-insert (range 100000))))

smith.adriane20:06:01

like disabling auto commit

smith.adriane20:06:58

as well as using COPY in favor of INSERT . Although that depends on where your database is and where your data is

skinner8920:06:18

I'll give this a try post the results

skinner8920:06:10

I won't be able to use copy in the solution I'm designing. Some data validation has to happen in code for the real thing.

smith.adriane21:06:28

if you can upload files to the server postgres is running on, you can potentially validate the data, write it to the file, and then run the COPY command. not sure if that would be faster, but potentially

smith.adriane21:06:34

the INSERT https://www.postgresql.org/docs/9.5/sql-insert.html only returns oid count . I suspect that the time spent on processResults is just time spent waiting for a response and not time actually processing results.

skinner8921:06:16

ok yeah that makes sense

dharrigan21:06:14

Do you also have indicies on the table? That will slow down responses too.

smith.adriane21:06:57

so besides the loading command, the other area where you might achieve a large speedup is network speed. eg. if your database is on aws and your data is your local dev machine, then a large fraction of the time might just be sending data to the database

dharrigan21:06:31

Is the table unlogged?

dharrigan21:06:45

that will make things a lot faster too, but with the downside of possible data loss 😉

dharrigan21:06:01

(if the db crashes...)

dharrigan21:06:33

you can temporarily make the table unlogged, then when done, make it logged again

skinner8921:06:01

@ I set :auto-commit false but no speed up

skinner8921:06:45

but I'm thinking I need to batch differently, like do-insert might need to run a sub-batch of items

smith.adriane21:06:11

what’s the network like between where the query is run from and where the server is?

skinner8921:06:20

this is all local right now

smith.adriane21:06:21

are they both on the same network?

skinner8921:06:39

production will all be in the same aws vpc

skinner8921:06:52

so not too worried about latency

seancorfield21:06:55

Since this is PostgreSQL, you might also need :reWriteBatchedInserts true (per my note in the main channel).

dharrigan21:06:44

Yes, as @ mentions, that is worth trying. I have that setting in one of my projects too that does a large amount of inserts.

skinner8921:06:53

Thanks @ I'm reading through the rest of this doc to understand the batch mechanism described and am going to work up an implemenation using it.

skinner8921:06:17

@ is there a way to tell clojure.java.jdbc or next not to do anything with the resultset?

seancorfield21:06:05

next.jdbc, yes. clojure.java.jdbc, no.

skinner8921:06:01

gotcha. I have no reason not to use next 🙂 so I guess I'll switch to that

seancorfield21:06:30

It's all a lot easier in next.jdbc because it deliberately traffics in JDBC objects and exposes a lot more of that to Java interop if you need it. clojure.java.jdbc tries hard to hide all of that -- so you can't easily get at the underlying stuff. In addition, c.j.j auto-wraps everything in a transaction unless you explicitly opt-out -- whereas next.jdbc leaves all transactions up to you.

seancorfield21:06:55

c.j.j is no longer being actively maintained (by me) -- all my focus is on next.jdbc now.

skinner8921:06:54

Right on. I've read that in the first sentence of the c.j.j. readme many times now but I'm usually following tutorials/guides that use c.j.j. so opt not to add another unknown to the mix of what I'm trying to do at the time 🙂

skinner8922:06:11

Sweet. Got it down to

(time (run))
"Elapsed time: 1088.791338 msecs"

skinner8922:06:42

with 1000 batches of 100: `

skinner8922:06:01

(defn run-batch [i]
  (with-open
    [conn (nxt/get-connection @datasource)
     ps (nxt/prepare conn ["insert into speedy values (?,'hello')"])]
    (next.jdbc.prepare/execute-batch! ps (into [] (map (fn [n] [n]) (range 100))))))

(defn run [& args]
  (count (pmap run-batch (range 1000))))

skinner8922:06:51

Thanks for y'all's help!

skinner8913:06:45

@ this fastest setup I observed was 400,000 inserts in 3554 msecs using:

(defn run-batch [i]
  (with-open
    [conn (nxt/get-connection @datasource)
     ps (nxt/prepare conn ["insert into speedy values (?,?)"])]
    (next.jdbc.prepare/execute-batch!
      ps
      (into [] (map (fn [n] [n raw-data]) (range 100000)))
      {:batch-size 10000})))

(defn run [& args]
  (count (pmap run-batch (range 4))))

skinner8913:06:30

I just moved the (into.. computation to a var outside the function and observed "Elapsed time: 2471.442009 msecs"

seancorfield15:06:16

Did you also add the rewrite option I pointed you at? And that sounds like pretty good performance: 2.5s for 400k rows.

skinner8920:06:25

it looks like it's spending a good deal of time in processResults which I don't really care about

seancorfield21:06:27

@skinner89 I know you're using clojure.java.jdbc there, rather than next.jdbc but read this section of the latter's docs that talks about a possible issue there https://cljdoc.org/d/seancorfield/next.jdbc/1.0.462/doc/getting-started/prepared-statements#caveats

radicalmatt21:06:45

I find myself wanting to change the value of something I've defined in an outer let binding form when I encounter some condition inside of an inner letfn . Realizing that what I introduce in a let binding form cannot be changed, is there something analagous to let that would actually let me change the value of a binding?

seancorfield21:06:55

@radicalmatt If you want something mutable, there's atom, but you should probably try to rethink your approach. You almost never need mutation for most stuff.

seancorfield21:06:55

@radicalmatt If you want something mutable, there's atom, but you should probably try to rethink your approach. You almost never need mutation for most stuff.

radicalmatt22:06:09

I'm working on a puzzle involving tree traversal, and keeping track of certain nodes I've visited while recursing through the structure.

radicalmatt22:06:57

I'm sure there's a way to encode a vector of what I've visited into a recursive function parameter, but I'm just not finding it

raspasov22:06:43

If you can post the problem it might be easier to give suggestions 🙂

seancorfield22:06:32

Sounds a bit like zippers... Have you looked at clojure.zip?

raspasov22:06:48

Also, we might be getting out of #beginners here, but I’ve used https://github.com/redplanetlabs/specter to do pretty much what you’re describing

raspasov22:06:01

Look for the place on the page where it says: “When doing more involved transformations, you often find you lose context when navigating deep within a data structure and need information “up” the data structure to perform the transformation. Specter solves this problem by allowing you to collect values during navigation to use in the transform function.”

radicalmatt22:06:15

not yet I'll have a peek sean .

raspasov22:06:16

Specter is its own beast almost, but allows pretty powerful stuff as far as immutable data transformation is concerned

radicalmatt22:06:54

it's possible I might need to use like, an explicit stack to represent unvisited nodes instead of the recur(left) recur(right) type approach..

radicalmatt22:06:28

I'm simplifying the puzzle to just returning a list of node names that constitute a simple preorder tree traversal haha

andy.fingerhut21:06:24

You can 'shadow' the value of an earlier binding, even within the same let , but also within an inner let . That isn't mutating an existing value, just giving the same name to a new value.

seancorfield21:06:39

(let [a 1
      a (+ a 2)]
  a) ; 3

raspasov22:06:09

@radicalmatt any time you find yourself wanting to change something try thinking “well I can just make a NEW thing” 🙂

(let [a 1
      a' (+ a 2)]
  a') ; 3
I personally like the “a prime” approach where I don’t shadow bindings; instead I give them similar but different names