#sql
2019-04-11
ben17:04:15

Are there any SQL ORMs for Clojure? The frameworks I’ve seen for Clojure (Luminus) encourage you to manage your database more or less with SQL (e.g. HugSQL). This is quite a different approach to what I’ve seen in things like Django or Rails. Is it just an OOP vs FP thing? Is there a broader philosophical reason that is specific to Clojure? Are there good ORMs that I simply haven’t come across?

gklijs17:04:17

I don't think there is any 'real' ORM for Clojure. Probably in large part because Clojure is dynamic, and thus it's fine to work just with maps. I myself find it quite liberating to be able to work with something like just clojure.java.jdbc instead of having to define types for everything.
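
For illustration, a minimal sketch of the plain-maps workflow being described here, using clojure.java.jdbc; the db-spec and the fruit table are hypothetical:

(require '[clojure.java.jdbc :as j])

(def db-spec {:dbtype "h2" :dbname "example"})   ;; hypothetical database

;; query returns a plain seq of hash maps, one per row
(j/query db-spec ["SELECT * FROM fruit WHERE cost < ?" 100])
;; => ({:id 1, :name "Apple", :cost 59} ...)

;; insert! takes a plain map going the other way
(j/insert! db-spec :fruit {:name "Kiwi" :cost 49})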

noisesmith17:04:35

Clojure, as a community, is pretty opinionated against ORMs (my first job using Clojure, where I learned the language, was working on an ORM-like thing for Clojure; it wasn't particularly popular)

noisesmith17:04:58

when in doubt we prefer plain data over customized Objects, so much of our code is already turning some specialized object type from some Java lib into plain data

noisesmith17:04:41

with clojure.java.jdbc, we already have a view as data going both ways, and that fits idiomatic clojure dev perfectly

noisesmith17:04:18

(by plain data I mean hash-maps, lists, vectors, sets, numbers, strings, etc.)

gklijs17:04:34

I could imagine that, once spec becomes final, a need might arise to work more easily with specced data and a relational database.

noisesmith17:04:17

@ben606 back to your original question, I think it is an OOP vs. FP thing - we tend to design in terms of maximally abstract data and data transformations, rather than specialized models

noisesmith17:04:22

even with spec it's about asserting some properties of plain data, not coercing into an opaque model

noisesmith17:04:53

neither clojure.java.jdbc nor HugSQL needs to change its behavior to be spec-compatible

gklijs17:04:00

No, but I could imagine something that takes a collection of specs and outputs the SQL to update the table to store those.

noisesmith17:04:28

oh, like ddl via spec?

gklijs17:04:25

I guess, not sure how feasible it is though, with maps-in-maps and vectors-in-maps and such. Will probably be pretty hard to support all that's possible, so maybe just a subset.
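
Purely as an illustration of that idea (and of why only a subset is feasible), here is a toy sketch that turns a flat s/keys spec of simple predicates into a CREATE TABLE statement; all names and the type mapping are made up:

(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(def pred->sql-type
  {`int?    "INTEGER"
   `string? "VARCHAR(255)"
   `double? "DOUBLE"})

(s/def ::name string?)
(s/def ::cost int?)
(s/def ::fruit (s/keys :req-un [::name ::cost]))

(defn keys-spec->ddl
  "Handles only a flat map of simple predicates -- no nesting."
  [table spec]
  (let [[_ _ req-un] (s/form spec)            ;; (s/keys :req-un [...])
        cols (for [k req-un]
               (str (name k) " " (pred->sql-type (s/form k))))]
    (str "CREATE TABLE " table " (" (str/join ", " cols) ")")))

(keys-spec->ddl "fruit" ::fruit)
;; => "CREATE TABLE fruit (name VARCHAR(255), cost INTEGER)"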

hiredman17:04:32

I've gone the other way, generating specs from a database (using the column metadata you get from jdbc)
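
Going in that direction, a rough sketch (not hiredman's actual code) that reads column metadata from a prepared statement and emits s/def forms; the type mapping is deliberately tiny:

(import '(java.sql Connection ResultSetMetaData Types))

(defn column-pred
  "Map a java.sql.Types constant to a spec predicate symbol (very incomplete)."
  [sql-type]
  (condp = sql-type
    Types/INTEGER `int?
    Types/BIGINT  `int?
    Types/VARCHAR `string?
    Types/DOUBLE  `double?
    `any?))

(defn table->spec-forms
  "Return one s/def form per column of `table`, read from the
   ResultSetMetaData of a prepared (but never executed) SELECT."
  [^Connection conn table]
  (with-open [ps (.prepareStatement conn (str "SELECT * FROM " table))]
    (let [^ResultSetMetaData md (.getMetaData ps)]
      (doall
        (for [i (range 1 (inc (.getColumnCount md)))]
          (list 'clojure.spec.alpha/def
                (keyword table (.getColumnLabel md (int i)))
                (column-pred (.getColumnType md (int i)))))))))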

hiredman17:04:35

but I could imagine having to have both in lock step all the time to be very painful

noisesmith17:04:57

also anything about sql schemas and ddl is either very weak or very non-portable

noisesmith17:04:23

eg. I could imagine a postgres<->spec lib being much more useful than a jdbc<->spec lib

ben17:04:53

This is a really interesting discussion, thanks. I guess I’m still wondering why something that covers some of the duties of an ORM, while still being idiomatic hasn’t arisen. For example, it seems to me like being able to define your schema in clojure (I guess integration with spec here would be nice) and generate migrations automatically would both be nice, useful features. Am I still missing a trick?

hiredman17:04:45

I dunno, I find sql to be really good at being sql, where most dsls that wrap it do it poorly

ben17:04:06

That’s true; all abstractions leak eventually

ben17:04:14

but that doesn’t mean they’re pointless

hiredman17:04:27

there are DSL kinds of things for building queries and whatnot
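
HoneySQL is one example of that kind of thing: plain Clojure data in, parameterized SQL out, with nothing hidden (output shown for HoneySQL 1.x):

(require '[honeysql.core :as sql])

(sql/format {:select [:*]
             :from   [:fruit]
             :where  [:< :cost 100]})
;; => ["SELECT * FROM fruit WHERE cost < ?" 100]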

ben17:04:50

> I think it is an OOP vs. FP thing

I’m not entirely sure about what the ML langs do, but it could be an interesting point of comparison

hiredman17:04:02

I find sql abstractions leak immediately

hiredman17:04:30

clojure's model is so close to being relational that having a mapping also doesn't feel like it buys you much

noisesmith17:04:43

@ben606 I think the main reason that doesn't exist is that it's simpler to deal with the sql schema as what it is, and turn that into something idiomatic for clojure (lazy-seq for streaming results, vectors for eager, sets for rows), and creating something on top of that to impose a reified data model has a much lower benefit in a data-first language like clojure

👍 4
hiredman17:04:44

like, the most common data in clojure is a seq of maps

hiredman17:04:53

which is basically a table of rows

ben17:04:31

I’ve seen people generate Haskell/Elm types from a database before which implies they do something similar to clojure, though

hiredman17:04:52

that is the static/dynamic difference though

hiredman17:04:13

they must generate types to be able to effectively work with the data in their language

noisesmith17:04:21

yeah - because fp isn't about making a black box for side effects and state, and a lot of what makes an ORM interesting is mapping from static boring data to containers for behaviors and mutation

ben17:04:26

Right but it’s still SQL-as-SQL, and the DB is the source of truth

ben17:04:39

As opposed to generating your DB schema etc from your types

ben17:04:42

If that makes sense

ben17:04:10

I think the data-first (as opposed to object-first?) helps me make sense of this

ben17:04:54

It’s not as though automatic migration generation wouldn’t be useful, it’s just that you’re kind of approaching the problem from a whole different perspective

noisesmith17:04:04

right - wanting auto migration generation makes pragmatic sense when you want to pretend a db holds a complex object with black box behaviors and internal state

ghadi17:04:20

@seancorfield yes, still think nav should be opt in

seancorfield18:04:42

@ghadi I did some benchmarks on the difference between just building a hash map vs. also adding metadata and it was about a 10ns difference on an operation that takes about 3 microseconds.

seancorfield18:04:24

After spending a fair bit of time working a variant of RowBuilder into the code last night, my next step will be breaking apart the low-level execute/reduce parts so I can address the three use case drivers separately and as optimally as possible...

ghadi18:04:43

well then!

ghadi18:04:16

I take it the nav support isn't eagerly reading ResultSetMetadata and trying to be clever

seancorfield18:04:24

1. reducible! will not realize rows (unless you apply a function that requires a hash map, like assoc, in which case it will)

seancorfield18:04:01

2. execute-one! will realize just one row from the ResultSet as a hash map (and will not do the reduce thing at all)

seancorfield18:04:14

3. execute! will default to realizing an entire ResultSet -- and do it via a ResultSetBuilder protocol that has start/add-row/end hooks, much like RowBuilder has start/add-column/end hooks.
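
From the caller's side, a hedged sketch of those three cases (the API was still in flux here; in released next.jdbc the first entry point ended up being called plan, and the fruit table is just an example):

(require '[next.jdbc :as jdbc])

(def ds (jdbc/get-datasource {:dbtype "h2:mem" :dbname "example"}))

;; 1. reduce over rows without realizing them as hash maps
(reduce (fn [total row] (+ total (:cost row)))
        0
        (jdbc/plan ds ["SELECT cost FROM fruit"]))

;; 2. realize exactly one row as a hash map
(jdbc/execute-one! ds ["SELECT * FROM fruit WHERE id = ?" 1])

;; 3. realize the whole ResultSet as a vector of hash maps
(jdbc/execute! ds ["SELECT * FROM fruit"])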

seancorfield18:04:49

nav support is lazy -- the only overhead of being datafiable is the with-meta call to add the datafy protocol with a function value.

seancorfield18:04:43

Only if you actually call datafy will it then attach metadata for the nav protocol, and only if you then call nav will it actually do anything.

seancorfield18:04:52

The caveat there is that datafy/`nav` has to happen in the same "connectable" context as the original query -- so if you do it from a Connection, that has to stay open for datafy/`nav` to work (but if you do it from a DataSource, the nav operation will get a new connection from that as needed).

ghadi18:04:28

makes sense
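
A sketch of that flow from user code, reusing the jdbc alias and ds from the sketch above; the orders table and the foreign-key column naming here are illustrative:

(require '[clojure.datafy :refer [datafy nav]])

(let [row  (jdbc/execute-one! ds ["SELECT * FROM orders WHERE id = ?" 1])
      data (datafy row)]               ;; only now is nav metadata attached
  ;; only this call performs the follow-up lookup against the database
  (nav data :orders/fruit_id (:orders/fruit_id data)))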

seancorfield18:04:35

My intent is that you only pay for what you imply you're going to need -- so case #1 will delay metadata fetch / column name creation until you actually force a hash map operation; case #2 will take that hit up front, as will case #3. #1 will use a delayed hash map row builder, #2 will use an eager hash map row builder, and #3 will use an eager result set builder.

seancorfield18:04:02

The result set builder will have variants for vector of hash maps and vector of cols-plus-row-values (`as-arrays`) and I'll add an example in the benchmark tests for vector of records (since @ikitommi seems very interested in that approach).

ghadi18:04:12

Sean asked me for feedback so I'm providing it.... as-arrays is SUPER WEIRD

ghadi18:04:56

I think something should be able to provide [cols rows*], but something in user space, not in the library

seancorfield18:04:00

Hahaha... I know, I know... I have never used it in production code (just grep'd for it!) but quite a few people do based on the feedback around it.

ghadi18:04:01

I agree with benchmarks being primarily concerned with consuming a complete resultset

ghadi18:04:32

as in, marshalling all datums into a vector of maps

ghadi18:04:09

@seancorfield as long as the API doesn't contort itself to support that weird case

seancorfield18:04:36

It's just a result set materialization strategy 🙂

ghadi18:04:13

I am liking the idea of providing as an argument a function that returns a RowBuilder, rather than providing the RowBuilder directly

ghadi18:04:29

takes RSMeta, returns RowBuilder

ghadi18:04:19

or takes (RSMeta + Map of Arbitrary User Data)

ghadi18:04:15

i have no idea what that is

ikitommi18:04:49

just that :)

ikitommi19:04:40

a function that returns a reified RowCompiler that, given a seq of fields, will return a function of result-set => value

ikitommi19:04:46

In porsas, the columns can be read at query compile time and the cost to realize a row into a map is close to zero -- for both maps & records.

ghadi19:04:18

can you show an example of such an invocation @ikitommi?

seancorfield19:04:52

Current sketch of RowBuilder in next.jdbc: https://github.com/seancorfield/next-jdbc/blob/master/src/next/jdbc/result_set.clj#L56-L65 (with an example hash map row builder just below it). Subject to change (lots of change) but that's made several of the benchmark tests noticeably faster -- I haven't yet updated the recorded timings in the test file though.
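
The linked code isn't reproduced here, but the shape being described (with the row / with-column / build names Sean mentions further down) looks roughly like this sketch:

(import '(java.sql ResultSet ResultSetMetaData))

(defprotocol RowBuilder
  (row [this] "Start a fresh row.")
  (with-column [this row i] "Add the value of column i from the ResultSet to row.")
  (build [this row] "Complete the row."))

;; a hash map implementation: column names read once from the metadata,
;; values pulled with .getObject by index
(defn map-row-builder [^ResultSet rs]
  (let [^ResultSetMetaData md (.getMetaData rs)
        cols (mapv #(keyword (.getColumnLabel md (int %)))
                   (range 1 (inc (.getColumnCount md))))]
    (reify RowBuilder
      (row [_] {})
      (with-column [_ row i] (assoc row (cols (dec i)) (.getObject rs (int i))))
      (build [_ row] row))))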

ghadi19:04:46

Why does it need the column count?

seancorfield19:04:40

It uses .getObject with a column index. So it needs to know how many columns to iterate over.

ghadi19:04:13

because someone else is driving the iteration

ghadi19:04:44

I was thinking RowBuilder would be given the already extracted value

ghadi19:04:30

so most rowbuilders would look like:

(fn ([] {})
  ([m] m)
  ([m k v] (assoc m k v)))
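
Combining this with ghadi's earlier suggestion (hand the library a function of RSMeta plus user options that returns the builder), a hypothetical sketch -- the :key-fn option and all names here are made up:

(import '(java.sql ResultSetMetaData))

;; given the metadata and a map of user options, return the reducing-style
;; builder; the column-name strategy (key-fn) stays in user space
(defn ->row-builder [^ResultSetMetaData rsmeta opts]
  (let [key-fn (get opts :key-fn keyword)
        names  (mapv #(key-fn (.getColumnLabel rsmeta (int %)))
                     (range 1 (inc (.getColumnCount rsmeta))))]
    (fn
      ([] {})                                      ;; fresh row
      ([m] m)                                      ;; completed row
      ([m i v] (assoc m (names (dec i)) v)))))     ;; add extracted column i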

seancorfield19:04:33

This way avoids the user of RowBuilder needing to know anything about the names of columns -- that's all encapsulated in the builder.

ghadi19:04:47

that can be handed in, too

ghadi19:04:12

it is very interesting how bad the ResultSet API is

ghadi19:04:26

we have different things we're conflating: 1) how do you get the names for columns in a resultset? 2) how do you extract a row from the RS? 3) how do you build something based on what is extracted?

seancorfield19:04:42

Well, yes, ResultSet is horrible. We can all agree on that 🙂

seancorfield19:04:36

Users of clojure.java.jdbc / next.jdbc in general want control of column name creation strategies. The RowBuilder can get the names from the result set's metadata if it wants them (which is why in next.jdbc right now it is passed the ResultSet when the strategy is created -- so it can do whatever it wants/needs).

seancorfield19:04:54

So that's the answer to 1.

ghadi19:04:55

you should also pass query opts

seancorfield19:04:02

I do. Read the code 🙂

ghadi19:04:05

oh, link?

seancorfield19:04:18

It's right below the code I linked to above...

ghadi19:04:23

cool, I can shove ns'ed stuff in there

ghadi19:04:52

{:ghadi/pre-fabbed-names [:foo :bar :baz]}

ghadi19:04:10

and the rowbuilding function has access to them

seancorfield19:04:13

Yes, the intent is to allow users to be able to override the column naming just by writing their own RowBuilder.

ghadi19:04:27

you wrote a rowbuilderbuilder!

seancorfield19:04:41

I am in pain 😐

seancorfield19:04:04

So, #2 is done by reducing on the row builder: call row to get a fresh row, repeatedly call with-column to add column values, call build to complete the row (yes, horrible names and they'll change when I write ResultSetBuilder).

seancorfield19:04:11

Not sure what you're asking with #3?

ghadi19:04:50

#3 is a reducing function returning a map, essentially

ghadi19:04:02

#2 is (.getObject rs i)

seancorfield19:04:04

The execute! (multi-row) function will use the ResultSetBuilder in a similar way, with a call to get a fresh result set (generally a transient vector), a call to add each row (built per the above), and a call to complete the result set (make it persistent, most likely). That way as-arrays can get column names in the "fresh result set" call and return a vector containing a vector of column names, and the row builder portion will just return vectors of column values -- no names, and those will be added to the result set...
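
A sketch of that ResultSetBuilder shape (names illustrative, not the eventual next.jdbc protocol):

(defprotocol ResultSetBuilder
  (rs-start [this] "Return a fresh result set accumulator.")
  (rs-add   [this acc row] "Add one already-built row.")
  (rs-end   [this acc] "Complete the result set."))

(def vector-of-maps-builder
  (reify ResultSetBuilder
    (rs-start [_] (transient []))
    (rs-add   [_ acc row] (conj! acc row))
    (rs-end   [_ acc] (persistent! acc))))

;; the as-arrays variant would return [col-names] from rs-start and pair it
;; with a row builder that produces plain vectors of column values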

ikitommi19:04:55

@ghadi here’s how to use that:

;; 730ns
(let [query (p/compile
              "SELECT * FROM fruit"
              {:connection connection
               :row (p/rs->map)
               :key (p/unqualified-key str/lower-case)})]
  (title "porsas: compiled map, unqualified")
  (bench! (query connection)))

ikitommi19:04:00

1) extract the resultset metadata at query-compile time. It builds a RowCompiler which is called for all rows. Perf is 10x better than realizing rows with idiomatic clojure code.

ghadi19:04:14

what is query-compile time?

ghadi19:04:29

is porsas parsing SQL and reading information_schema?

ikitommi19:04:47

it’s the p/compile, you basically prerun the query once to get the metadata to produce an optimal mapper

ikitommi19:04:08

same that @seancorfield is doing at request-time (first thing in all requests I believe)

ghadi19:04:20

prerun how?

ikitommi19:04:34

with the given sql statement.

ikitommi19:04:38

and the connection

ghadi19:04:53

how is it 730ns but going over the network?

ikitommi19:04:19

that’s the h2 in-memory db, same suite as the next.jdbc and java.jdbc

ikitommi19:04:41

;; 9200ns
(let [query #(j/query {:connection %} ["SELECT * FROM fruit"])]
  (title "java.jdbc")
  (bench! (query connection)))

ikitommi19:04:52

that’s the java.jdbc for same

ikitommi19:04:23

the results are from both:

[{:id 1, :name "Apple", :appearance "red", :cost 59, :grade 87.0}
 {:id 2, :name "Banana", :appearance "yellow", :cost 29, :grade 92.2}
 {:id 3, :name "Peach", :appearance "fuzzy", :cost 139, :grade 90.0}
 {:id 4, :name "Orange", :appearance "juicy", :cost 89, :grade 88.6}]

ikitommi19:04:59

this is how new Scala/Java libs do that, generate code for the mapper.

ghadi19:04:05

@seancorfield @ikitommi I think optimizing for the case of not going over the network (most SQL dbs are remote), or not consuming a query fully, is folly

ghadi19:04:29

if someone does SELECT *, it's going to transfer everything, and that will dominate

ikitommi19:04:56

depends what one is looking for 🙂

seancorfield20:04:10

I think you'd be very surprised at the overhead of turning a ResultSet into a vector of hash maps for large result sets...

ghadi20:04:12

map construction in clojure will be a minor fraction

ikitommi20:04:17

I’m looking for “how fast we can go with Clojure”, kinda pushing the limits for no good reason.

ghadi20:04:53

@seancorfield if you're using transients and conj!, and not doing a lot of repeated work in the middle of the loop, it will be good enough

ghadi20:04:11

read the names of the columns once at the outset, etc.
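
Putting those two points together, a purely illustrative sketch of the tight loop being described (names read once, one transient vector, .getObject by index):

(import '(java.sql ResultSet ResultSetMetaData))

(defn rs->maps [^ResultSet rs]
  (let [^ResultSetMetaData md (.getMetaData rs)
        n    (.getColumnCount md)
        cols (mapv #(keyword (.getColumnLabel md (int %))) (range 1 (inc n)))]
    (loop [acc (transient [])]
      (if (.next rs)
        (recur (conj! acc
                      (loop [i 1 row {}]
                        (if (<= i n)
                          (recur (inc i) (assoc row (cols (dec i)) (.getObject rs (int i))))
                          row))))
        (persistent! acc)))))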

ikitommi20:04:14

transients are slower on small maps

ghadi20:04:26

again.... network dominates!

ikitommi20:04:35

here’s a good example of clojure jdbc perf compared to other langs, in a silly web benchmark: https://www.techempower.com/benchmarks/#section=data-r17&hw=ph&test=fortune

seancorfield20:04:44

clojure.java.jdbc is "slow" at result set building -- I've benchmarked it against (remote) Percona/MySQL instances to compare against next.jdbc -- the improvements around faster result set building are definitely worthwhile.

ghadi20:04:46

but current clojure.java.jdbc is a low bar (in terms of performance)

ghadi20:04:15

and i'm not saying we can't do better

ikitommi20:04:20

for some reason, the fastest clojure version is 4x slower than the fastest ones. The code is kinda the same; just the row mapping is doing a lot more work.

ikitommi20:04:42

(btw, adding porsas entry to that, to see if that shows).

seancorfield20:04:35

When the speed of row mapping was first brought up, I was very skeptical that it made enough of a difference compared to network/JDBC time spent that it would be worth addressing... but, yeah, it really does.

ghadi20:04:04

probably compared to current java.jdbc sure

ikitommi20:04:19

that’s the latency figures in porsas, where I tested different approaches.

ikitommi20:04:45

mapping to maps (qualified or not) is about as fast as records, which can be made as fast as java.

ikitommi20:04:19

happy to contribute some of the perf code to whatever will be the next jdbc wrapper.

ikitommi20:04:21

Next thing planned is to hide the compilation phase into the query itself, e.g. create the mapper on first query and store it for next queries. The first call would be 3x slower, the rest fast.
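
A minimal sketch of that idea, reusing the p/compile alias from the snippets above and an unbounded atom as the (hypothetical) cache; as noted next, a real version would be bounded and exposed for monitoring:

(defonce ^:private mapper-cache (atom {}))       ;; hypothetical, unbounded

(defn cached-query
  "Compile the mapper for `sql` on first use, reuse it afterwards."
  [connection sql opts]
  (let [run (or (get @mapper-cache sql)
                (let [q (p/compile sql (assoc opts :connection connection))]
                  (swap! mapper-cache assoc sql q)
                  q))]
    (run connection)))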

ikitommi20:04:48

#reitit also has similar optimizations; as there would be a cache, it should be bounded and exposed for monitoring.

ikitommi20:04:14

(off by default in reitit)