#sql
2020-01-10
seancorfield06:01:22

Uh-oh... 🙂

dharrigan06:01:38

I'm having an internal debate whether to use PostgreSQL (which is my fave database of all time - well, except the one I wrote back when I were a kid; it ran on BBC BASIC and was fine provided you were storing records (as in vinyl) with a name, duration and rating, and didn't dare deviate from that), or use Elasticsearch.....

seancorfield06:01:06

Funnily enough, I wrote my own database, back in the day, for managing my record collection too!

seancorfield06:01:19

But it was UCSD Pascal on a SAGE IV.

seancorfield06:01:07

It turned out very useful when I was burgled: I was able to give the insurance company an itemized list of every CD stolen, date purchased, price paid, shop bought from. So they treated it as a per-item claim -- and opened an account at HMV London for me with ten thousand pounds credit 🙂

😲 24
dharrigan06:01:52

I hope you spent it all before HMV disappeared!

seancorfield06:01:25

Yeah, in two visits. About 8k the first time and 2k the second. They treated me like royalty 🙂

seancorfield06:01:52

I had about 2,000 CDs. The burglars took about half the collection 😞

dharrigan06:01:16

That's a lot of music

dharrigan06:01:23

Do you still have that, or is it all streaming now?

seancorfield06:01:29

I left most of the CDs behind in England -- brought only a few hundred over, twenty years ago -- but I've re-purchased a lot of it on iTunes since. I have over 60GB of music now I think...

seancorfield06:01:39

iTunes says I have 960 albums by 455 artists. 8,752 tracks.

dharrigan06:01:50

That's a lot of music

dharrigan06:01:03

The use cases are the same - storing data, in both cases JSON (with additional fields)

seancorfield06:01:08

Chalk and cheese.

dharrigan06:01:33

yeah, I get they both have good purposes, but here is what I'm thinking....

dharrigan06:01:50

I'm very used to Postgres, the libraries (thanks Sean!) are great and blend lovely with Clojure. The only thing the user wants to do is look up data using an id - there isn't yet (and I say yet) any need for full-text searching.

dharrigan06:01:19

I presently use Elasticsearch, but now I have a requirement from the users to add notes to the data

dharrigan06:01:52

and for that, having a little table with a simple, straightforward join to the parent is suuuper simple with PG (or any db)
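
Something like this, roughly (table and column names hypothetical; ds is a next.jdbc datasource):

(require '[next.jdbc :as jdbc])

;; hypothetical schema: a notes table pointing back at its parent record, e.g.
;; create table notes (id serial primary key, parent_id uuid references records(id),
;;                     body text, created_at timestamptz default now());
(jdbc/execute! ds
  ["select r.*, n.body, n.created_at
      from records r
      left join notes n on n.parent_id = r.id
     where r.id = ?"
   record-id])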

seancorfield06:01:05

We use both MySQL and Elasticsearch at work. MySQL for structured, relational data. ES for less structured data, searchable with complex, scored queries.

seancorfield06:01:28

We don't do much text searching but we do geo searches, age range searches, etc.

dharrigan06:01:01

I presently use the PostGIS extension in PG to do geospatial stuff (works great)
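
e.g. something like this (places/geom are made-up names; ds is a next.jdbc datasource):

(require '[next.jdbc :as jdbc])

;; everything within 5km of a lng/lat point; geography comparisons are in metres
(jdbc/execute! ds
  ["select id, name
      from places
     where ST_DWithin(geom::geography,
                      ST_SetSRID(ST_MakePoint(?, ?), 4326)::geography,
                      5000)"
   lng lat])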

dharrigan06:01:47

PG's ability to index json data (and store it efficiently as jsonb) - and to search inside the json - has come a long way
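
roughly like this (hypothetical docs table with a jsonb data column):

(require '[next.jdbc :as jdbc])

;; a GIN index makes containment queries on the jsonb column fast
(jdbc/execute! ds ["create index if not exists docs_data_idx on docs using gin (data)"])

;; @> does a containment search inside the json
(jdbc/execute! ds ["select * from docs where data @> ?::jsonb" "{\"status\": \"open\"}"])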

dharrigan06:01:08

I was happy using ES, but the requirement to add notes means mutation

dharrigan06:01:27

and I chose ES originally as the json stored in ES would be (currently is!) immutable

dharrigan06:01:23

I think I need to ask some more questions of the business

seancorfield06:01:02

We post updated profile data to ES regularly -- every time a member changes their profile. We don't view ES as immutable.

dharrigan06:01:23

My use case is something around fraud

dharrigan06:01:00

If they create data around fraud, they want that data to be unchangeable, since it could be used as part of an official investigation, so they need to know the state of the world at that precise point in time.

yogidevbear07:01:06

I know I may sound biased here, but it might be worth looking at Crux

yogidevbear07:01:13

Reason being your need for point-in-time queries. The bitemporality feature of Crux is probably perfectly suited to what you need here
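
A point-in-time lookup is roughly this (a sketch from memory, assuming node is a started Crux node - do check the docs):

(require '[crux.api :as crux])

;; ask for a database value as of a given valid time, then read an entity from it
(let [db (crux/db node #inst "2020-01-10")]
  (crux/entity db :case-42))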

dharrigan08:01:18

Thanks for the tip! 🙂

seancorfield06:01:16

Datomic! 🙂

dharrigan06:01:31

Hmmm, indeed......I will do some research

dharrigan06:01:49

We are open to new tech, but it really has to be justifiable, and cost effective.

dharrigan06:01:19

Our CTO doesn't just want to throw in any new shiny toy (and as you know, devs love their toys)

seancorfield06:01:22

If you want an accounting trail for fraud etc, Datomic is a perfect match. If you want fancy searching, ES is great. It's why we use both MySQL and ES. We "publish" member profiles into ES every time one changes, so we can do member searches of profiles (typical dating stuff). We also keep an audit trail of all changes to key parts of a member profile.

seancorfield06:01:10

We talked to Cognitect about our use case and some parts were a perfect fit for Datomic and other parts were totally not appropriate. We didn't want to run both Datomic and MySQL so we chose to just run the latter. But we would have needed a dedicated search service on top of both of those anyway -- and we were using Transparensee's Discovery engine for years, before we switched to ES.

Jakub Holý (HolyJak)09:01:10

Out of curiosity (and as a fact to include in future decisions), what was it that was OK in MySQL but totally inappropriate in Datomic? Does it have to do with the lack of excision in Datomic Cloud, or something else?

seancorfield20:01:55

@U0522TWDA We have some datasets that are extremely large -- beyond Datomic's reasonable size range at the time -- and we have some very high write datasets, which doesn't match Datomic's model. Datomic would be a perfect match for some of our datasets, but not everything.

👍 4
dharrigan06:01:13

I see totally the need to do fancy searches, and ES is fantastic at that.

dharrigan06:01:28

I think I need to go back to the business and determine if they envisage that

dharrigan06:01:37

I think they don't

dharrigan06:01:12

the only thing they want is to find something by an id (which the data was created with), and perhaps to search inside notes to pull back the bigger picture.

seancorfield06:01:16

We use Elastic Cloud for our ES setup. About $900/month for what we're doing, but it can be a lot cheaper if your data volume and search traffic is lower.

seancorfield06:01:33

(or a lot more expensive if your volumes/traffic are higher!)

dharrigan06:01:10

I'm going to have a ponder - in the end, it's really a business decision, so I have to present to them some options

dharrigan06:01:12

and see what they think

seancorfield06:01:50

My gut, based on what you've described, is that Datomic Cloud + Elastic Cloud might be the sweet spot...

dharrigan06:01:13

yeah, indeed. I'm looking over the Datomic stuff atm.

Jakub Holý (HolyJak)09:01:20

FYI I heard from two devs that are very fond of Clojure that if they had to choose between the language and Datomic, they would pick Datomic. It is, presumably, that much of a game changer.

Jakub Holý (HolyJak)09:01:20

Also, I enjoyed a lot https://augustl.com/blog/2018/datomic_look_at_all_the_things_i_am_not_doing/ and https://augustl.com/blog/2018/datomic_look_at_all_the_things_i_am_not_doing_cont/ For persuading the business about Datomic, have a look at https://engineering.telia.no/blog/talk-exploring-four-hidden-superpowers-of-datomic and the resources it links to. The only problem with Datomic Cloud (as opposed to on-prem) I have experienced is missing excision, i.e. the ability to delete old data. That is bad for GDPR but perhaps good for your use case.

tavistock14:01:39

in next.jdbc with psql, is there an easy way to convert a PGobject returned by array_agg into a hashmap? My feeling is no, because .getType doesn't return the schema, it just returns the table name

seancorfield16:01:36

@tavistock Have you read https://cljdoc.org/d/seancorfield/next.jdbc/1.0.13/doc/getting-started#working-with-additional-data-types and the sections it links to? It sounds like you want to extend ReadableColumn to that type perhaps?
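
A rough sketch of what that might look like (untested; assumes clojure.data.json for the parsing, and that you only want json/jsonb values converted):

(require '[next.jdbc.result-set :as rs]
         '[clojure.data.json :as json])
(import 'org.postgresql.util.PGobject)

;; parse json/jsonb PGobjects into Clojure data; leave other PG types alone
(extend-protocol rs/ReadableColumn
  PGobject
  (read-column-by-label [^PGobject v _]
    (if (#{"json" "jsonb"} (.getType v))
      (json/read-str (.getValue v) :key-fn keyword)
      v))
  (read-column-by-index [^PGobject v _2 _3]
    (if (#{"json" "jsonb"} (.getType v))
      (json/read-str (.getValue v) :key-fn keyword)
      v)))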

tavistock16:01:13

yea, I am using ReadableColumn to implement the PGobject conversion. Whenever you use json_agg or the like, it returns a PGobject with a type of "json", and whenever you use array_agg, it returns a PGobject with the table name as its type. I was wondering if there was a way other people were using to convert the latter kind of PGobject into a hashmap, but I can't think of a way that it would be possible.

tavistock16:01:42

it may not even be possible with the interface that PGobject provides.

tavistock16:01:12

to give an example in psql:

select COALESCE(array_agg(b) FILTER (WHERE b.id IS NOT NULL), '{}') as bs, a.id
from a
left join b on b.id = a.b_id
group by a.id;

tavistock16:01:18

the bs would be a list of PGobjects with a getType of "b" and a getValue of "(1,\"example\")"

tavistock16:01:49

and I want to go from PGobject -> {:id 1 :field "example"}
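
one idea I'm toying with: have PG build the JSON itself, so the PGobject comes back typed "jsonb" instead of as a row type (untested):

(jdbc/execute! ds
  ["select coalesce(jsonb_agg(to_jsonb(b))
              filter (where b.id is not null), '[]'::jsonb) as bs,
           a.id
      from a
      left join b on b.id = a.b_id
     group by a.id"])

;; with a ReadableColumn extension that parses jsonb (like the one above),
;; bs should come back as a vector of maps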

seancorfield17:01:30

Hmm, PostgreSQL is a tricky beast with all its custom types. Nearly all of my maintenance headaches on clojure.java.jdbc and next.jdbc come from PG users 🙂

seancorfield17:01:17

There are a few PG-specific libs out there that handle extensions to clojure.java.jdbc I think -- maybe one of those will give some insight into how to deal with this?

seancorfield17:01:31

(I don't use PG myself, precisely because of all this sort of pain!)

tavistock18:01:47

thanks, I'll look and see if the approaches other people have taken for PG and clojure.java.jdbc will help me out. Thanks for your work on next.jdbc - everything so far has been perfect and well documented, so much so that the only problems I ever run into are the PGobject stuff being weird

dcj21:01:45

@tavistock I'm not exactly following your questions, but here is how I coerce between clj/edn and postgres....

👍 4
valtteri14:01:34

☝️ good stuff!