#off-topic
2022-04-13
Takis_00:04:10

I read the recent discussion about why Clojure people don't use MongoDB much. I'm curious to know too, because to me they seem like a natural fit: Clojure is a tree language, JSON data is tree data, and MongoDB's query language is also a tree language in JSON. If you replace {} with () on function calls and give it Clojure syntax and names, the MongoDB query language can look almost like Clojure. For example, this cMQL example https://cmql.org/playmongo/?q=622b3f209096ce67bc69675c feels almost like querying MongoDB with Clojure, and it also reduces query size ~3x, because Clojure is far more compact and simple. I might be wrong, but to me JSON and Clojure are very related.
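The structural similarity is easy to see in a toy sketch (hypothetical names, not the real cMQL implementation): a MongoDB filter is just a nested map, so a Clojure-flavoured form can be translated mechanically:

```clojure
;; Hypothetical mini-translator, not the real cMQL implementation:
;; turn a Clojure-style form like (> :age 21) into a Mongo filter map.
(defn ->mongo-filter [[op field value]]
  (let [ops {'> "$gt", '< "$lt", '= "$eq"}]
    {(name field) {(ops op) value}}))

(->mongo-filter '(> :age 21))
;; => {"age" {"$gt" 21}}
```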

Takis_00:04:22

Do you know what databases Clojure people use? I looked at the State of Clojure 2021 survey and there isn't such a question.

respatialized00:04:08

I think often the datalog-based DBs like Datomic and now Datahike/XTDB/Asami are much more aligned with Clojure than Mongo is; you get the flexibility of a document DB with better indexing, more leverage, and stronger consistency. I think the people who'd be considering a "nosql" DB might choose one of those DBs instead. A lot of people just use postgres for Clojure projects too AFAIK.

Takis_01:04:19

I liked Prolog, but then I found Clojure and realized that I like functional programming more than logic programming. Datalog looks a lot like Prolog/SPARQL. I'm not saying it's less good - I'm far from expert enough to judge that - but it doesn't look like Clojure, whereas the MongoDB query language does.

Max01:04:44

I suspect the Clojure community tends to be more interested in triplestores because Rich promoted ideas about them and built one (Datomic). It's worth watching his talks on the subject, even if just to understand why everyone is excited about them.

Takis_01:04:27

I will, and I'll check out the databases above too. I like triplestores as well, but document stores still look more natural to me.

hiredman00:04:01

Strong preference for something relational. We use MySQL at work (which has been surprisingly good, just annoying charset stuff mostly), and for personal stuff around the house I use Postgres. I've used a few other database-like things, but nothing has survived going to production as well, sometimes due to technical issues, sometimes organizational ones.

Max01:04:34

I think in the previous discussion on Mongo people highlighted that mostly they didn't like it not because of its syntax, but because of its operational characteristics. Looking nice doesn't get you anywhere if it drops your data on the floor.

Takis_01:04:30

Well, I'm curious about both comparisons: why MongoDB in general, and why MongoDB with Clojure. For the first I don't know, but for the second I think Clojure + MongoDB are a natural fit.

dvingo03:04:09

For why not Mongo, I'd recommend the book Designing Data-Intensive Applications (https://dataintensive.net) - chapter 2 is relevant here. The short of it: Mongo gives you essentially zero query power. I'd recommend XTDB (https://xtdb.com/) if you're used to Mongo - it stores documents but gives you an actual query language with formalism behind it (https://www.academia.edu/es/6841535/What_you_Always_Wanted_to_Know_About_Datalog_And_Never_Dared_to_Ask)

Takis_04:04:04

Thank you, I will look at those : )

mauricio.szabo13:04:02

About MongoDB: I actually tried to use it multiple times, and it's simply horrible in multiple ways. There's no such thing as "denormalize until you need normalization" - it's incredibly hard, close to impossible, to start storing documents and then decide "well, let's split these documents now" - mostly because it's hard to know if a sub-document is a new entity or should be reused between other entities. MongoDB, by default, loses data and isn't that reliable; you need to tune it to be reliable. Compare that to PostgreSQL, which you need to tune to be faster (and it rarely loses data), for example... Also: PostgreSQL (and SQL databases, with the notable exception of MySQL) are really fast indeed. Most people say "joins are bad", but honestly they're only bad if they take too much time; if they don't, then everything works fine. SQL (pure SQL, not stored procedures) is Turing-complete (indeed, I was able to do parts of Advent of Code 2021 in PostgreSQL), so you probably don't need stored procedures (with the exception of Oracle's, they're not that interesting - it's better to keep the code in your programming language most of the time, IMHO).

☝️ 3
isak01:04:17

I think MongoDB appeals to people who want something that is easy to get going with, but that is probably one of the least valued aspects of a technology in the Clojure community. Robustness would be ranked way higher, which is where MongoDB doesn't have the best reputation.

5
Max02:04:02

See: easy vs simple

adi04:04:36

@takis_ IMHO your original post basically explains an important reason: Clojure's own data-processing ability is so good that a NoSQL DB doesn't have much to offer in that department via query capability. It turns out it's nicer to get data efficiently from an RDBMS and do any post-processing in-memory. Further, as others have suggested, Mongo becomes even less attractive due to poor operational characteristics and the difficulty of extracting data guarantees compared to bog-standard RDBMSes. Then there is the marketing of "scale": DBs like Maria or PG can competently handle multi-terabyte stores with hundreds of billions of records on single nodes. That basically covers 99% of any software system out there. Even SQLite has insanely large limits: https://sqlite.org/limits.html
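As a sketch of that "post-process in memory" point (the rows are invented for illustration, roughly the shape a JDBC wrapper would return): pull flat rows from the RDBMS and let Clojure's sequence functions do the aggregation:

```clojure
;; Rows as a JDBC wrapper might return them (invented data, for illustration).
(def rows [{:customer "a" :total 10}
           {:customer "b" :total 5}
           {:customer "a" :total 7}])

;; Aggregate in memory instead of reaching for a fancy query language.
(defn totals-by-customer [rows]
  (->> rows
       (group-by :customer)
       (map (fn [[c rs]] [c (reduce + (map :total rs))]))
       (into {})))

(totals-by-customer rows)
;; => {"a" 17, "b" 5}
```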

Wanja Hentze09:04:46

There seems to be an assumption by many folks that because SQLite can easily scale down pretty far, even to embedded devices, it must not be able to scale up very well. I think that assumption is mostly wrong; I've never hit SQLite's actual limits. Also, Tailscale recently blogged about how their entire production DB is a single SQLite instance: https://tailscale.com/blog/database-for-2022/ Now of course, while it can scale up, SQLite can't scale out, so if that's what you need you must look elsewhere.

👀 1
metal 1
adi10:04:43

TIL: litestream. Thanks!

adi10:04:57

> sqlite can't scale out, so if that's what you need you must look elsewhere
Just thinking aloud. To a good first approximation, 20% of a significant-enough customer base may need 1 SQLite / customer. The remaining 80% would fit into a common SQLite. A scale-out strategy could simply be to do application-level load-balancing and send per-customer traffic to a dedicated node + DB instance. Some sane namespacing scheme could help ensure writes lay out in the S3 WAL backup symmetrically to the traffic routing. And if I'm thinking straight, forking a customer out to their own instance would probably not be complicated (likely zero downtime, or maybe a tiny one).

adi10:04:26

This sort of per-customer, ah, sharding could make migrations less risky too: do a migration on a low-risk DB first and then roll it out to the higher-risk DBs.
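A minimal sketch of the routing idea above (the set of "heavy" customers and the file paths are made up): map each customer to a SQLite file, dedicated for the heavy 20%, shared for the rest:

```clojure
;; Hypothetical customer->database routing for the per-customer SQLite idea.
;; The "dedicated" set and paths are invented for illustration.
(def dedicated? #{"big-corp" "mega-inc"})

(defn db-path-for [customer-id]
  (if (dedicated? customer-id)
    (str "/data/customers/" customer-id ".db") ; 1 SQLite per heavy customer
    "/data/shared.db"))                        ; everyone else shares one file

(db-path-for "big-corp")  ;; => "/data/customers/big-corp.db"
(db-path-for "tiny-shop") ;; => "/data/shared.db"
```

Forking a customer out then just means adding them to the dedicated set and copying their rows over.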

Wanja Hentze14:04:09

> A scale-out strategy could simply be to do application-level load-balancing and send per-customer traffic to a dedicated node + db instance. This sounds simple, but as soon as you go from 1 DB to several, you immediately lose all the nice ACID guarantees that sqlite works so very, very hard to uphold. You can then either pile a ton of more engineering on top to try to win those back, or you can take calculated tradeoffs and accept some inconsistency. Neither of these is a call I'd make lightly though.

adi14:04:29

Hm, if one were to map a customer to a db, and always write that customer's data to that db, isn't it just single-tenancy? I'm not sure how that breaks ACID guarantees. But yes, once in the details, nothing is simple. Or easy for that matter. 😅

seancorfield05:04:56

@takis_ I was a maintainer of CongoMongo -- a MongoDB wrapper library for Clojure -- for quite a while, and we went fairly "all-in" on MongoDB at work for a while because the document <-> data structure mapping seemed very natural. But as we looked at scaling MongoDB, it was going to cost as much as or more than the relational setup we had with MySQL, and it wasn't a good fit for all of our data needs, so we were going to need MongoDB as well as MySQL, and that just wasn't a tenable position. As others have noted, the query language is pretty limited, and the lack of joins and transactions etc. made it very much a "second cousin" to RDBMSes (at the time -- I don't know whether it's improved much since). We ended up migrating all of our MongoDB data back to MySQL and scrapping all our MongoDB plans. In addition, I was spending a lot of time in the MongoDB community, attending conferences, talking to "web scale" users, etc., and I just kept hearing all sorts of horror stories about robustness and the ops side of living with MongoDB, and that also contributed to our decision to back away from it.

Takis_11:04:01

I know, I have seen CongoMongo, and one of the PDFs (maybe created for some presentation) saying that the MongoDB query language is like Clojure maps was the inspiration to make cMQL, for example https://cmql.org/playmongo/?q=622b3f209096ce67bc69675c . I might try Elasticsearch too, and look at HoneySQL as well. Mongo has joins and transactions now, but I don't know how it compares with RDBMSes these days.

slipset05:04:56

We're currently migrating from Mongo to PG. The reasons why can be summarized with a paraphrase of Greenspun's tenth rule: Any sufficiently complicated program using MongoDB contains an ad hoc, informally-specified, bug-ridden, slow implementation of half of Postgres.

👍 5
adi07:04:48

Hah! Fellow person of culture :) Same sentiment re: Greenspun's tenth: https://clojurians.slack.com/archives/C03RZGPG3/p1648700988630759

slipset07:04:46

Might have been where I picked it up 🙂

slipset07:04:20

Sorry about the lack of attribution, I try to remember, but sometimes I forget 🙂

adi08:04:17

Ah all text here is MIT licensed 😁

adi08:04:03

Likewise, I have no idea where that adaptation came from. Maybe I actually read it somewhere (likely) but my brain thinks I wrote it first (very unlikely) 🤪

didibus06:04:13

I really don't know about Mongo, but DynamoDB and S3 are both pretty great, and they work well with Clojure: put EDN in them, get EDN back. At least when you want to work at the document level. For cross-document queries, you've got to index them in Elasticsearch or something like that, as that's where they're not as fully featured.

gklijs07:04:32

Since we're all talking about databases: event stores can also be pretty nice. I'm biased, but since you can choose the serializer yourself, Axon Server should also work pretty nicely with Clojure, as should things like Datomic and XTDB of course.

slipset07:04:39

There is a case, I believe, for a document store when you're prototyping. It's really nice to be able to just stuff whatever into the database and retrieve it again. Your route tree could look like:

["/api/:collection"
 ["" {:get {:handler find-all}
      :post {:handler create}}]
 [":id" {:get {:handler get-by-id}
         :put {:handler update-by-id}
         :delete {:handler delete-by-id}}]]

Mno07:04:30

Redis + watchers on atoms has been my go-to for small bots and scripts that need persistence. It's comfortable, if nothing else.

dgb2308:04:24

Replaying events doesn’t free you from piping them through relational logic though.

Takis_12:04:14

I'm looking at HoneySQL now and it looks nice, it makes SQL more Clojure-like. But what about the procedural SQL extensions, like PL/pgSQL in Postgres or PL/SQL in Oracle? Do we have a way to make them Clojure-like too?

p-himik12:04:06

Unless you need to write dynamic pl/pgsql, I would just write it in regular text. But if you do need for it to be dynamic, HoneySQL is extensible - you can easily add your own constructs to it.

Takis_12:04:35

I mean, for example, this: https://www.postgresql.org/docs/current/plpgsql-control-structures.html Do we have a Clojure-like way to write the loops/ifs/variable declarations/functions etc. that PL/pgSQL offers?

Takis_12:04:46

Is there a way to generate PL/pgSQL code from Clojure-like code, the way HoneySQL does for SQL?

p-himik12:04:38

Just as I said: I don't think anything exists, because from both my experience and my observations, writing dynamic PL/pgSQL is almost never needed. But you can extend HoneySQL yourself, from your user code, to support it.

Takis_12:04:06

Oh, OK, thank you.

Nundrum13:04:32

Sometimes I miss R-style data frames. Sometimes 😉

p-himik13:04:37

Curious - doesn't stuff like https://scicloj.github.io/tablecloth/index.html have all the useful features of R data frames?

⬆️ 1
👍 1
respatialized13:04:18

And a dplyr inspired API!

Nundrum13:04:45

That was more a comment on the database thread. But I didn't know about tablecloth! It's not findable with terms like "clojure tabular data" or "clojure data frame".

😬 1
respatialized14:04:17

Those of us in the #data-science streams clearly have more work to do to promote the discoverability of the libraries we use 😅

😄 1
p-himik14:04:25

I guess I just happen to be more exposed to it all by accident. :) In any case, I'd definitely recommend checking out https://scicloj.github.io/ if you have to work with data in scenarios where you'd use R's or Pandas' data frames.

Nundrum14:04:38

Three years ago I was doing a ton of Clojure with big data / Hadoop / etc. But none of that is on my plate currently so that's probably why I don't know of it.

genmeblog14:04:44

The main reason, I think, is that most of the discussion around data science happens on Zulip.

Daniel Slutsky14:04:49

Let us think what would make things more visible here. 🙏

genmeblog14:04:13

anyway @U02UHTG2YH5 glad to see that dreams come true so quickly 😄

👍 2
Nundrum14:04:14

I think it would be useful if the Tablecloth page linked above contained the term "data frame", at the very least.

👍 1
1
noisesmith20:04:49

if this post has zero characters, or the posts form a loop, I will stop, otherwise I will post again with the count of the previous post

noisesmith20:04:03

if this post has one hundred and thirty six characters, or the posts form a loop, I will stop, otherwise I will post again with the count of the previous post

noisesmith20:04:17

if this post has one hundred and fifty eight characters, or the posts form a loop, I will stop, otherwise I will post again with the count of the previous post

noisesmith20:04:25

if this post has one hundred and fifty nine characters, or the posts form a loop, I will stop, otherwise I will post again with the count of the previous post

noisesmith20:04:38

if this post has one hundred and fifty eight characters, or the posts form a loop, I will stop, otherwise I will post again with the count of the previous post

Nundrum20:04:33

Is this a perverse form of the Collatz conjecture?

😄 1
noisesmith20:04:02

it's a variation on an exercise a teacher used to demonstrate self-reference and recursion to a non-mathematical audience

noisesmith20:04:04

it works with a wide variety of framing sentences, and I assume it would work in most written languages
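A digits-only variant of the exercise is easy to simulate (a sketch; the original posts spell the numbers out in words, which changes the arithmetic but not the idea):

```clojure
;; The framing sentence, with the count written in digits rather than words.
(defn post-text [n]
  (str "if this post has " n " characters, or the posts form a loop, "
       "I will stop, otherwise I will post again with the count of the "
       "previous post"))

(defn run-game
  "Follow the counts until a post states its own length (a fixed point)
   or a count repeats (a loop)."
  [start]
  (loop [n start, seen #{}]
    (let [m (count (post-text n))]
      (cond
        (= m n)  {:stop :fixed-point, :n n}
        (seen m) {:stop :loop, :n m}
        :else    (recur m (conj seen m))))))

(run-game 1) ;; converges to a post that is exactly as long as it claims
```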

dpsutton22:04:31

This is very Gary Fredericks to me. (great twitter follow if this is your jam)

noisesmith23:04:53

I knew him on #clojure freenode irc back in the day

Oliver George23:04:00

How can I generate keyword statistics (frequency of keyword use) from my CLJS codebase? The hard bit is reading CLJS as data, I think.

Oliver George23:04:22

I suspect a rewrite-clj zipper used to walk the code will work. Starting there...

👍 1