#clojure-uk
2020-03-29
dharrigan06:03:23

Good Morning!

folcon20:03:39

I’d say Morn’ but it’s a bit ridiculous at this time >_<…

dharrigan20:03:04

It's never a bad time to say Good Morning! 🙂

🙂 4
seancorfield20:03:03

It's always Good Morning when we come online 🙂

seancorfield20:03:47

We just finished Star Trek: Picard season 1 and now we're watching the Ready Room episodes that cover each of the Picard episodes. Fascinating stuff.

dharrigan20:03:26

I watched the finale on Friday. I enjoyed the series, although it did take a bit of time to find its stride.

dharrigan20:03:35

but I thought the ending was nice

folcon20:03:26

Ah @seancorfield, you might be the best person to ask this to :)… I’m wondering, and this might be a really silly idea, but if I was trying to run a simulation where I wanted to keep the sim state pretty much entirely in a postgres database, what’s the least crazy way you’d approach that? Besides jumping straight into datomic… I’d prefer not to have to hand-write the entire thing in sql statements… Is java-jdbc too low level, so I should be looking at HoneySQL or SQLingvo? I’ve also been thinking about maybe taking a look at how datascript does things and just using that + java-jdbc. You’re welcome to tell me ahead of time that this is going to end in tears and fire =)…

folcon20:03:47

I was having a chat with a friend of mine and this came up as something he wanted to try, I’m trying to work out if it’s genius or madness… :face_with_raised_eyebrow:

seancorfield20:03:31

I'm actually a big fan of SQL and using relational databases... If your data is "just" hash maps and collections of hash maps, that maps nicely to rows and tables, IMO, and next.jdbc is a good way to get that data in and out of the database.
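
A minimal sketch of that hash-map-in, hash-map-out flow with next.jdbc, assuming a hypothetical peep table and placeholder connection details:

(require '[next.jdbc :as jdbc]
         '[next.jdbc.sql :as sql])

;; connection details are assumptions for illustration
(def ds (jdbc/get-datasource {:dbtype "postgresql" :dbname "sim"}))

;; a hash map goes in as a row...
(sql/insert! ds :peep {:id 152 :name "Peep 6" :cash 0 :gender "female"})

;; ...and comes back out as a (namespace-qualified) hash map
(sql/get-by-id ds :peep 152)
;; => #:peep{:id 152, :name "Peep 6", :cash 0, :gender "female"}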

seancorfield20:03:59

But it depends on the shape of your model and how complex the relationships are between parts of the data model.

folcon20:03:38

Well mostly hashmaps, there’s things like nested maps:

#wagons_and_wizards.sim.House{:peeps [#wagons_and_wizards.sim.Peep{:name "Peep 6",
                                                                   :cash 0,
                                                                   :holdings [],
                                                                   :gender :female,
                                                                   :id 152,
                                                                   :dob {:days -6, :months -2, :years -13}}
                                      ...],
                              :pantry {:food-base {"Wheat Bread" {:name "Wheat Bread",
                                                                  :category :food-base,
                                                                  :portions 121}},
                                       :food-extra {"Eggs" {:name "Eggs", :category :food-extra, :portions 135}},
                                       :drink-base {"Cheap Ale" {:name "Cheap Ale", :category :drink-base, :portions 33},
                                                    "Good Wine" {:name "Good Wine",
                                                                 :category :drink-base,
                                                                 :portions 128}}},
                              :day-of-food {:food-base 9, :food-extra 9, :drink-base 9},
                              :normal-shop {:food-base 126, :food-extra 126, :drink-base 126},
                              :discretionary-funds 5350,
                              :id 153,
                              :dob {:days 0, :months 0, :years 0}}

folcon20:03:41

Basically he pointed out that I could be sticking all of that in postgres, and then I wouldn’t be realising most of the sim in memory… I could then just query/update the bits that I wanted to and not deal with a bunch of extra complexity… However I’m concerned that I’d be drowning in complexity from dealing with sql instead of update/assoc etc…

folcon20:03:31

Trying to work out how to rewrite updates in sql is… 😖

seancorfield20:03:48

You could either create a normalized (relational) model of that nested structure and just have one place where you save it down to the database/load it back up from the database -- we do that in a few places at work. Or you could treat some portion of it just as JSON and use native support in PostgreSQL.
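
A sketch of the second option, keeping a couple of queryable columns and storing the rest of the House map as an opaque jsonb blob. The table/column names, the ->jsonb helper, and the use of cheshire are all assumptions for illustration; house is the nested map shown above and ds the datasource from the earlier sketch:

(require '[cheshire.core :as json]
         '[next.jdbc :as jdbc])
(import '(org.postgresql.util PGobject))

;; hypothetical helper: wrap a Clojure value as a PostgreSQL jsonb parameter
(defn ->jsonb [x]
  (doto (PGobject.)
    (.setType "jsonb")
    (.setValue (json/generate-string x))))

;; a few "real" columns for querying, the rest of the House as an opaque doc
(jdbc/execute! ds
  ["insert into house (id, discretionary_funds, doc) values (?, ?, ?)"
   153 5350 (->jsonb house)])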

seancorfield20:03:15

A lot depends on what you need the lifecycle of the load/save points to be -- are you just looking to keep a record of sim states in the DB, i.e., just using it as a persistence layer, or do you want to perform complex queries and updates directly against the DB?

folcon20:03:02

The latter I think?

seancorfield20:03:41

We have some data structures at work where the DB representation is just a couple of specific columns and then a JSON blob of the whole structure. That's possible because we only query on a few fields, and otherwise do everything in-memory, just loading/saving the data for persistence.

folcon20:03:53

I can do the former, that’s not too problematic, but then there’s a question, how do I decide that sim data is not required? (What I mean by this is, I feel like I’d need to work out that some data isn’t necessary, so therefore it can be dumped from memory and just kept in the postgres db…)

folcon20:03:13

Ok so doing the actual work in-db is a bad idea I take it?

seancorfield20:03:43

It can be. Depends on the work. I'm very pragmatic about that.

seancorfield20:03:16

The more you want to query it in the DB, the more you have to normalize it, and the more work it is to load/save (& update) instances of your model.

folcon20:03:45

Hmm, it’s tricky as I’m not sure how complex the sim will get. At present, on that snippet of data, what I’d be doing is changing things like each Peep’s cash and holdings, what’s currently in the pantry (removing portions based on :day-of-food or adding based on :normal-shop), and adjusting :discretionary-funds

seancorfield20:03:54

My gut would be to assign a unique ID to each Peep and initially treat the DB purely as an opaque data store and see how it evolves.

folcon20:03:19

It’s just that values like :pantry are pretty deep. I’m probably going to have to split that House structure into House, Peep (foreign key to house), and Pantry (foreign key to house)?
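
One possible shape for that split, sketched as DDL run through next.jdbc (the table and column names are guesses inferred from the structure above, reusing the ds datasource from the earlier sketch):

(jdbc/execute! ds ["create table if not exists house (
                      id bigint primary key,
                      discretionary_funds integer not null)"])

(jdbc/execute! ds ["create table if not exists peep (
                      id bigint primary key,
                      house_id bigint references house (id),
                      name text, cash integer, gender text)"])

(jdbc/execute! ds ["create table if not exists pantry_item (
                      house_id bigint references house (id),
                      name text, category text, portions integer,
                      primary key (house_id, name))"])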

seancorfield20:03:22

So you'd manage the Peep data in-memory while you're developing it.

seancorfield20:03:31

Yeah, once you have a better sense of the specific individual entities that exist independently -- and need to be persisted independently -- then you can evolve the schema and migrate the data to the new relations.

folcon20:03:23

The alternative model I was thinking of was to basically lean on datascript to flatten everything into EAVT style structures and then just store that, except in cases where I know what the structure is?

seancorfield20:03:32

There's also the option of using some sort of EAVT pattern for persistence -- essentially ... yeah

seancorfield20:03:36

We have an attribute store at work for some stuff because it's open-ended and it just has (in our case) id, user id, attribute name, attribute value (and a bit of metadata).
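
A rough sketch of such an attribute store; the table, column names, and values here are invented for illustration:

(jdbc/execute! ds ["create table if not exists attribute_store (
                      id bigserial primary key,
                      entity_id bigint not null,
                      attr text not null,
                      value text not null,
                      updated_at timestamptz default now())"])

(sql/insert! ds :attribute_store {:entity_id 152 :attr "cash" :value "0"})

;; read back every attribute for one entity
(sql/find-by-keys ds :attribute_store {:entity_id 152})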

folcon20:03:07

Ok good to know that’s viable 😃

seancorfield20:03:09

Like I say, I'm pragmatic about it and happy to mix different types of representations in the same system.

folcon20:03:14

The only thing I’m concerned about is performance. I’m not going to suddenly find that this is going to be a bear, am I? >_<…

seancorfield20:03:38

How many millions of Peeps are you going to be simulating?

folcon20:03:31

Well at the moment there’s peeps, shops, stocks and traders, based on 3 systems interacting with each other.

seancorfield20:03:14

We use Percona (MySQL) and we have a number of tables with tens of millions of rows and it's still fast enough.

folcon20:03:14

Peeps buy from shops, shops become stocks based on success and traders trade them…

folcon20:03:26

Ok, that’s really good 😃

folcon20:03:39

What kind of memory do you need for that though?

folcon20:03:07

If I started off with a fairly lightweight machine, is that going to be a problem?

seancorfield20:03:09

Sure, we sometimes have to tune a complex query because it's not fast enough, but overall we try to keep queries simple and do stuff in Clojure if doing it in the DB would be complex/slow.

seancorfield20:03:18

I can't remember what our DB servers are these days. Last year we moved from three Big Iron servers to two virtualized servers on more modern hardware.

folcon20:03:35

Hmm, I might start with datascript / java-jdbc to begin with, then try creating custom tables when I have a better idea of the structure :)…

seancorfield20:03:53

Use next.jdbc, not clojure.java.jdbc.

seancorfield20:03:28

And if you need to process large amounts of data, use a reduce or transducing expression over plan.
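
A minimal sketch of that pattern, assuming the hypothetical peep table from earlier; plan returns a reducible, so rows are consumed as they stream from the result set rather than being realized as a full sequence of maps:

(require '[next.jdbc :as jdbc])

;; total cash across all peeps, folded straight out of the result set
(transduce (map :cash) + 0
           (jdbc/plan ds ["select cash from peep"]))

;; or a plain reduce, e.g. counting broke peeps
(reduce (fn [n row] (if (zero? (:cash row)) (inc n) n))
        0
        (jdbc/plan ds ["select cash from peep"]))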

folcon20:03:31

Actually in your experience is working with postgres json fields slower than native fields?

seancorfield20:03:44

I don't use PG so I can't comment.

folcon20:03:54

Sure no problem =)…

seancorfield20:03:06

I wouldn't expect querying on JSON fields to be as fast as regular fields tho'

seancorfield20:03:35

That's why we partially normalize some data when we have otherwise opaque JSON blobs.
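
As an aside, PostgreSQL can query inside a jsonb column with the ->> operator, and an expression index can narrow the gap with regular columns; the column and key names below are illustrative, continuing the hypothetical house table with its doc jsonb column:

(jdbc/execute! ds
  ["select id from house where doc ->> 'discretionary-funds' = ?" "5350"])

(jdbc/execute! ds
  ["create index if not exists house_funds_idx
      on house ((doc ->> 'discretionary-funds'))"])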

folcon20:03:34

Thanks for your help!

folcon20:03:05

That’s been super helpful =)…