
What sort of hardware requirements make sense at different scales for Biff? I have very little experience with any sort of scaling, but I felt like the $5 Droplet-esque sizes were a little slow given the JVM and XT. After poking around, I heard that JVM apps don't like anything lower than 4GB of RAM, so I set up my Biff project on a $30/mo Linode 4GB, which felt really nice and snappy, but Akamai just announced a pricing increase (and is generally looking less friendly to indie devs), so I'm looking for a new home. (As a side note, if anyone knows of a VPS provider that lets you ship Alpine 3.17, I would also love to know, since it's all just Debian and Ubuntu everywhere I've seen.)

Jacob O'Bryant 22:03:33

I generally start out with 1GB of memory and scale up from there as needed. Less than that and I run out of memory before the app even starts up. My main apps currently run on a mix of 4GB and 8GB instances, though that's mostly because I'm doing some memory-intensive/dumb stuff

Jacob O'Bryant 22:03:49

I think digitalocean lets you use a custom image -- a quick search comes up with this:


I was especially curious about The Sample (since AFAICT there's an ML component to that application?), but that was more intellectual curiosity since there's no way I'm close to that size anytime soon 😅


I'm also trying to evaluate if it's worth setting up the Postgres instance, finally--I've been leaving the topology as standalone, since it was my intuition that local filesystem access must be faster than adding a network layer and a whole separate machine, so long as I back up the entire database via rsync regularly. But maybe there's a huge positive to setting up Postgres that I'm not seeing (other than it being well-known and theoretically infinitely scalable)

Michael W 22:03:00

It's really for horizontal scaling. If you need to scale up the app, you can put the kv and doc store on postgres and the transaction log on kafka, then spin up as many app nodes as you need to handle the load.
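For reference, a minimal sketch of what that kind of multi-node setup might look like as a plain XTDB 1.x node config in EDN. It assumes the xtdb-kafka, xtdb-jdbc, and xtdb-rocksdb modules are on the classpath; every host, credential, topic name, and path is a placeholder. Note that in this sketch only the transaction log and document store are shared, while each app node keeps its own local index store.

```clojure
;; Sketch of an XTDB 1.x node config (EDN), not a tested production setup.
;; All hosts, credentials, topic names, and paths below are placeholders.
{;; Shared Kafka connection settings.
 :xtdb.kafka/kafka-config {:bootstrap-servers "kafka.example.com:9092"}

 ;; Transaction log on Kafka.
 :xtdb/tx-log {:xtdb/module xtdb.kafka/->tx-log
               :kafka-config :xtdb.kafka/kafka-config
               :tx-topic-opts {:topic-name "xtdb-tx-log"}}

 ;; Shared Postgres connection pool.
 :xtdb.jdbc/connection-pool {:dialect {:xtdb/module xtdb.jdbc.psql/->dialect}
                             :db-spec {:host "db.example.com"
                                       :dbname "xtdb"
                                       :user "xtdb"
                                       :password "change-me"}}

 ;; Document store on Postgres.
 :xtdb/document-store {:xtdb/module xtdb.jdbc/->document-store
                       :connection-pool :xtdb.jdbc/connection-pool}

 ;; Each app node builds and keeps its own local index (RocksDB here).
 :xtdb/index-store {:kv-store {:xtdb/module xtdb.rocksdb/->kv-store
                               :db-dir "storage/xtdb/index"}}}
```

Every app node would start from the same config, so adding capacity means starting another node that replays the shared log into its own index.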


That makes sense! For now I suppose I'll likely stick with vertical storage scaling until it becomes a pain point.

Jacob O'Bryant 23:03:42

As an intermediate step before kafka you can also put tx log + doc store on postgres while keeping index on the filesystem; biff does that if you set :biff.xtdb/topology :jdbc (and include the necessary jdbc config). But yeah as long as you're running on just one machine, filesystem + backups is fine.

I use managed postgres for the sample, partially because it has two machines (a web server and a worker) and partially for the automated backups. The web server and worker have the same code and config, they just have different systemd service config which sets BIFF_ENV to web or worker respectively, which correspond to different sections in config.edn.

The ML stuff is the world's jankiest daily cron job* which spits out a training.csv file, feeds it to an off-the-shelf python lib, uses the python code to help generate a bunch of recommendations for all the users, then saves the recommendations to XT + sends them via email. It's extremely messy, but I've worked out the kinks so it's quite reliable.

*ok, there is some pretty janky stuff out there, I probably don't have the record
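A rough sketch of how a config.edn for that kind of web + worker setup might be laid out. Only :biff.xtdb/topology :jdbc and the BIFF_ENV values come from the message above; the :biff.xtdb.jdbc/jdbcUrl key, the :merge convention, and the :com.example/role key are assumptions for illustration and may not match your Biff version, so check them against your project's own config template.

```clojure
;; config.edn (sketch). Key names other than :biff.xtdb/topology are
;; assumptions; the connection string is a placeholder.
{:prod {:biff.xtdb/dir "storage/xtdb"     ; local index directory
        :biff.xtdb/topology :jdbc         ; tx log + doc store on Postgres
        ;; Assumed key for the Postgres connection string.
        :biff.xtdb.jdbc/jdbcUrl "postgresql://user:password@host:port/dbname?sslmode=require"}

 ;; Section selected when the systemd unit sets BIFF_ENV=web.
 :web {:merge [:prod]                     ; assumed: inherit everything from :prod
       :com.example/role :web}            ; hypothetical app-specific key

 ;; Section selected when the systemd unit sets BIFF_ENV=worker.
 :worker {:merge [:prod]
          :com.example/role :worker}}     ; hypothetical app-specific key
```

With a layout like this, both machines deploy the same code and the same file, and the only per-machine difference is the BIFF_ENV value in the service definition.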

Jacob O'Bryant 23:03:46

I'm still using filesystem for tx log + doc store on Yakread. I'd like to get over to managed postgres soon but have other stuff that's higher priority. I just barely started using digitalocean's S3 clone for object storage instead of the filesystem, partially at least...


Good to know--I've been delaying implementing object storage on my project at all for as long as I can (even though it's a must-have before I try to promote it anywhere seriously), mostly because I haven't the slightest idea where to begin other than looking at Platypub's source. Starting with DO's clone is a good lead. Thanks so much for your help! I'm glad that it looks like I can put off learning AWS for another day 😉

Jacob O'Bryant 05:03:59

> I'm glad that it looks like I can put off learning AWS for another day

yeah, I prefer not to touch aws with a 10 foot pole

take this utility function, then you can GET and PUT stuff to S3/Spaces without needing to pull in amazonica or the cognitect aws client (especially given

(I'm planning to move that into Biff at some point)