#off-topic
2020-06-11
emccue06:06:12

I don't really know how to assess the reliability/maintainedness of apache projects

emccue06:06:37

This seems pretty much like what i was looking for in terms of filesystem in dev and actual cloud in production

emccue06:06:58

but if you scroll down, even though it really is just a java library, there is documentation for using it with clojure 1.3

emccue06:06:34

which is odd on its own to call out clojure like that, but i'm more concerned that it might not work now or in the future because of a lack of attention from maintainers

emccue06:06:23

i don't really mind cloud lock in rn, but i do want a local testing solution without having to whip up some protocols for myself

seancorfield06:06:23

Oh, that brings back memories -- I remember looking at jClouds years ago! It's interesting how we view stability in Clojure vs lack of maintenance in other tech.

emccue06:06:55

I have a friend i needed to fight tooth and nail to get to not debug his internal app live on ec2 every time

emccue06:06:20

a big reason he did that, in the realm of custom internal app, was that he used s3 to store images

emccue06:06:04

and didn't write a custom interface to do it on the filesystem

emccue06:06:33

which feels like a crazy thing to have to write, since it's just "dump blob, load blob"

emccue06:06:48

and i know from recent experience that, for all its flaws, rails active storage solves this

emccue06:06:10

so I'm looking around for something canned

emccue06:06:22

local:
  service: Disk
  root: <%= Rails.root.join("storage") %>

test:
  service: Disk
  root: <%= Rails.root.join("tmp/storage") %>

amazon:
  service: S3
  access_key_id: ""
  secret_access_key: ""
  bucket: ""
  region: "" # e.g. 'us-east-1'

dominicm06:06:27

@emccue when I looked into this last, there's java libraries which can start an s3 compatible server

emccue06:06:37

I found one that was made to make your "own cloud" kind of thing

emccue06:06:52

but that seemed to miss the mark

emccue07:06:27

this seems to be the closest

slipset08:06:16

We have a homegrown solution to this:

slipset08:06:55

(ns ardoq.gateway.storage)

(defprotocol IStorage
  (exists? [this request])
  (copy-object [this request])
  (get-object [this request])
  (put-object [this request])
  (delete-object [this request])
  (delete-objects [this request]))

slipset08:06:20

And then we have two implementations, one for S3 and one for localstorage.

slipset08:06:58

Our implementation leaks S3 details, but that's another problem 🙂

slipset08:06:56

hmm, maybe make a lib around this?
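
For illustration, a local-disk implementation of a protocol like the one above might look roughly like this. It is a minimal sketch, not the actual ardoq code: the request keys `:key`, `:source-key`, `:keys` and `:body` (a byte array), and the `DiskStorage` record itself, are assumptions made up for the example. The protocol is repeated so the block runs on its own.

```clojure
(require '[clojure.java.io :as io])

;; Same shape as the protocol above, repeated so the sketch is self-contained.
(defprotocol IStorage
  (exists? [this request])
  (copy-object [this request])
  (get-object [this request])
  (put-object [this request])
  (delete-object [this request])
  (delete-objects [this request]))

;; Hypothetical record: stores every object as a file under `root`.
;; The request keys used here are assumptions, not part of the original protocol.
(defrecord DiskStorage [root]
  IStorage
  (exists? [_ request]
    (.isFile (io/file root (:key request))))
  (copy-object [_ request]
    (let [target (io/file root (:key request))]
      (io/make-parents target)
      (io/copy (io/file root (:source-key request)) target)))
  (get-object [_ request]
    ;; Returns the object's contents as a byte array.
    (java.nio.file.Files/readAllBytes (.toPath (io/file root (:key request)))))
  (put-object [_ request]
    (let [target (io/file root (:key request))]
      (io/make-parents target)
      (io/copy (:body request) target)))
  (delete-object [_ request]
    ;; `true` means: do not throw if the file is already gone.
    (io/delete-file (io/file root (:key request)) true))
  (delete-objects [this request]
    (doseq [k (:keys request)]
      (delete-object this {:key k}))))

;; Usage: write a blob under the JVM temp directory and read it back.
(def storage
  (->DiskStorage (str (System/getProperty "java.io.tmpdir") "/storage-sketch")))

(put-object storage {:key "images/cat.txt" :body (.getBytes "meow")})
(String. (get-object storage {:key "images/cat.txt"}))
```

An S3-backed record would implement the same six functions, so calling code only needs to choose which record to construct per environment, much like the Rails config earlier picks `Disk` or `S3`.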

sova-soars-the-sora13:06:06

Anyone else find those images of the source code history compelling from History of Programming Languages / Clojure ? https://download.clojure.org/papers/clojure-hopl-iv-final.pdf

Alex Miller (Clojure team)13:06:19

there are some tools to make them (I made those)

Alex Miller (Clojure team)13:06:46

and then there's a newer thing which I actually used (trying to find it)

Alex Miller (Clojure team)13:06:54

I did a fair amount of hacking to get it to make exactly the graphs that Rich wanted but it's fun to play with on any repo

Daniils Petrovs13:06:07

That is really cool! I was wondering how Rich got those graphs 🙂 I am collecting a lot of data for next quarter to try and get Clojure approved for use in my company, this will help a lot!

jjttjj14:06:07

Is there a good tool for renaming a namespace across a project in clojurescript?

👀 3
borkdude14:06:18

maybe clojure-lsp can do it? or Cursive

👍 3
borkdude14:06:37

personally I use projectile-replace

Daniel Tan15:06:24

I keep coming back to https://www.reddit.com/r/Clojure/comments/gendnb/complete_technical_invalidation_of_rich_hickeys/ https://www.reddit.com/r/haskell/comments/763su7/in_which_rich_hickey_questions_the_value_of/ when I try to explain to a friend that “programs deal with data, and data is dynamic. Learn to deal with data not types.”

mloughlin17:06:31

Arguing about dynamic vs static typing in the context of a single program is a red herring unless you only build systems that don't communicate with any other components.

mloughlin17:06:08

The more components that are added to a system that don't use the same type system the less return on investment for using types there is. OTOH you don't see that when focusing on data.

sova-soars-the-sora16:06:06

@alexmiller That's great that the blog entry uses the name the Ship of Theseus, I was using this analogy just yesterday to explain the code churn/turn over ... at some point all the boards and planks of Theseus' ship have been exchanged for new ones... where is the ship?!

Alex Miller (Clojure team)16:06:59

Types are about proving things. What do you do when the world changes?

☝️ 3
wombawomba16:06:18

Does anyone have any suggestions on how to shard/distribute large quantities of data stored on disk across a fleet of hosts?

wombawomba16:06:51

The individual ‘chunks’ (files/folders) of data are small and independent (i.e. there’s no need to coordinate between hosts).

wombawomba16:06:05

…so the main concerns would be how to fairly distribute the chunks (each chunk should be stored redundantly for HA), routing requests for a given chunk to the correct host, and being able to handle added capacity as the data grows.

phronmophobic16:06:14

there’s probably a dozen key-value stores and cloud services like s3 that would seem to fit this use case. any reason not to use one of these options?

wombawomba16:06:56

yeah, I need the data to be on disk

wombawomba16:06:26

as in, there’s a service that operates on the data that requires it to exist locally on disk

phronmophobic17:06:57

that might be a reason, but I’m still not convinced. what’s the service do? data transformation, service requests? how does it compare to something like hadoop, storm, or samza?

wombawomba17:06:56

well, basically it’s a legacy technology that I can’t modify, that performs operations on hundreds to millions of files on disk

wombawomba17:06:42

pretty much all it does is serve requests for performing one out of a list of operations on a single data chunk (folder)

wombawomba17:06:17

it differs from those platforms you mentioned in that it’s not really programmable in any way

phronmophobic17:06:46

interesting. not sure I can think of something like that off the top of my head

wombawomba17:06:41

alright, yeah

wombawomba17:06:17

it seems to me like there should be pre-packaged solutions to this problem, but I can’t really think of or find any either

wombawomba17:06:26

like, there are distributed file systems like Gluster, but using something like that seems a bit overkill to me

wombawomba17:06:11

really all I want is something that’ll help me distribute chunks of data across hosts as well as route requests to whichever host has the relevant chunk

phronmophobic17:06:35

yea, all the stuff that I can think of also wants to have control over the file format

phronmophobic17:06:18

so key value stores will do that, but they will store the data their own way

phronmophobic17:06:29

will it be running in the cloud?

phronmophobic17:06:02

the other option I was thinking about was to just buy a large RAID disk and circumvent the distribution issues

phronmophobic17:06:31

there might be some cloud equivalent as well

phronmophobic17:06:38

is the legacy system also going to be installed on each server in the cluster?

wombawomba17:06:43

yeah the plan is to run it in the cloud, but I’d like to keep the door open to running it on bare metal as well

phronmophobic17:06:40

and depending on the individual file sizes and latency requirements, it might still be feasible to store everything in s3 and just stream the data that you need when you need it

phronmophobic17:06:07

hmm, it seems like you could try the RAID approach on aws, https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html. just have one server with a giant RAID disk

wombawomba17:06:50

Well, the lower latency the better, really, so I don't think S3 would be appropriate really. I suppose the RAID approach works, but it sort of feels like I'd be building a SPOF/bottleneck for something that's trivially distributable.

wombawomba17:06:26

I guess I could just build my own centralized routing service with a DB containing the chunk -> host mapping

phronmophobic17:06:19

agreed. there are a lot of different tradeoffs and it all depends on the use case. a distributed system is inherently more complicated than a non-distributed system. it’s just a matter of whether that complexity is worth it (which it might be). having a single server with a backup ready to go would be more straightforward to build and maintain, but it’s not as elastic.

phronmophobic17:06:55

the centralized routing service doesn’t sound too bad, but it’s easy to end up spending a lot of time on incidental complexity anytime you build a distributed system

wombawomba17:06:43

Yeah. I think the tricky part with that approach would be how to deal with disk space running out on the hosts

wombawomba17:06:36

There'd have to be some mechanism for moving chunks to new hosts when that happens
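
One way to get a chunk -> host mapping without a central DB is rendezvous (highest-random-weight) hashing. Nobody in the thread proposed it, so treat the following as a hedged sketch of that technique rather than a recommendation: the function names, the host names and the choice of SHA-256 are all assumptions. Every host gets a deterministic score per chunk, and the chunk lives on the top-scoring `replicas` hosts.

```clojure
(import 'java.security.MessageDigest)

(defn- score
  "Deterministic pseudo-random weight for a (host, chunk) pair."
  [host chunk-id]
  (let [digest (.digest (MessageDigest/getInstance "SHA-256")
                        (.getBytes (str host "|" chunk-id) "UTF-8"))]
    ;; Interpret the first 8 bytes of the digest as a non-negative integer.
    (BigInteger. 1 (java.util.Arrays/copyOf digest 8))))

(defn hosts-for
  "The `replicas` hosts that should store `chunk-id`, highest score first.
  Any node that knows the host list computes the same answer."
  [hosts replicas chunk-id]
  (->> hosts
       (sort-by #(score % chunk-id) #(compare %2 %1))
       (take replicas)
       vec))

;; Usage with made-up host names: two replicas per chunk.
(hosts-for ["host-a" "host-b" "host-c"] 2 "chunk-42")
```

When a host is added, a chunk's replica set either stays the same or swaps exactly one old host for the new one, so only a fraction of chunks have to move. The sketch ignores how full each disk is and how fast individual chunks grow, so rebalancing on those grounds would still need a separate mechanism.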

phronmophobic17:06:40

is the total file storage requirement growing over time?

wombawomba17:06:54

Yeah, as well as the individual chunks

phronmophobic17:06:29

the fact that the individual files are growing also reduces your options

phronmophobic17:06:08

what’s the order of magnitude for the total storage space requirement?

wombawomba17:06:44

Not quite sure yet. At the time being it'd be terabytes

phronmophobic17:06:01

ah ok. single server does not seem appropriate for that

emccue20:06:35

Maybe a dumb answer but

emccue20:06:12

this has a ton of caveats, but if the legacy software reads the whole file, does work, and dumps a whole file

emccue20:06:21

then this will work just fine

emccue20:06:13

I'd say try profiling the s3 solution - the amount of lifeforce you would save in operations is worth seeing it through

phronmophobic21:06:33

under limitations:
> random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy
this seems like it might be a deal breaker

wombawomba22:06:13

Oops sorry missed the continuation of this thread. Thanks for the recommendation! I’ll check it out, but I have a hunch that it’s not going to be a good fit.

dpsutton20:06:47

does anyone know if there's an edn or clojure mode for BBEdit? Trying to adapt to our ops people's workflow so they don't have to chuck their editor

dpsutton21:06:25

ha scheme but no clojure 🙂

seancorfield21:06:43

Yeah, there's an AutoLisp mode available too (third party download).

dpsutton21:06:00

past its heyday. but what a champ it was

seancorfield21:06:18

Aye, I used BBEdit for years...

emccue21:06:30

probably cheaper in terms of time investment to write one yourself for the ops people

emccue21:06:26

though that is subject to how many ops people you have and how costly a switch would be for each individual

borkdude21:06:27

had a pretty good experience with codemirror + clojure + parinfer support

Cory22:06:27

as a person entering an ops role at a clojure shop but having a background in FP i'm actually super interested in babashka right now

Cory22:06:24

i'm definitely more in the school of "enable the dev to help you operate" though.