This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-06-11
Channels
- # announcements (4)
- # aws (6)
- # babashka (40)
- # beginners (318)
- # biff (4)
- # bootstrapped-cljs (9)
- # calva (19)
- # chlorine-clover (1)
- # cider (3)
- # clj-on-windows (25)
- # cljdoc (8)
- # cljfx (1)
- # cljs-dev (30)
- # cljss (2)
- # clojure (62)
- # clojure-chile (9)
- # clojure-europe (11)
- # clojure-finland (17)
- # clojure-italy (1)
- # clojure-kc (1)
- # clojure-nl (3)
- # clojure-spec (27)
- # clojure-uk (40)
- # clojuremn (1)
- # clojurescript (51)
- # conjure (6)
- # cursive (8)
- # data-science (9)
- # datahike (4)
- # datascript (1)
- # datomic (31)
- # emacs (10)
- # emotion-cljs (1)
- # events (1)
- # figwheel-main (16)
- # find-my-lib (1)
- # fulcro (30)
- # graalvm (3)
- # graphql (12)
- # helix (16)
- # honeysql (5)
- # jobs (1)
- # jobs-discuss (10)
- # juxt (3)
- # kaocha (26)
- # lambdaisland (3)
- # leiningen (15)
- # malli (7)
- # off-topic (100)
- # pathom (8)
- # pedestal (15)
- # protojure (24)
- # re-frame (2)
- # reagent (7)
- # reitit (22)
- # remote-jobs (1)
- # shadow-cljs (140)
- # spacemacs (17)
- # spire (2)
- # tools-deps (23)
- # uix (11)
- # vim (5)
- # xtdb (3)
- # yada (3)
This seems pretty much like what I was looking for in terms of filesystem in dev and actual cloud in production
but if you scroll down, even though it really is just a Java library, there is documentation for using it with Clojure 1.3
which is odd on its own, to call out Clojure like that, but I'm more concerned that it might not work now or in the future because of a lack of attention from maintainers
I don't really mind cloud lock-in right now, but I do want a local testing solution without having to whip up some protocols for myself
Oh, that brings back memories -- I remember looking at jClouds years ago! It's interesting how we view stability in Clojure vs lack of maintenance in other tech.
I have a friend I had to fight tooth and nail to get him to stop debugging his internal app live on EC2 every time
a big reason he did that, for a custom internal app, was that he used S3 to store images
which feels like a crazy thing to have to write, since it's just "dump blob, load blob"
and I know from recent experience that, for all its flaws, Rails Active Storage solves this
local:
  service: Disk
  root: <%= Rails.root.join("storage") %>

test:
  service: Disk
  root: <%= Rails.root.join("tmp/storage") %>

amazon:
  service: S3
  access_key_id: ""
  secret_access_key: ""
  bucket: ""
  region: "" # e.g. 'us-east-1'
@emccue when I looked into this last, there are Java libraries which can start an S3-compatible server
There is localstack - https://github.com/localstack/localstack
(ns ardoq.gateway.storage)

(defprotocol IStorage
  (exists? [this request])
  (copy-object [this request])
  (get-object [this request])
  (put-object [this request])
  (delete-object [this request])
  (delete-objects [this request]))
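For local testing, a protocol like the one above can be backed by plain files. A minimal sketch (the protocol is redeclared inline so the snippet stands alone, and the `:key`/`:body` request shape is an assumption, not Ardoq's actual API):

```clojure
;; Hypothetical sketch: a local-disk storage backend for dev/test,
;; standing in for S3. Request shape {:key ... :body ...} is assumed.
(require '[clojure.java.io :as io])

(defprotocol IStorage
  (exists? [this request])
  (get-object [this request])
  (put-object [this request])
  (delete-object [this request]))

(defrecord DiskStorage [root]
  IStorage
  (exists? [_ {:keys [key]}]
    (.exists (io/file root key)))
  (get-object [_ {:keys [key]}]
    ;; slurp is fine for a sketch; real code would stream
    (slurp (io/file root key)))
  (put-object [_ {:keys [key body]}]
    (let [f (io/file root key)]
      (io/make-parents f)
      (spit f body)))
  (delete-object [_ {:keys [key]}]
    ;; second arg true = don't throw if the file is missing
    (io/delete-file (io/file root key) true)))
```

In production the same protocol could be satisfied by a record wrapping an S3 client, so application code never cares which one it got.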
Anyone else find those source-code-history images from the History of Programming Languages / Clojure paper compelling? https://download.clojure.org/papers/clojure-hopl-iv-final.pdf
there are some tools to make them (I made those)
https://erikbern.com/2016/12/05/the-half-life-of-code.html was an early one
and then there's a newer thing which I actually used (trying to find it)
I did a fair amount of hacking to get it to make exactly the graphs that Rich wanted but it's fun to play with on any repo
That is really cool! I was wondering how Rich got those graphs 🙂 I am collecting a lot of data for next quarter to try and get Clojure approved for use in my company, this will help a lot!
I keep coming back to https://www.reddit.com/r/Clojure/comments/gendnb/complete_technical_invalidation_of_rich_hickeys/ https://www.reddit.com/r/haskell/comments/763su7/in_which_rich_hickey_questions_the_value_of/ when I try to explain to a friend that “programs deal with data, and data is dynamic. Learn to deal with data not types.”
Arguing about dynamic vs static typing in the context of a single program is a red herring, unless you only build systems that don't communicate with any other components.
The more components are added to a system that don't share the same type system, the less return on investment there is for using types. OTOH you don't see that when focusing on data.
@alexmiller That's great that the blog entry uses the name the Ship of Theseus; I was using this analogy just yesterday to explain code churn/turnover ... at some point all the boards and planks of Theseus' ship have been exchanged for new ones... where is the ship?!
Types are about proving things. What do you do when the world changes?
Does anyone have any suggestions on how to shard/distribute large quantities of data stored on disk across a fleet of hosts?
The individual ‘chunks’ (files/folders) of data are small and independent (i.e. there’s no need to coordinate between hosts).
…so the main concerns would be how to fairly distribute the chunks (each chunk should be stored redundantly for HA), routing requests for a given chunk to the correct host, and being able to handle added capacity as the data grows.
there’s probably a dozen key-value stores and cloud services like s3 that would seem to fit this use case. any reason not to use one of these options?
yeah, I need the data to be on disk
as in, there’s a service that operates on the data that requires it to exist locally on disk
that might be a reason, but I'm still not convinced. what's the service do? data transformation, serving requests? how does it compare to something like Hadoop, Storm, or Samza?
well, basically it's a legacy technology that I can't modify, that performs operations on hundreds to millions of files on disk
pretty much all it does is serve requests for performing one out of a list of operations on a single data chunk (folder)
it differs from those platforms you mentioned in that it’s not really programmable in any way
interesting. not sure I can think of something like that off the top of my head
alright, yeah
it seems to me like there should be pre-packaged solutions to this problem, but I can’t really think of or find any either
like, there are distributed file systems like Gluster, but using something like that seems a bit overkill to me
really all I want is something that’ll help me distribute chunks of data across hosts as well as route requests to whichever host has the relevant chunk
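One lightweight scheme that might fit (a sketch, not a vetted design) is rendezvous (highest-random-weight) hashing: every router computes the same owner hosts for a chunk with no shared state, and adding a host only moves roughly 1/n of the chunks. All host and chunk names below are made up:

```clojure
;; Sketch: rendezvous (highest-random-weight) hashing for routing chunks
;; to hosts. Deterministic, so any router agrees on a chunk's owners.
(defn- score [host chunk]
  ;; clojure.core/hash is fine for a sketch; a real deployment would want
  ;; a stronger, version-stable hash (e.g. murmur3 over host + chunk).
  (hash (str host "::" chunk)))

(defn owners
  "The n highest-scoring hosts for chunk; n > 1 gives replicas for HA."
  [hosts chunk n]
  (take n (sort-by #(score % chunk) > hosts)))

;; e.g. (owners ["host-a" "host-b" "host-c"] "chunk-42" 2)
```

Capacity growth then means adding hosts and copying over only the chunks whose owner set changed, rather than rebalancing everything.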
yeah, all the stuff that I can think of also wants to have control over the file format
so key value stores will do that, but they will store the data their own way
will it be running in the cloud?
the other option I was thinking about was to just buy a large RAID disk and circumvent the distribution issues
there might be some cloud equivalent as well
is the legacy system also going to be installed on each server in the cluster?
yeah the plan is to run it in the cloud, but I’d like to keep the door open to running it on bare metal as well
and depending on the individual file sizes and latency requirements, it might still be feasible to store everything in s3 and just stream the data that you need when you need it
hmm, it seems like you could try the RAID approach on aws, https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/raid-config.html. just have one server with a giant RAID disk
Well, the lower the latency the better, really, so I don't think S3 would be appropriate. I suppose the RAID approach works, but it sort of feels like I'd be building a SPOF/bottleneck for something that's trivially distributable.
I guess I could just build my own centralized routing service with a DB containing the chunk -> host mapping
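That routing service could start as little more than a keyed lookup. A toy sketch, with an atom standing in for the real database and all names hypothetical:

```clojure
;; Sketch: central chunk -> host routing table. An atom stands in for a
;; durable database; in practice writes would need to survive restarts.
(def routing-table (atom {}))

(defn assign-chunk!
  "Record that chunk now lives on host (placement or migration)."
  [chunk host]
  (swap! routing-table assoc chunk host))

(defn host-for
  "Which host serves requests for chunk, or nil if unassigned."
  [chunk]
  (get @routing-table chunk))
```

The mapping also gives you a natural place to handle hosts filling up: reassign a chunk's entry after copying it to a new host.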
agreed. there are a lot of different tradeoffs and it all depends on the use case. a distributed system is inherently more complicated than a non-distributed system; it's just a matter of whether that complexity is worth it (which it might be). having a single server with a backup ready to go would be more straightforward to build and maintain, but it's not as elastic.
the centralized routing service doesn’t sound too bad, but it’s easy to end up spending a lot of time on incidental complexity anytime you build a distributed system
Yeah. I think the tricky part with that approach would be how to deal with disk space running out on the hosts
There'd have to be some mechanism for moving chunks to new hosts when that happens
is the total file storage requirement growing over time?
Yeah, as well as the individual chunks
ohhh, ok
the fact that the individual files are growing also reduces your options
what’s the order of magnitude for the total storage space requirement?
Not quite sure yet. For the time being it'd be terabytes
ah ok. single server does not seem appropriate for that
this has a ton of caveats, but if the legacy software reads a whole file, does its work, and dumps out a whole file
I'd say try profiling the S3 solution - the amount of lifeforce you would save in operations is worth seeing it through
under limitations:
> random writes or appends to files require rewriting the entire object, optimized with multi-part upload copy
this seems like it might be a deal breaker
Oops sorry missed the continuation of this thread. Thanks for the recommendation! I’ll check it out, but I have a hunch that it’s not going to be a good fit.
does anyone know if there's an edn or clojure mode for BBEdit? Trying to avoid making our ops people chuck their editor, and instead adapt to their workflow
@dpsutton Maybe start here http://bbeditextras.org/wiki/index.php?title=Using_BBEdit_and_(your_language)#Javascript.2FJSON
Yeah, there's an AutoLisp mode available too (third party download).
Aye, I used BBEdit for years...
though that is subject to how many ops people you have and how costly a switch would be for each individual