This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@alex.lynham - Tired, jaded and ill-at-ease with the World and my place in it, since you asked.
Yeah, just work getting me down, and some of it's awesome by the way, I am just more homesick than usual this trip.
I think it's 'cos I've seen friends I've not seen for a long time and they have kids my age but they are home all the time, more than anything if I am honest.
That and I didn't sleep well last night (despite lying to my AirBnB host about it) and I feel a little out of phase with reality this morning.
Nothing a nice relaxing weekend in Scotland and a walk in the hills won't fix, I expect 🙂
It can be, that's for sure. I'll be honest though, @mccraigmccraig, most of the time it feels easier on my system than the daily commute from Sevenoaks used to be. Having to be on trains and Tubes and in the car for upwards of 2 hours each way every day was harder.
haven't tried. If you're not interested in SSR and just want the fulcro server ("/api") on lambdas, you can use fulcro server or just do your own thing and use https://github.com/wilkerlucio/pathom for parsing the queries.
I could do with having the money to stay a short walk from our London office when I come down, so that it really is only one journey down and one journey home.
@alex.lynham - Thanks 🙂 In fairness to all concerned, the work-time this trip has been awesome, and I'm not just saying that 'cos some of those people lurk in this channel, but for some reason being away from home has just been harder this week. I am sure it will pass.
Anyone got any experience with using AWS EC2 to ship in large files (15-25 GB) and then push them to S3? I am noticing that doing this on an ad hoc, manual basis I am being hit with network throttling, which I have overcome in the short-term by just killing one instance and launching a new one, but this is not a scalable solution...
Worst-case scenario, I could just acquire the files MUCH more slowly (7-9 hours instead of 15 minutes) after the throttling kicks in (assuming that the throttling is not lifted at some point), but I would rather find out if there is a reliable way to provision infra that can consistently move large files around at speed?
Also, I need to make sure that I am not (in the fullness of time) paying network traffic fees to move files from EC2s in my Default VPC to S3 - it has been suggested to me that this might be a sleeping cost waiting to come and bite me if I don't take steps to mitigate it.
At the moment, 'cos I am doing it manually, spinning one up for >1hr at a time, I am just using t2.micros
The thing is, until I have this side of things automated I don't want to pay the hourly for great big, crunchy instances
I am nearly done with this brief bit of manual data shipping - the last of three files is going to S3 as I type and is nearly there - but I need to start approaching the problem of automating this whole process and I just want to be certain that the instance I deploy to will not get its network connectivity throttled
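(Aside for later readers: the usual way to sustain throughput on 15-25 GB objects is S3 multipart upload, pushing parts in parallel rather than one long PUT. A minimal sketch below; `part_ranges` and `upload_large_file` are illustrative helper names, not anything from the thread, and the part size / concurrency numbers are untuned assumptions. boto3's managed transfer does the real work.)

```python
def part_ranges(total_size, part_size):
    """Return (start, end) byte ranges covering total_size in part_size chunks.

    This is just to show how a large object splits into parallelisable
    multipart-upload parts; boto3 computes this internally.
    """
    ranges = []
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size)
        ranges.append((start, end))
        start = end
    return ranges


def upload_large_file(path, bucket, key, part_mb=64, workers=10):
    """Upload via boto3's managed transfer, splitting into parallel parts."""
    import boto3  # imported lazily so part_ranges works without boto3 installed
    from boto3.s3.transfer import TransferConfig

    config = TransferConfig(
        multipart_chunksize=part_mb * 1024 * 1024,  # size of each part
        max_concurrency=workers,                    # parallel part uploads
    )
    boto3.client("s3").upload_file(path, bucket, key, Config=config)
```

Bigger parts and more workers trade memory for throughput; the sweet spot depends on the instance's network allocation.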
i've never noticed any network throttling... we don't generally ship files as big as 25GB but 3-5GB is common
@mccraigmccraig I did one at 20.1 and then another at 19.5 and they both shipped at an average network speed of 30Mb/s but when I kicked off the third transfer I was looking at speeds of 600kb/s instead
i don't think i've ever shipped 60GB of files continuously to S3 though, so i don't think i would have encountered any throttling after 40GB or so
some of our structured log dumps and db snapshots are of that sort of size, but they compress quite well
I have not tried compressing .grib files, but they are already a compressed binary format, so I am not expecting that they will compress all that well. Also, I am fetching them after they have been created / compiled on a third-party's system (via an API or by hand), and as such I cannot easily control their initial state. I suppose that I could push them through compression so that they are saved to disk in a compressed format, but I cannot request that they be compressed on the third party's end.
the shuffling/serverless tools for S3 are pretty wildly good these days, depending on what your vector is...
other than that a small jump box that you can spin up & spin down via whatever your build/config tool is can be a useful way of moving stuff about
but like @firthh said, if you use bigger instances it will be less painful... plus if you optimise for your bottleneck (cpu/ram) you might find it's quicker & so cost is not so different
The problem (although I may be missing something) with serverless and pipelines is that the third party I am acquiring data from is very strict about the way in which I request data and the way they fulfil those requests. I have to make a request, which is queued for an indeterminate period; then it is "run" as a job, which may take hours or more; and then finally, once the job is complete, I am given a URL that lasts for 24 hours (I think). I can't see a way of creating a serverless pipeline to make an HTTP request to that URL once it becomes available.
If I am successful in writing an automation (or series of automations) that make this all hands-free, then at least knowing that if I use a larger instance with dedicated network that I should not have the same throttling issues is kinda enough.
Can I create a Lambda that can take parameters out of the SNS message that invoked it, so that the message could include the URL to the asset and the destination on S3?
That would remove the need for an intermediary Instance that is downloading the asset, writing it to disk and then pushing it on to S3
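(For reference: yes, an SNS-invoked Lambda receives the published message inside its event, as a JSON string under `Records[0].Sns.Message`. A minimal sketch of such a handler follows; the `{"url", "bucket", "key"}` message schema and the function names are assumptions for illustration, not anything agreed in the thread.)

```python
import json


def parse_sns_message(event):
    """Extract the (assumed) {"url", "bucket", "key"} payload from an
    SNS-triggered Lambda event. SNS delivers the published message as a
    JSON string at Records[0].Sns.Message."""
    body = event["Records"][0]["Sns"]["Message"]
    msg = json.loads(body)
    return msg["url"], msg["bucket"], msg["key"]


def handler(event, context):
    """Hypothetical Lambda entry point: stream the asset at `url` straight
    to S3, with no intermediary instance or local disk. Note the Lambda
    execution-time limit makes this a poor fit for multi-GB downloads."""
    url, bucket, key = parse_sns_message(event)
    import boto3  # lazy import: only needed when actually running in AWS
    import urllib.request
    with urllib.request.urlopen(url) as resp:
        boto3.client("s3").upload_fileobj(resp, bucket, key)
```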
I guess my concern is that the sheer size of the assets may make that approach costly and / or fragile, but I am more than willing to admit that that is sceptical cynicism coming from a position of ignorance. If I could be persuaded that Lambda could manage all of that without hurting, I would be VERY happy to do it that way.
If you don’t trust Lambda to do the actual processing of the large file, you could still use it as the gateway to trigger the download and upload to S3 on infrastructure you trust
e.g. lambda to start a large spot instance that will do the download of the asset and upload to S3
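(A sketch of that idea: a Lambda can launch a one-shot spot instance via EC2 `run_instances` with `InstanceMarketOptions`, handing it a user-data script that downloads the asset, syncs it to S3, and shuts itself down. The AMI ID, instance type, and function names below are placeholders, not values from the thread.)

```python
def build_spot_launch_params(ami_id, instance_type, user_data):
    """Build kwargs for EC2 run_instances requesting a single one-time
    spot instance that bootstraps itself from user_data."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "MinCount": 1,
        "MaxCount": 1,
        "UserData": user_data,  # e.g. a shell script: download, s3 sync, poweroff
        "InstanceMarketOptions": {
            "MarketType": "spot",
            "SpotOptions": {"SpotInstanceType": "one-time"},
        },
    }


def launch(params):
    import boto3  # lazy import: only needed when actually running in AWS
    return boto3.client("ec2").run_instances(**params)
```

A network-optimised type (e.g. something in the c5n family) would suit the throughput requirement, at the cost of the spot-interruption risk noted below.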
@firthh - Yeah, I can see that - the only risk being that the Spot Instance is killed mid-download / upload to S3
It's not that I don't trust Lambda, I am just not sure what it's actually capable of, and how much doing something like this might actually cost...
I mean basically can you get the upload point to e.g. sync to S3 directly? So you could have a set of steps: s3 sync → generate URL and put it in e.g. a JSON file → JSON file triggers SNS → SNS uses the URL
the quickest fix is just to bump the size of the EC2 instances you're currently using and/or look into relative pricing of spot ones
@firthh - Yeah, that was my understanding on Data Pipeline too, so I had kinda decided it was not the use-case that I wanted.
@alex.lynham - If Lambdas time out at 5 minutes then they could not handle this workload either. I am going to look at pricing for big, network-capable instances, which was what I suspected I might have to do, but thanks for the input 🙂
I know it's a year for the whole pipeline, but I'd need to look at the docs for individual components
anyone have any ideas of how i can debug where memory is being used in a docker container ? i've got a container with a 3GB limit running a clojure process which is using 1.7GB... but the container is getting oom-killed and i've no idea where that extra 1.3GB is going... anyone seen anything similar or have any ideas ?
here's some logging which makes it all pretty clear: https://gist.github.com/mccraigmccraig/df2295da08a9a1e3c21fbe2871780e3c
I think there is a chance that JVM memory can spike and get killed before that usage makes it into logs anywhere
Looks like Java is not obeying your settings for some reason? 788633 pages ~= 3GB, I think (788633 × 4KB pages)
hmm. possible 💡 - i've got the yourkit agent installed on those processes ... i wonder if that's doing something nuts and logging stuff off-heap
and if possible do a heap dump when you get an OOM (and mount the volume from the docker container on the host, so you don't lose it)
@reborg I didn't know about MaxDirectMemorySize. Does that cope with all the different types of memory a JVM can consume?
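(For later readers: `-XX:MaxDirectMemorySize` only caps NIO direct buffers, not the JVM's other off-heap consumers. A container footprint is roughly heap + metaspace + JIT code cache + direct buffers + thread stacks + native allocations from agents such as the YourKit agent mentioned above. A sketch of capping the main pools so everything fits under a 3GB limit; the sizes are illustrative, not tuned for this workload:)

```shell
# Rough container footprint of a JVM:
#   heap + metaspace + code cache + direct buffers
#   + (thread count × stack size) + native (agents, allocator overhead)
# Illustrative invocation capping each pool:
java \
  -Xmx1792m \
  -Xss512k \
  -XX:MaxMetaspaceSize=256m \
  -XX:ReservedCodeCacheSize=128m \
  -XX:MaxDirectMemorySize=256m \
  -jar app.jar
```

To find where the missing memory actually goes, `-XX:NativeMemoryTracking=summary` plus `jcmd <pid> VM.native_memory summary` breaks native usage down by category; agent allocations made outside the JVM's allocator still won't show up there, though.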