
how's everybody doing today?


@alex.lynham - Tired, jaded and ill-at-ease with the World and my place in it, since you asked.


(hope that's not too much of a downer πŸ˜‰ )


are you okay dude?


just work getting you down?


Yeah, just work getting me down, and some of it's awesome by the way, I am just more homesick than usual this trip.


I think it's 'cos I've seen friends I've not seen for a long time and they have kids my age but they are home all the time, more than anything if I am honest.


That and I didn't sleep well last night (despite lying to my AirBnB host about it) and I feel a little out of phase with reality this morning.


Nothing a nice relaxing weekend in Scotland and a walk in the hills won't fix, I expect πŸ™‚


(and some time with the family, clearly)


it's gotta be quite tough commuting from scotland to london @maleghast...


It can be, that's for sure. I'll be honest though, @mccraigmccraig, most of the time it feels easier on my system than the daily commute from Sevenoaks used to be. Having to be on trains and Tubes and in the car for upwards of 2 hours each way every day was harder.

πŸ˜‚ 4

I don't miss that at all.


JVM, in my case.


haven't tried. If you're not interested in SSR and just want the fulcro server ("/api") for lambdas, you can use the fulcro server or just do your own thing and use it for parsing the queries.


Thanks. Let me try that out


I could do with having the money to stay a short walk from our London office when I come down, so that it really is only one journey down and one journey home.


That would be ideal


whoa, okay that's an insane commute 😞


Sorry you're feeling low dude


@alex.lynham - Thanks πŸ™‚ In fairness to all concerned, the work-time this trip has been awesome, and I'm not just saying that 'cos some of those people lurk in this channel, but for some reason being away from home has just been harder this week. I am sure it will pass.


Anyone got any experience with using AWS EC2 to ship in large files (15-25 GB) and then push them to S3? I am noticing that doing this on an ad hoc, manual basis I am being hit with network throttling, which I have overcome in the short-term by just killing one instance and launching a new one, but this is not a scalable solution...


Worst case scenario I could just acquire the files MUCH more slowly (7-9 hours instead of 15 minutes) after the throttling kicks in (assuming that the throttling is not lifted at some point), but I would rather find out if there is a reliable way to provision infra that can consistently move large files around at speed...?


Also, I need to make sure that I am not (in the fullness of time) paying network traffic fees to move files from EC2s in my Default VPC to S3 - it has been suggested to me that this might be a sleeping cost waiting to come and bite me if I don't take steps to mitigate it.
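for pushing files that size to S3 the usual route is a multipart upload - boto3's transfer manager does the splitting for you, but the mechanics are simple enough to sketch. the part-size choice here (100 MB) and the function name are mine, just for illustration:

```python
def part_ranges(total_size: int, part_size: int = 100 * 1024 * 1024):
    """Yield (part_number, start, end) byte ranges for an S3 multipart upload.

    S3 requires each part (except the last) to be at least 5 MiB, and
    allows at most 10,000 parts per upload, so part_size may need tuning
    for very large files.
    """
    if part_size < 5 * 1024 * 1024:
        raise ValueError("S3 multipart parts must be at least 5 MiB")
    part_number = 1
    start = 0
    while start < total_size:
        end = min(start + part_size, total_size)
        yield part_number, start, end
        part_number += 1
        start = end

# A 20.1 GB file at 100 MB per part stays well under the 10,000-part limit.
parts = list(part_ranges(int(20.1 * 1024**3)))
```

with boto3 you'd normally just hand a `TransferConfig` to `upload_file` and let it parallelise the parts - the parallelism is also what helps saturate the instance's network allowance.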


What sort of instance types are you using?


At the moment, 'cos I am doing it manually, spinning one up for >1hr at a time, I am just using t2.micros


That could be part of the problem, bigger instances get dedicated network


Yeah, I figured


The thing is, until I have this side of things automated I don't want to pay the hourly for great big, crunchy instances


Could you look at spot instances?


I am nearly done with this brief bit of manual data shipping - the last of three files is going to S3 as I type and is nearly there - but I need to start approaching the problem of automating this whole process, and I just want to be certain that the instance I deploy to will not get its network connectivity throttled when I do it.


i've never noticed any network throttling... we don't generally ship files as big as 25GB but 3-5GB is common


@mccraigmccraig I did one at 20.1 GB and then another at 19.5 GB and they both shipped at an average network speed of 30 Mb/s, but when I kicked off the third transfer I was looking at speeds of 600 kb/s instead


(third transfer was another 20.1 GB file)


i don't think i've ever shipped 60GB of files continuously to S3 though, so i don't think i would have encountered any throttling after 40GB or so


*nods* Yeah, I realise it's not all that common a use-case


.grib files are MAHOOSIVE


some of our structured log dumps and db snapshots are of that sort of size, but they compress quite well


I have not tried compressing .grib files, but they are already a compressed binary format file, so I am not expecting that they will compress all that well. Also I am fetching them after they have been created / compiled on a third-party's system (via an API or by hand) and as such I cannot easily control their initial state. I suppose that I could push them through compression so that they are saved to disk in a compressed format, but I cannot request that they be compressed on that third-party's end.
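if it ever seems worth testing whether compression buys anything, the download can be streamed through gzip so the file never touches disk uncompressed. a rough sketch, with stdlib only - `source` stands in for whatever file-like object the download gives you (e.g. a response body), and the function name is made up:

```python
import gzip
import shutil

def save_compressed(source, dest_path: str, chunk_size: int = 1024 * 1024) -> None:
    """Stream a file-like object to disk as gzip, one chunk at a time.

    `source` is anything with a .read() method. Already-compressed binary
    formats like .grib may barely shrink, but this costs little to try
    and never holds the whole file in memory.
    """
    with gzip.open(dest_path, "wb") as out:
        shutil.copyfileobj(source, out, chunk_size)
```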


by ship in you mean upload?


the shuffling/serverless tools for S3 are wildly good these days depending on what your vector is...


other than that a small jump box that you can spin up & spin down via whatever your build/config tool is can be a useful way of moving stuff about


but like @firthh said, if you use bigger instances it will be less painful... plus if you optimise for your bottleneck (cpu/ram) you might find it's quicker & so cost is not so different


@alex.lynham - Thanks, that's all helpful πŸ™‚


The problem (although I may be missing something) with serverless and pipelines is that the third party I am acquiring data from is very strict about the way in which I request data and the way they fulfil those requests. I have to make a request, which is queued for an indeterminate period; then it is "run" as a job, which may take hours or more; and then finally, once the job is complete, I am given a URL that lasts for 24 hours (I think). I can't see a way of creating a serverless pipeline to make an HTTP request to that URL once it becomes available.


If I am successful in writing an automation (or series of automations) that make this all hands-free, then at least knowing that if I use a larger instance with dedicated network that I should not have the same throttling issues is kinda enough.


so notify SNS on completion/availability and then it can invoke a lambda


@alex.lynham - Yeah, now you mention it I could do that...


Can I create a Lambda that can take parameters out of the SNS message that invoked it, so that the message could include the URL to the asset and the destination on S3?


That would remove the need for an intermediary Instance that is downloading the asset, writing it to disk and then pushing it on to S3
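yes - the SNS message body arrives inside the Lambda event, so the URL and destination can ride along in it. a minimal handler sketch, assuming the publisher sends JSON (the `url` / `s3_bucket` / `s3_key` field names are made up for illustration, not anything standard):

```python
import json

def handler(event, context):
    """Entry point for a Lambda subscribed to an SNS topic.

    SNS wraps each notification in event["Records"][n]["Sns"]["Message"];
    when the publisher sends JSON, it arrives as a string to be parsed.
    """
    record = event["Records"][0]  # SNS delivers one record per invocation
    message = json.loads(record["Sns"]["Message"])
    url = message["url"]           # pre-signed download URL from the third party
    bucket = message["s3_bucket"]  # destination bucket
    key = message["s3_key"]        # destination object key
    # ...fetch `url` and stream it to s3://{bucket}/{key} here...
    return {"url": url, "bucket": bucket, "key": key}
```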


I guess my concern is that the sheer size of the assets may make that approach costly and / or fragile, but I am more than willing to admit that that is sceptical cynicism coming from a position of ignorance. If I could be persuaded that Lambda could manage all of that without hurting, I would be VERY happy to do it that way.


If you don’t trust Lambda to do the actual processing of the large file, you could still use it as the gateway to trigger the download and upload to S3 on infrastructure you trust

πŸ‘ 4
πŸ’― 4

e.g. lambda to start a large spot instance that will do the download of the asset and upload to S3
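from the Lambda that could be a single `ec2.request_spot_instances` call. a sketch of just building the request arguments (pure function, so nothing actually launches) - the instance type, price cap, and function name are my own guesses, and the network-optimised "n" instance variants are the ones that get the dedicated bandwidth mentioned above:

```python
def build_spot_request(ami_id: str,
                       instance_type: str = "c5n.large",
                       max_price: str = "0.10") -> dict:
    """Build keyword arguments for boto3's ec2.request_spot_instances.

    Returns a plain dict so the launch itself stays testable/inspectable;
    pass it as ec2.request_spot_instances(**build_spot_request(...)).
    """
    return {
        "InstanceCount": 1,
        "Type": "one-time",          # don't re-launch after the work is done
        "SpotPrice": max_price,      # USD/hour cap - a placeholder value
        "LaunchSpecification": {
            "ImageId": ami_id,
            "InstanceType": instance_type,
        },
    }
```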


@firthh - Yeah, I can see that - the only risk being that the Spot Instance is killed mid-download / upload to S3


But that might be getting overly convoluted at that point


It's not that I don't trust Lambda, I am just not sure what it's actually capable of, and how much doing something like this might actually cost...


Yeah, I think long running lambdas get expensive


I’ve never used it but maybe datapipeline could be useful?


Scrap that. It looks like your data has to start somewhere in AWS for that


yeah long running lambdas are costly and also don't go over 5min


I mean basically can you get the upload point to e.g. sync to S3 directly? so you could have a set of steps - s3 sync - generate url and put in e.g. json file - json file triggers SNS - SNS uses URL


or something like that


idk πŸ™‚


the quickest fix is just to bump the size of the EC2 instances you're currently using and/or look into relative pricing of spot ones


@firthh - Yeah, that was my understanding on Data Pipeline too, so I had kinda decided it was not the use-case that I wanted.


@alex.lynham - If Lambdas time out at 5 minutes then they could not handle this workload either. I am going to look at pricing for big, network-capable instances, which was what I suspected I might have to do, but thanks for the input πŸ™‚


the final thing to consider is AWS step functions


as iirc those have a longer potential lifespan


I know it's a year for the whole pipeline, but I'd need to look at the docs for individual components


anyone have any ideas of how i can debug where memory is being used in a docker container ? i've got a container with a 3GB limit running a clojure process which is using 1.7GB... but the container is getting oom-killed and i've no idea where that extra 1.3GB is going... anyone seen anything similar or have any ideas ?


Are you setting memory options on the JVM?


I think there is a chance that JVM memory can spike and get killed before that usage makes it into logs anywhere


@firthh yeah, i've got -Xms2560m -Xmx2560m -XX:+UseG1GC -server


so i should be seeing java OOMs well before the container gets killed


Yeah, I would have expected something in Clojure to start throwing exceptions


Looks like Java is not obeying your settings for some reason? 788633 pages ~= 3 GB I think


ah - those are pages, not KB ?




that would make sense
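the arithmetic, assuming standard 4 KiB pages:

```python
PAGE_SIZE = 4096  # bytes per page on x86-64 Linux

pages = 788_633
gib = pages * PAGE_SIZE / 1024**3
# 788633 pages * 4096 bytes is just over 3 GiB - right at the container limit,
# which is consistent with the oom-kill
```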


hmm. possible πŸ’‘ - i've got the yourkit agent installed on those processes ... i wonder if that's doing something nuts and logging stuff off-heap


try setting -XX:MaxDirectMemorySize and see if it stops OOM


and if possible do a heap dump when you get an OOM. (and mount the volume from the docker container on the host, so you don't lose it)


@reborg I didn't know about MaxDirectMemorySize. Does that cope with all the different types of memory a JVM can consume?


@otfrom off-heap memory gets the same as -Xmx if nothing else is specified. So assuming something is allocating off-heap, setting -XX:MaxDirectMemorySize to a reasonably low value would OOM without crashing the container (my guess)
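that default would be exactly where the numbers stop adding up: with -Xmx2560m and no -XX:MaxDirectMemorySize set, the worst case is heap plus an equal amount of direct memory. a rough budget (thread stacks, metaspace, and the yourkit agent left out for simplicity, so the real worst case is even higher):

```python
MB = 1024 * 1024

heap = 2560 * MB                  # -Xms2560m -Xmx2560m
direct = heap                     # MaxDirectMemorySize defaults to -Xmx
container_limit = 3 * 1024 * MB   # docker 3GB limit

worst_case = heap + direct        # 5 GiB of addressable memory
over_budget = worst_case > container_limit
```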