#clojure-uk
2020-11-10
dharrigan08:11:28

Good Morning!

cdpjenkins09:11:58

…and a good morning to you too! :flag-wales:

maleghast11:11:18

Hello everyone 🙂

mccraigmccraig17:11:03

anyone had a good or bad experience with aws athena ?

Aleksander17:11:49

both: overall it’s a super useful and easy to use service. Occasionally it has latency issues

Aleksander17:11:56

as in: queries stay in a "starting" state and AFAIK there is little you can do about it. Happened to me just once

mccraigmccraig17:11:07

did you convert your data to parquet before dumping to S3 ?

Aleksander17:11:24

for reasons other than performance as well, e.g. handling of multiline strings

Aleksander17:11:54

if you need it just for performance reasons and csv/json serde works fine for you, there is an option to do the conversion within athena as well
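
A rough sketch of that in-Athena conversion (a CTAS query that rewrites existing data as Parquet), submitted from Clojure with Cognitect's aws-api; the database, table, and bucket names below are made up:

```clojure
;; Hypothetical example: submit an Athena CTAS query that rewrites a
;; CSV/JSON-backed table as Parquet. All names are placeholders.
(require '[cognitect.aws.client.api :as aws])

(def athena (aws/client {:api :athena}))

(aws/invoke athena
  {:op :StartQueryExecution
   :request {:QueryString
             (str "CREATE TABLE telemetry_parquet "
                  "WITH (format = 'PARQUET', "
                  "      external_location = 's3://my-bucket/telemetry-parquet/') "
                  "AS SELECT * FROM telemetry_raw")
             :QueryExecutionContext {:Database "analytics"}
             ;; Athena always needs somewhere to write query results
             :ResultConfiguration {:OutputLocation "s3://my-bucket/athena-results/"}}})
```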

joetague20:11:13

+1 to all the points above.

joetague20:11:30

For our usage it started to get expensive as well

dominicm08:11:18

We had a few queries that just broke it (nullpointer exception or something). And then we had to wait for aws support to tell us what was broken so we could stop doing that... But we kinda needed to do that.

mccraigmccraig08:11:31

@U0G2T8PDM was it getting expensive with plain csv or json, or with parquet ?

mccraigmccraig08:11:59

i guess i'll try it out and see... i've got a kafka topic with telemetry data - it looks easy enough to dump that to parquet on s3 with kafka-connect, and if that turns out to lead to criminally expensive queries then i'll dump it to CSV and load into redshift
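
For the kafka-connect route, a minimal sketch of registering Confluent's S3 sink connector with Parquet output via the Connect REST API; the topic, bucket, and connector names are assumptions, and ParquetFormat needs schema-aware records (e.g. Avro with a schema registry):

```clojure
;; Hypothetical config: Confluent S3 sink writing a telemetry topic to S3
;; as Parquet. Names, region, and sizes are placeholders.
(require '[clj-http.client :as http]
         '[cheshire.core :as json])

(http/post "http://localhost:8083/connectors"
  {:content-type :json
   :body (json/generate-string
           {:name "telemetry-s3-sink"
            :config {"connector.class"   "io.confluent.connect.s3.S3SinkConnector"
                     "topics"            "telemetry"
                     "s3.bucket.name"    "my-telemetry-bucket"
                     "s3.region"         "eu-west-1"
                     "storage.class"     "io.confluent.connect.s3.storage.S3Storage"
                     "format.class"      "io.confluent.connect.s3.format.parquet.ParquetFormat"
                     ;; lay objects out by date so Athena can prune by dt
                     "partitioner.class" "io.confluent.connect.storage.partitioner.TimeBasedPartitioner"
                     "path.format"       "'dt'=YYYY-MM-dd"
                     "partition.duration.ms" "3600000"
                     "locale"            "en-GB"
                     "timezone"          "UTC"
                     "flush.size"        "10000"}})})
```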

joetague15:11:08

Just had a peek in the S3 bucket, we were copying GA/BigQuery data from GCS -> S3 as json and left it in the Standard-IA storage class

joetague15:11:36

Guesstimate/ballpark most of the files were about 600-800 MB in size; they weren't well partitioned so we ended up having to load in a few GB of data a day
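
Partitioning is also what keeps the per-query scan (and therefore the Athena bill) down. A sketch of a date-partitioned external table, so queries filtered on dt only scan that day's objects; names are again placeholders:

```clojure
;; Hypothetical DDL: a dt-partitioned external table over the Parquet data,
;; so WHERE dt = '2020-11-10' scans one day's files rather than the bucket.
(require '[cognitect.aws.client.api :as aws])

(def athena (aws/client {:api :athena}))

(aws/invoke athena
  {:op :StartQueryExecution
   :request {:QueryString
             (str "CREATE EXTERNAL TABLE telemetry ("
                  "  event_id string, payload string) "
                  "PARTITIONED BY (dt string) "
                  "STORED AS PARQUET "
                  "LOCATION 's3://my-telemetry-bucket/telemetry/'")
             :QueryExecutionContext {:Database "analytics"}
             :ResultConfiguration {:OutputLocation "s3://my-bucket/athena-results/"}}})
```

New partitions still have to be registered before they're queryable, e.g. via MSCK REPAIR TABLE or ALTER TABLE ... ADD PARTITION.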