#aws
2021-09-29
jjttjj 23:09:30

I'm thinking about storing time series data in DynamoDB. This article: https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-time-series.html suggests using a table per day. If I understand correctly, that's because in the example they give there is no partition key other than time, so using multiple tables lets you adjust the provisioned capacity separately for less frequently used days in the past; otherwise some of the provisioned capacity is kind of "wasted" (in a sense). But what if I do have other partition keys that make sense? For example, what if I'm recording stock prices for 100 stocks with similar activity? Does it make sense then to use a single table, with the stock ticker as the partition key and time as the sort key?
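A minimal sketch of the read side of the single-table layout asked about above, written as plain Clojure data. It only builds the request map for a DynamoDB Query that reads one ticker's points in a time range. The table name `prices` and the attribute names `ticker` and `ts` are assumptions for illustration, not taken from the article.

```clojure
(defn range-query
  "Builds a DynamoDB Query request map for one ticker's points with
  from-ms <= ts <= to-ms (BETWEEN is inclusive on both ends)."
  [ticker from-ms to-ms]
  {:TableName "prices" ; hypothetical table name
   :KeyConditionExpression "#t = :ticker AND #ts BETWEEN :from AND :to"
   ;; Placeholders avoid clashes with DynamoDB reserved words.
   :ExpressionAttributeNames {"#t" "ticker" "#ts" "ts"}
   ;; Numbers are sent as strings under the :N type tag.
   :ExpressionAttributeValues {":ticker" {:S ticker}
                               ":from"   {:N (str from-ms)}
                               ":to"     {:N (str to-ms)}}})
```

Assuming a client such as Cognitect's aws-api, the map would be passed along as `(aws/invoke ddb {:op :Query :request (range-query "AAPL" from to)})`.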

viesti 15:09:46

is there actually a different amount of data points for each ticker?

jjttjj 18:09:12

Roughly. I suppose that could be dealt with by dividing up the ones with a lot more points into multiple partitions (i.e. AAPL#A / AAPL#B). I guess I'm mainly wondering if DDB will remain efficient for range queries on this type of table, with ~100 partitions and a large number (millions) of points
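A rough sketch of the partition-splitting idea, under some assumptions: shards are numbered (`AAPL#0`, `AAPL#1`) rather than lettered, the shard is picked from the timestamp, and every function name here is hypothetical. The trade-off it illustrates is that a range query then has to hit every shard of a ticker and merge the results by time.

```clojure
(defn shard-key
  "Partition key for one point: ticker plus a shard suffix derived from ts."
  [ticker ts n-shards]
  (str ticker "#" (mod ts n-shards)))

(defn all-shard-keys
  "Every partition key a range query for this ticker has to read."
  [ticker n-shards]
  (mapv #(str ticker "#" %) (range n-shards)))

(defn merge-shards
  "Combines per-shard results (seqs of maps with a :ts key) into one
  time-ordered seq."
  [shard-results]
  (sort-by :ts (apply concat shard-results)))
```

For example, `(shard-key "AAPL" 1633094934884 2)` gives `"AAPL#0"`, and a reader would query each key from `(all-shard-keys "AAPL" 2)` before calling `merge-shards`.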

viesti 13:10:18

I'd be really interested in what you find 🙂

viesti 13:10:49

have been thinking about this partitioning by time and by other dimensions (like IoT devices) elsewhere too

viesti 13:10:53

I'd think that other databases have a hard time too, if you put all the data into the same slot

viesti 13:10:15

maybe there could be two types of "indexes", one per time, one per ticker, would mean more tables

jjttjj 13:10:35

Yeah really all I want, in Clojure data structure terms, is a map of ticker symbol -> sorted maps (by time) like

{"aapl" {1633094934884 {:price 500 :qty 1}}}
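An in-memory sketch of that shape, assuming the inner maps are built with `sorted-map` so a time range can be read with `subseq`. The extra data points and the `points-between` name are made up for illustration.

```clojure
(def prices
  ;; ticker symbol -> sorted map of epoch millis -> point
  {"aapl" (sorted-map 1633094934884 {:price 500 :qty 1}
                      1633094935884 {:price 501 :qty 2}
                      1633094936884 {:price 499 :qty 5})})

(defn points-between
  "Returns the [ts point] entries for ticker with from-ms <= ts <= to-ms,
  in time order, or nil when there are none."
  [db ticker from-ms to-ms]
  (subseq (get db ticker (sorted-map)) >= from-ms <= to-ms))
```

`(points-between prices "aapl" 1633094934884 1633094935884)` returns the first two entries. In the DynamoDB version the ticker lookup corresponds to the partition key and the `subseq` call to a sort-key range condition.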