#off-topic
2019-05-27
respatialized17:05:41

Is anyone in NY interested in organizing / participating in a study group for SICP? http://sarabander.github.io/sicp/

Eric Ervin18:05:12

Not sure if reposting in #nyc will help. Is this your github? I've done a bit of Nature of Code in Clojure myself. https://github.com/respatialized/nature-of-code

Eccentric J18:05:41

I would be interested in participating. I have a copy of the book but only made it to the first set of exercises.

quadron18:05:36

so I have 50 gigabytes of events in a durable queue (tape library). I want to process these events interactively (if at all possible), do operations such as sort/distinct/filter ... on them. how am i to go about doing this? any pointers appreciated.

kulminaator18:05:30

sort and distinct sound like horrible ideas if you don't filter heavily at first

kulminaator18:05:00

i would probably lift the data to a 50+gb ssd at first and then process it from there as needed

mpenet18:05:03

filter is doable in a streaming fashion, distinct will require some trickery (you could use a bloom filter for instance) if you want to be efficient, but yeah sorting is the killer here
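
A minimal sketch of the streaming filter plus Bloom-filter distinct idea, in Python for illustration. The names `BloomFilter`, `add_if_new`, and `approx_distinct` are hypothetical and not part of tape or any library, and events are assumed to be byte strings. A Bloom filter can report false positives, so this may occasionally drop an event that was actually unique, but it should never emit a duplicate.

```python
import hashlib
import math


class BloomFilter:
    """Fixed-size Bloom filter sized from an expected item count (illustrative)."""

    def __init__(self, expected_items, false_positive_rate=0.01):
        # Standard sizing formulas: m = -n ln(p) / (ln 2)^2, k = (m / n) ln 2
        self.num_bits = max(
            8, int(-expected_items * math.log(false_positive_rate) / math.log(2) ** 2)
        )
        self.num_hashes = max(1, round(self.num_bits / expected_items * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item):
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(item).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add_if_new(self, item):
        """Record item; return True only if it was definitely not seen before."""
        is_new = False
        for pos in self._positions(item):
            byte_index, bit_index = divmod(pos, 8)
            mask = 1 << bit_index
            if not self.bits[byte_index] & mask:
                is_new = True
                self.bits[byte_index] |= mask
        return is_new


def approx_distinct(events, keep, bloom):
    """Stream events once: filter first, then drop probable duplicates."""
    for event in events:
        if keep(event) and bloom.add_if_new(event):
            yield event
```

Memory use is fixed by `expected_items` and `false_positive_rate` rather than by the size of the events, which is why this fits a single pass over a large queue.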

mpenet18:05:16

cool to see a tape user!

kulminaator18:05:28

distinct will work somewhat reasonably if you do them in partitions

mpenet18:05:37

yeah that's also a possibility

kulminaator18:05:51

will mean that you have to read multiple times to fit into memory constraints
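
One way the partitioned distinct could look, sketched in Python under a few assumptions: events are single-line strings, scratch disk (for example the SSD mentioned above) is available, and the function name `partitioned_distinct` is made up for illustration. This variant writes each event to a scratch file chosen by hash, so the source is read once and each partition is read once. Without scratch space, the alternative is to re-read the source once per partition.

```python
import os
import tempfile
import zlib


def partitioned_distinct(lines, num_partitions=16, workdir=None):
    """Yield each distinct line once. Output order is not the input order."""
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        paths = [os.path.join(tmp, f"part-{i}") for i in range(num_partitions)]
        files = [open(path, "w", encoding="utf-8") for path in paths]
        try:
            # Pass 1: equal lines always hash to the same partition file.
            for line in lines:
                line = line.rstrip("\n")
                index = zlib.crc32(line.encode("utf-8")) % num_partitions
                files[index].write(line + "\n")
        finally:
            for f in files:
                f.close()

        # Pass 2: each partition only needs a set as large as its own distinct lines.
        for path in paths:
            seen = set()
            with open(path, encoding="utf-8") as f:
                for line in f:
                    line = line.rstrip("\n")
                    if line not in seen:
                        seen.add(line)
                        yield line
```

`num_partitions` would be picked so that the distinct values of one partition fit in memory.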

mpenet18:05:30

tape has an async interface that you could use for some of it, you can basically consume a tape queue via a core.async chan, so you could apply (a) transducer(s) at that level

kulminaator18:05:39

maybe this thing could help you https://github.com/onyx-platform/onyx ?

👍 2
mpenet18:05:52

but the regular interface is quite simple too

mpenet18:05:32

50gb isn't a lot, not sure it's worth going distributed for this.

✔️ 1
kulminaator18:05:46

well if you want to be as lazy as possible

kulminaator18:05:18

and if you have to repeat the task at some point ....

kulminaator18:05:27

for a 1 time effort i would probably do pure clojure though 🙂

kulminaator18:05:52

no extra tooling, perhaps an ssd drive to fit the data multiple times

✔️ 1
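
For the sorting part, which nobody spelled out above, a common no-extra-tooling approach is an external merge sort: sort chunks that fit in memory, write each as a sorted run to scratch disk, then merge the runs lazily. The sketch below is in Python for illustration and is not something proposed in the thread. It assumes single-line string events, and `external_sort` and `_read_run` are hypothetical names.

```python
import heapq
import itertools
import os
import tempfile


def _read_run(f):
    """Lazily yield the lines of one sorted run, without trailing newlines."""
    for line in f:
        yield line.rstrip("\n")


def external_sort(lines, chunk_size=100_000, workdir=None):
    """Yield lines in sorted order while holding about chunk_size lines in memory."""
    with tempfile.TemporaryDirectory(dir=workdir) as tmp:
        run_paths = []
        source = iter(lines)
        while True:
            # Sort one memory-sized chunk and spill it to disk as a run.
            chunk = sorted(
                line.rstrip("\n") for line in itertools.islice(source, chunk_size)
            )
            if not chunk:
                break
            path = os.path.join(tmp, f"run-{len(run_paths)}")
            with open(path, "w", encoding="utf-8") as f:
                f.writelines(line + "\n" for line in chunk)
            run_paths.append(path)

        # k-way merge: heapq.merge keeps only one pending line per run in memory.
        files = [open(path, encoding="utf-8") for path in run_paths]
        try:
            yield from heapq.merge(*(_read_run(f) for f in files))
        finally:
            for f in files:
                f.close()
```

All runs are open at once during the merge, so a very small `chunk_size` on a large input could hit the open-file limit. Once the stream is sorted, distinct is cheap because duplicates are adjacent.
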
emccue20:05:27

Any ideas for giving a user a chronological feed of the posts made by other users they follow?

emccue20:05:43

i have a solution, but the performance is terrible

hmaurer21:05:02

@emccue with what datastore?

emccue21:05:31

Right now mongo

emccue21:05:41

Potentially postgres can be used

emccue21:05:58

(it's a self-contained thing)

hmaurer21:05:48

@emccue iirc twitter stores an individual timeline for every user

hmaurer21:05:50

or at least used to

3Jane21:05:49

"Designing Data-Intensive Applications" uses twitter timeline as an example, apparently they had to create a hybrid approach based on the user's popularity (number of followers)

💯 1
3Jane21:05:43

so (as you scale) you may need to have multiple solutions
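
A rough in-memory sketch of the hybrid approach described above, in Python for illustration. Posts by ordinary authors are copied into each follower's stored timeline at write time, while posts by authors above a follower threshold are merged in at read time. The class `Feed`, its methods, and `fanout_limit` are hypothetical names. This is not Twitter's actual implementation, and a real version would sit on a datastore such as mongo or postgres.

```python
import heapq
import itertools
from collections import defaultdict


class Feed:
    def __init__(self, fanout_limit=1000):
        self.fanout_limit = fanout_limit
        self.followers = defaultdict(set)   # author -> users following them
        self.following = defaultdict(set)   # user -> authors they follow
        self.posts = defaultdict(list)      # author -> their own posts
        self.timelines = defaultdict(list)  # user -> precomputed feed entries
        self._ids = itertools.count()       # tie-breaker for equal timestamps

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def _is_popular(self, author):
        return len(self.followers[author]) > self.fanout_limit

    def post(self, author, timestamp, text):
        entry = (timestamp, next(self._ids), author, text)
        self.posts[author].append(entry)
        if not self._is_popular(author):
            # Fan-out on write: cheap while the follower count is small.
            for follower in self.followers[author]:
                self.timelines[follower].append(entry)
        return entry

    def timeline(self, user, limit=20):
        # A set avoids duplicates if an author became popular after fan-out.
        entries = set(self.timelines[user])
        for author in self.following[user]:
            if self._is_popular(author):
                # Fan-out on read: fetch popular authors' posts on demand.
                entries.update(self.posts[author])
        return heapq.nlargest(limit, entries)  # newest first
```

The sketch skips backfilling older posts when someone follows a new author and never trims stored timelines. Both would need handling in practice.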

emccue21:05:49

Those are both fantastic starting points, thank you

👍 1
hmaurer22:05:02

I second what @lady3janepl said; that's a fantastic book

sova-soars-the-sora23:05:31

Hi everyone. Please sign up to try out our new application when it comes out. It's called http://nearhe.re/ You can access it at https://nearhe.re/