Fork me on GitHub
#off-topic
<
2022-07-20
>
vonadz12:07:01

Anyone experienced with graph algorithms? I have pretty much 0 experience, but have a fairly interesting problem I'm trying to solve. Would appreciate some informed input.

p-himik12:07:26

Well, it's hard to help with a problem if you don't know what the problem is. ;)

1
vonadz12:07:42

Fair enough! For context I recently started http://crosspromote.io/, where creators can post their newsletters. The information they post relevant to this problem is the audience size and the content tags describing what their newsletter is about. It's my responsibility to send out an email to two creators who run newsletters in a similar niche with similar audiences, in order to facilitate an introduction. Matching is fairly simple, since I just match them if their audiences are within 25% of each other, and the both share at least one content tag. With that out of the way, I'm trying to figure out an algorithm for how to best decide on who to send emails to. The main issue is a self-imposed one; I want to limit the number of emails sent out per week per creator. I can start off with a uniform limit for everyone, but the end goal is to have it be chosen by the creators themselves (so some newsletters might want only 2 or 3, others maybe 5 or 8, whatever). Ultimately I'm trying to optimize for emailing the most unique creators. I figured this kind of looks like a graphing problem, where the emails are edges and the nodes are individual newsletters. My initial approach would be to devise a weighting algorithm for each edge, based on the total number of edges the nodes have and how many remaining edges they'll have if that edge is selected (relating back to the limit of emails per week per creator). Then use some version of a longest shortest path algorithm that tries to hit most nodes with the least amount of weights. Having the edge weights be dependent on each other seems a bit complicated though, so I was just going to see if I could ignore that and instead rerun the algorithm x amount of times. Am I on the right path here? I'm sure there's some better solution, since this is just my intuitive approach, and I have a feeling there's A LOT of literature on this kind of problem.

zerjens14:07:48

> I have a feeling there’s A LOT of literature on this kind of problem I love these kinds of problems and there’s no shortage literature around for solving these problems. maybe we should ask a few more questions…

vonadz14:07:51

Fire away, I'm open to answering any.

zerjens14:07:54

> Then use some version of a longest shortest path algorithm that tries to hit most nodes with the least amount of weights do have anything working yet around this? can the data structure fit in memory and is that an assumption you can safely make?

zerjens14:07:56

I’ve played with Loom a bit, it’s fairly good at helping solve problems like this: https://github.com/aysylu/loom

vonadz14:07:12

I just got the nodes + edges loaded into a undirected graph using https://github.com/Engelberg/ubergraph . Yes I think for now we can assume it will all fit in memory without a problem.

zerjens14:07:59

Okay, I was going to say, if this is something you see growing in time, then using something like Neo4J or Spark Graph might be a better option

vonadz14:07:00

I do still need to make the weighting algorithm for edges though before I can run the longest short path algo

vonadz14:07:35

Yeah I'll worry about that when it becomes an issue. For now it would just be overengineering

zerjens14:07:28

yeah, first step is make sure you can solve the actual problem haha, so it sounds like you’re trying to optimize for the number of nodes you can reach based on budget

zerjens14:07:44

so a greedy algo might not be best

vonadz14:07:38

Oh I thought the longest part of the longest shortest algo meant it tried to hit as many nodes as possible, but with the shortest path

zerjens14:07:52

> meant it tried to hit as many nodes as possible, but with the shortest path hmmm, so the path with the highest ratio of nodes / sum-of-weights-traversed

vonadz14:07:51

I guess, I'm not 100% sure though

vonadz14:07:49

Looking at it more, it seems like the path approach might not be appropriate at all

zerjens14:07:15

getting closer haha, for that calculation it seems like it could be O(n!).

zerjens14:07:51

I suppose for this problem, and others like it, start with a few simple base cases, see if working through the solution to this pokes holes in your assumptions about how you’ve constructed the solution, maybe this isn’t a graph problem at all

vonadz14:07:54

Yeah I'm going to hack on it a bit more and look at what algs are available on ubergraph

zerjens14:07:49

cool, if it isn’t super proprietary keep us posted

zerjens14:07:44

also, if this is a problem that scales and needs to “run in production”, I’d again consider looking into Neo4J or Spark Graph (both are amenable to Clojure).

vonadz14:07:24

Thanks for the suggestion. Hopefully it gets to the point where that is a big problem!

vonadz14:07:36

well, not too big

respatialized15:07:52

http://www.frankmcsherry.org/graph/scalability/cost/2015/01/15/COST.html I'd recommend reading through this, as both an overview of some graph algorithms that may be useful (example code in Rust) and as a caution against reaching for Spark when processing this data.

respatialized15:07:45

Your problem sounds in general like it lands in the realm of Collaborative Filtering, which Wikipedia has a good overview of: https://en.m.wikipedia.org/wiki/Collaborative_filtering

👍 1
respatialized15:07:53

You might also try looking at or asking in the #data-science channel for more resources.

vonadz16:07:29

Awesome, thank you! I'll check out those resources.

zerjens17:07:06

ugh, I had no idea there was data science channel facepalm

vonadz17:07:32

Yeah not a fan of the default to hide most of the channels

vonadz19:07:47

Hmm my initial impression of collaborative filtering is that it's meant for matching things together based on large amounts of data points. But that's not really the case here. I only have 2 data points to match on, and it's not going to get any bigger. Instead I need a way to optimize filtering. Maybe my understanding is too shallow though?

zerjens19:07:07

I would agree to be careful w/ Spark fyi - that’s long tangential discussion though, but point well made @U03QQS7341W - have you looked at Algorithms for the Intelligent Web? The Manning series book, I seem to recall there are some examples that are similar outlined in it.

vonadz19:07:12

No I haven't, but I'll take a look. Thanks!

zerjens19:07:45

chapter 3 & 4 covers recommenders and collaborative algos like respatialized mentioned. I read this when the first edition dropped ages ago, but it’s still relevant today.

👍 1
bherrmann14:07:50

FYI: I find this https://www.tabnine.com/ to be very useful. In the sense that I wouldn't want to go back to not using it. Although the majority of my coding is in Java in Intellij, so YMMV. I started using it because the clojuredart demo video was using it.

❤️ 1
isak17:07:04

Excited for multiple cursor support coming to IntelliJ:

😄 9
Dimitar Uzunov17:07:51

Does anyone have a small home office, particularly with a narrow desk? I know there are quite a few "top 10 home office ideas", but wonder how they source desks under 120 cm, they seem to be very rare on the market

mpenet17:07:47

I have a 120cm standing desk with hand crank from Ikea

mpenet17:07:18

I love this thing. Takes no space, limits the clutter and just the right amount of space for me

mpenet17:07:41

That + some ergotron arms (or the Amazon basics knockoffs) so that screen and laptop are not using any space and you're good to go

Dimitar Uzunov17:07:10

120 cm is the smallest I can find, I actually need 90 cm :)

valtteri19:07:28

We have this kind of adjustable standing desk and it’s 78x50cm. Works very nicely with laptop. It’s a Finnish brand “Selka Mahtuva” but I guess something similar is available elsewhere.

valtteri19:07:00

Oh cool, they seem to sell those to other countries as well https://selkastore.eu/

Dimitar Uzunov19:07:32

Ah this is nice! Thanks, I will check them out.

valtteri19:07:58

Not the cheapest ones but quality is excellent.

Dimitar Uzunov19:07:32

Can you attach a monitor arm?

valtteri19:07:48

I don't see why not :thinking_face:

valtteri19:07:17

I don't use external monitors though

D09:07:49

I would recommend http://uberdesk.bg ;-)

D09:07:24

You can order it with plot of any size and shape. Успех!

Dimitar Uzunov07:07:14

Thanks, I will check them out, благодаря :)

👍 1
mpenet17:07:59

You could buy your own top for that Ikea desk

mpenet17:07:16

I can check later, but I think it's doable

vonadz17:07:59

Table saw it

Dimitar Uzunov17:07:26

I actually tried it! The table tops are not solid wood, I could barely attach the legs and they came off again eventually. Will probably work with more solid desks, but I would avoid learning carpentry if possible :)

vonadz17:07:56

Fair enough

vonadz17:07:19

Then you'll have an extra 30 cm desktop ;)

javahippie17:07:41

https://www.flexispot.com Flexispot is very configurable.

vonadz17:07:10

Also I remember checking sometime last year for the IKEA hand crank desk and it wasn't available

javahippie17:07:07

If you can get a top, the frames allow for 100cm (which is still not 90, to be fair)

vonadz17:07:38

Anyone have any experience recovering unsaved files on Arch Linux after a forced restart due to computer freezing? Specifically when using the Remarkable markdown editor?

vonadz18:07:03

Yeah just saw that, now looking through the code to see how easy it would be to add

Michael Nardell18:07:17

I will be traveling to Chicago, IL (USA) for a conference (sadly not on Clojure or programming for that mater) from July 27- Aug 3. I am from a smallish beach town in California, and I don't get to meet too many Clojure programmers - professional or hobbyists (put me in the latter category). If any folks CLS/CLJS want to meet up, I would be really interested. You will not find a Clojure or FP expert. And I am not a fintech person - I work in higher education, fighting a rear-guard battle against the deterioration of the Uni IT dept. ("We used to build stuff, now we just submit trouble tickets to vendors")

Drew Verlee21:07:15

I live in Chicago. I would be happy to meet up for a bit depending on the time and place. I'm curious why your asking in the clojure slack.

Michael Nardell14:07:00

@U0DJ4T5U1 good question - interested in meeting folks working in Clojure; in my community I don't meet many other Clojure programmers. I am mostly interested in Clojure for data munging, math and science applications. I like the pragmatic data centric approach. Clojure reminds me of the things I like working with the J programming lang

Drew Verlee02:07:25

Sure, i would be happy to chat about that. I don't do much in the way of statistical analysis however. Let me know as time draws closer where and when you have in mind and I'll try to meet you.