#onyx
2015-12-10
yusup 03:12:35

Hi, quick question: does the peer count correlate with some sort of hard limit on JVM thread pool size?

lucasbradstreet 06:12:45

@yusup Yes, although lifecycles and input/output plugins add some variability, since some plugins use additional reader, buffer, or sender threads.

yusup 07:12:03

So I can't spawn more threads within functions than a certain threshold that is correlated with the peer count?

yusup 07:12:48

I think I should run YourKit and see for myself.

lucasbradstreet 07:12:50

Oh no, there isn't really a hard limit, aside from what the JVM and your app can handle.

lucasbradstreet 07:12:16

But spawning threads within functions is almost always a bad idea.

lucasbradstreet 07:12:57

I highly recommend using Flight Recorder / Mission Control rather than YourKit, btw.

lucasbradstreet 07:12:03

It's built into Java 8.
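For a quick look at how many threads a peer's JVM is actually running, before reaching for a full profiler, the standard `java.lang.management` API reports live, daemon and peak thread counts. This is a minimal sketch: the class name `ThreadStats` is our own, and in practice you would log `summary()` periodically from inside the Onyx process (or inspect threads externally with `jcmd <pid> Thread.print`).

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

class ThreadStats {
    // Current number of live threads (daemon and non-daemon) in this JVM.
    static int liveThreads() {
        return ManagementFactory.getThreadMXBean().getThreadCount();
    }

    // One-line summary; peak is the highest live count since JVM start
    // (or since the peak was last reset).
    static String summary() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int live = mx.getThreadCount();
        int daemon = mx.getDaemonThreadCount();
        int peak = mx.getPeakThreadCount();
        return "live=" + live + " daemon=" + daemon + " peak=" + peak;
    }

    public static void main(String[] args) {
        System.out.println(summary());
    }
}
```

Comparing these numbers with different peer counts gives a rough sense of how many threads each peer and plugin adds, without any claim about a fixed limit.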

yusup 07:12:08

Hmm, thanks

yusup 07:12:15

for clearing that up.

lucasbradstreet 07:12:45

What sort of task is leading you to want to spawn threads from your onyx/fn?

yusup 07:12:08

Web scraping.

yusup 07:12:37

I think it is a bad idea to put this sort of functionality within onyx/fn.

yusup 07:12:10

I am trying to submit tasks to a dedicated thread pool now.

lucasbradstreet 07:12:06

I think there are probably ways to do it successfully, but you have to be careful.
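One careful way to do what yusup describes is a single, fixed-size pool that is created once and shut down when the task stops (in Onyx terms, a lifecycle concern), instead of new threads per segment. This is a minimal sketch in plain `java.util.concurrent`, not an Onyx API: `BoundedFetchPool` and `fetchPage` are hypothetical names, `fetchPage` stands in for the real http-kit fetch plus Tika parse, and the pool size and timeout values are arbitrary assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedFetchPool {
    // One shared, fixed-size pool: the thread count stays bounded however many
    // segments arrive, unlike starting a new Thread for every call.
    private final ExecutorService pool;

    BoundedFetchPool(int threads) {
        this.pool = Executors.newFixedThreadPool(threads);
    }

    // Hypothetical stand-in for a slow HTTP fetch followed by a Tika parse.
    static String fetchPage(String url) {
        return "content of " + url;
    }

    // Submit every URL, then collect results in submission order, waiting at
    // most timeoutMillis on each future so one slow site cannot block forever.
    List<String> fetchAll(List<String> urls, long timeoutMillis)
            throws InterruptedException, ExecutionException, TimeoutException {
        List<Future<String>> futures = new ArrayList<>();
        for (String url : urls) {
            futures.add(pool.submit(() -> fetchPage(url)));
        }
        List<String> results = new ArrayList<>();
        for (Future<String> f : futures) {
            results.add(f.get(timeoutMillis, TimeUnit.MILLISECONDS));
        }
        return results;
    }

    // Call once when the task stops (e.g. from a lifecycle) so threads don't leak.
    void shutdown() throws InterruptedException {
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
    }

    public static void main(String[] args) throws Exception {
        BoundedFetchPool fetcher = new BoundedFetchPool(4);
        try {
            System.out.println(fetcher.fetchAll(Arrays.asList("a", "b"), 1000));
        } finally {
            fetcher.shutdown();
        }
    }
}
```

Note that `fetchAll` still blocks the calling function until every page returns or times out, so slow pages keep the peer busy; the pool bounds thread usage, not latency.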

yusup 07:12:18

Most of the pages I am dealing with have huge latencies.

lucasbradstreet 07:12:13

What are you using to scrape?

yusup 07:12:52

http-kit and Apache Tika.

lucasbradstreet 07:12:01

Here are a few things I'd keep in mind.

lucasbradstreet 07:12:30

If you need to do any expensive upfront, reusable initialisation in your task, do it in a lifecycle.

lucasbradstreet 07:12:18

If you're doing a lot of IO, you may want to just increase the number of peers you use for the IO tasks, rather than spinning up lots of threads within those tasks.

lucasbradstreet 07:12:18

If the unit of work in a segment is too big, and you therefore want to parallelise it in your task, consider splitting it up and sending the pieces to downstream tasks.
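Splitting a large segment is ordinary chunking. This is a minimal sketch that assumes a hypothetical segment shape with a `site` key and a `urls` list; Onyx segments are maps, so a `Map` stands in for one here. Each chunk would then be emitted as its own segment so that several downstream peers can work on the pieces in parallel.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class SegmentSplitter {
    // Split one large {"site", "urls"} segment into several smaller segments,
    // each carrying at most maxUrls URLs.
    static List<Map<String, Object>> split(String site, List<String> urls, int maxUrls) {
        if (maxUrls <= 0) {
            throw new IllegalArgumentException("maxUrls must be positive");
        }
        List<Map<String, Object>> segments = new ArrayList<>();
        for (int start = 0; start < urls.size(); start += maxUrls) {
            int end = Math.min(start + maxUrls, urls.size());
            Map<String, Object> segment = new HashMap<>();
            segment.put("site", site);
            // Copy the sublist so each segment owns its data independently.
            segment.put("urls", new ArrayList<>(urls.subList(start, end)));
            segments.add(segment);
        }
        return segments;
    }
}
```

With `maxUrls = 2`, a segment carrying five URLs becomes three segments of 2, 2 and 1 URLs, and a slow page then delays only its own small segment.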

lucasbradstreet 07:12:48

(Just some thoughts. I don't really have any experience with Tika, so they could be wrong.)

yusup 07:12:05

Thanks. Those are very helpful.

yusup 07:12:10

I am not sure about the ideal segment size, latency numbers, etc.

lucasbradstreet 07:12:29

Yeah, it really depends on how much work is being done in the task.

lucasbradstreet 07:12:19

Be careful with max-pending / pending-timeout on your input tasks. If the amount of work derived from an input segment is large, it may not finish within the pending timeout and you can get retries. Just something to monitor.

yusup 07:12:50

Actually, I am getting those retries even after setting it to 120 seconds.

lucasbradstreet 07:12:34

What's your input source?

lucasbradstreet 07:12:55

K. Max pending is your main backpressure knob there.
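To make the two knobs concrete: max-pending caps how many input segments are in flight (emitted but not yet fully acknowledged), and pending-timeout is how long one may stay in flight before it is retried. The class below is a simplified model of that behaviour under our own assumptions, not Onyx's internal implementation; `PendingTracker`, `tryEmit`, `ack` and `retryExpired` are hypothetical names, and time is passed in explicitly to keep the model deterministic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PendingTracker {
    private final int maxPending;             // analogue of max-pending
    private final long pendingTimeoutMillis;  // analogue of pending-timeout
    // In-flight segment id -> time it was (re)emitted; TreeMap keeps ids ordered.
    private final Map<Long, Long> inFlight = new TreeMap<>();
    private long nextId = 0;

    PendingTracker(int maxPending, long pendingTimeoutMillis) {
        this.maxPending = maxPending;
        this.pendingTimeoutMillis = pendingTimeoutMillis;
    }

    // Emit a new input segment; returns its id, or -1 when maxPending
    // segments are already in flight (backpressure: the input must wait).
    synchronized long tryEmit(long nowMillis) {
        if (inFlight.size() >= maxPending) {
            return -1;
        }
        long id = nextId++;
        inFlight.put(id, nowMillis);
        return id;
    }

    // Full acknowledgement: the segment's work is done, freeing a slot.
    synchronized void ack(long id) {
        inFlight.remove(id);
    }

    // Segments pending for at least the timeout are retried: their ids are
    // returned and their clocks restart, while they keep occupying a slot.
    synchronized List<Long> retryExpired(long nowMillis) {
        List<Long> retried = new ArrayList<>();
        for (Map.Entry<Long, Long> e : inFlight.entrySet()) {
            if (nowMillis - e.getValue() >= pendingTimeoutMillis) {
                retried.add(e.getKey());
                e.setValue(nowMillis);
            }
        }
        return retried;
    }
}
```

In this model, if each input segment fans out into many slow page fetches, a lower max-pending means fewer segments compete for the same downstream peers, so each is more likely to be acknowledged before the timeout and fewer retries occur.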

yusup 07:12:17

Sorry, I have a meeting to catch right now.