Hello. I'm starting a new project, and I plan to make Onyx the heart of all data processing here. However, historically a lot of things were done in Python here (a lot of domain-specific natural language processing), so I can't just throw it out, and if we're going to phase it out someday and reimplement it in Clojure, it will be done gradually. As I understand it, you have some plans to make Onyx available for other languages. I've taken a look at the onyx-ruby PoC, but it's not there yet, and it's not really an urgent priority.


Basically, what I need is foreign function calls, and right now I think I can get away with calling my Python code from an Onyx task via an HTTP API, or something like that.
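
For illustration, the Python side of such an HTTP API could be sketched as below. This is a minimal sketch using only the standard library, not anything Onyx provides: `process_segment` is a hypothetical stand-in for the existing NLP code, and the JSON-object-per-request shape and the `serve` helper are assumptions made for the example.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def process_segment(segment):
    """Hypothetical stand-in for the existing Python NLP code."""
    text = segment.get("text", "")
    # Whitespace tokenization is only a placeholder for real processing.
    return {**segment, "tokens": text.split()}


def handle_body(body):
    """Decode one JSON-encoded segment, process it, and re-encode it."""
    segment = json.loads(body)
    return json.dumps(process_segment(segment)).encode("utf-8")


class SegmentHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes the client announced.
        length = int(self.headers.get("Content-Length", 0))
        result = handle_body(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port=8000):
    """Run the endpoint; the port is an arbitrary choice for this sketch."""
    HTTPServer(("127.0.0.1", port), SegmentHandler).serve_forever()
```

Calling `serve()` starts the endpoint, and the task on the Onyx side would then POST each segment to it as JSON and read the processed segment back from the response.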


I've taken a look at the approach that pyspark uses, and I think I'll be better off using something simpler.


That sounds reasonable. To do so, I think you will want something like :onyx/batch-fn, which is something I’ve been considering adding. That would let you send a whole batch of segments to your endpoint.


Or allow you to asynchronously make batch-size calls, and return the whole batch when you’re done
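
The two options above can be sketched in Python purely to show the idea. This is not Onyx code, and `:onyx/batch-fn` is only a proposal at this point in the conversation. `call_batch` and `call_one` are hypothetical callables standing in for whatever performs the actual HTTP request.

```python
from concurrent.futures import ThreadPoolExecutor


def process_batch_single_call(segments, call_batch):
    """Option 1: send the whole batch to the endpoint in one request."""
    return list(call_batch(list(segments)))


def process_batch_concurrently(segments, call_one, max_workers=8):
    """Option 2: one request per segment, issued concurrently.

    The whole batch is returned only once every call has finished.
    Executor.map yields results in input order, so the output lines up
    with the incoming segments.
    """
    segments = list(segments)
    if not segments:
        return []
    workers = min(max_workers, len(segments))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(call_one, segments))
```

With option 2 the batch takes roughly as long as its slowest call instead of the sum of all calls, which is where the gain comes from when each segment takes a while to process.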


Well, it depends on the time needed to process one segment. If it takes a couple of seconds, I can just do it one by one.


That would improve performance a lot. Ideally you’d use something like urania to handle the requests over the batch


Yes, that is very true


It totally depends on what you’re doing


cool, I'll take a look at urania


Is there an example of using the new task bundles approach with Kafka? I can't figure out how you would add a Kafka task using task bundles.


gardnervickers: is that code published as a jar in Clojars/Maven Central or will I need to build from that branch?


All our plugins are tested and published against both official releases and our snapshot builds


[org.onyxplatform/onyx-kafka ""] is the current lein coordinate


@agile_geek: For reference, our build matrix links out to every dependent project, and its coordinates are on the top of every README if they exist: