2016-02-25
hugesandwich: it depends on what version of Clojure you are using; 1.8 extends reduce and transduce in a lot of ways, and I think reduce may just work on an iterator
there is also the clojure.core.protocols/CollReduce protocol which you can extend to your types
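For illustration, a minimal sketch of extending CollReduce to a Java type (java.util.ArrayDeque is just a stand-in here; recent Clojure already reduces Iterables directly, so this only shows the shape):

```clojure
(require '[clojure.core.protocols :as p])

;; Teach reduce about java.util.ArrayDeque by delegating to a seq
;; over its iterator.
(extend-protocol p/CollReduce
  java.util.ArrayDeque
  (coll-reduce
    ([coll f] (reduce f (iterator-seq (.iterator coll))))
    ([coll f init] (reduce f init (iterator-seq (.iterator coll))))))

;; (reduce + (doto (java.util.ArrayDeque.) (.add 1) (.add 2))) ;=> 3
```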
the iterator itself works fine and I think I can make it work with reduce/transduce, but I'm more looking for an ideal way to wrap the underlying iterable Java object which I am closing over when I reify
so then I can transparently consume the reified object with map, reduce, etc.
really it's just sugar, but the reason I want it is that it's a result set from a java api
so more natural that way
the object itself implements iterable
well if I don't wrap it, then I end up iterating over java objects
I don't want any consumers of my api to deal with the java objects
(eduction (map transform-java-object-to-whatever) some-iterable-thing) should do what you want
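For reference, a minimal runnable sketch of that eduction approach, with a plain java.util.ArrayList standing in for the Java result object and a made-up transform function:

```clojure
;; A plain ArrayList stands in for the Java iterable result object.
(def some-iterable
  (doto (java.util.ArrayList.)
    (.add "a") (.add "b") (.add "c")))

;; Hypothetical conversion from a Java value to a Clojure map.
(defn transform [s] {:value (.toUpperCase ^String s)})

;; eduction defers the work: the map transducer only runs when the
;; eduction is reduced or iterated.
(def xs (eduction (map transform) some-iterable))

(into [] xs)
;; => [{:value "A"} {:value "B"} {:value "C"}]
```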
yeah I tried something like that... it works fine, my problem is more finding what to implement when reifying so I don't need an extra function
https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html - this is what i am wrapping
because the Java object has some very specific behavior that some users of my api may need
normally I just map the results directly
but I want to offer something closer to the intended usage because having full access and being decently performant is important, i.e. I can't just pollute the map with extra data the user doesn't want if they are processing huge amounts of data per second
bearing in mind that these results are most likely sent between machines, over the wire, or into a distributed stream processor
yes, that's not the problem
you want to expose it as some useful thing, and you want each item from the Iterable to be replaced with the application of some function to the item
the problem is just finding the recommended approach for the cleanest way to wrap it. There are lots of variations and they all work. Given it is a result set, it is very likely that iterating over all the data is the most frequent use case
90% of the time someone is going to want to map, reduce, transduce, whatever, so I want to keep that API clean and make it play nice with everything rather than just sticking on some extra functions or adding helper functions
so a protocol/interface is probably what I want, but not sure if there is any useful way to do it. Perhaps an extra function is just fine.
are you saying you have an Iterable, and you want to support additional behavior that is not iterating over its contents?
don't worry about it, I'll figure something out. I just thought maybe someone has some better ideas than just adding Iterable and returning the iterable or using iterator-seq
thanks for your help
also sorry, I meant this, pasted the wrong link - https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html
yeah, exposing the raw java iterable is no good because it will return java objects. I instead convert them when the sequence is realized
but it's ugly in a way
I mean, as I said, it works and is buried in yet another function on the reified object
but I'd rather just do something like (into [] my-reified-object)
eduction will apply a transducer (in this case the map transducer) to the reducing function when the Eduction is reduced
but that's not the only use case because it returns 3 different types of results
and iterating like that would just be over everything, which is fine for that case, but i still need to support the others
normally I wouldn't obsess, but it's a key part of my app and probably something I'll open source, not to mention it needs to be pretty fast
well I have to for various reasons
there's a faster version that just maps exactly the results someone wants
but sometimes with kafka you have all kinds of things you need like access to a bunch of the information it returns
and the api is such that if you realize the sequence in the wrong way, you will lose results
you can't just keep calling back into it
nope, but that's a good thought
a bit different but maybe something usable based on how some of the things in clj use it
Also it is the only correct way to deal with a large result set, as only your function knows when to close things
maybe I'm misunderstanding you, but to be clear it is a conceptual result set, not an actual ResultSet object
so maybe bad word choice on my part
https://kafka.apache.org/090/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html - this is it
not the easiest to understand though if you don't have much knowledge of Kafka
you get back this object and it has a few properties that a client may or may not need
beyond that, there are a couple of methods you can call, but they influence what is returned and you cannot mix/call them again after you do
so one returns all the results for a topic, another for a topic + partition, and another returns all the results. The iterable itself does the same thing as returning all the results.
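For reference, the three access paths as plain interop (method names per the 0.9 javadoc linked above; consumer-records is assumed to be the ConsumerRecords instance):

```clojure
(import '(org.apache.kafka.common TopicPartition))

;; all records for one topic:
(.records consumer-records "my-topic")

;; all records for one topic + partition:
(.records consumer-records (TopicPartition. "my-topic" 0))

;; everything, via the Iterable itself:
(iterator-seq (.iterator consumer-records))
```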
For efficiency, or depending on your use case, you often don't want all the results, though commonly people do. My interest is supporting all this functionality and making it simple in Clojure land
it all works as I have it + I have a more efficient way of doing it that supports less flexible but more performance-oriented use cases
so I'm just trying to find a way to wrap that iterable with all the results to make the reified object act like the source iterable, but efficiently returning clojure maps instead of java objects
alright, will do when I get a minute. About to call it a day. Thanks for the input.
@hugesandwich: did you see the third solution on http://stackoverflow.com/questions/9225948/how-do-turn-a-java-iterator-like-object-into-a-clojure-sequence ?
@jonahbenton: yup, but there's not much of a need to do this exactly. Instead I had some code that just returned the object I closed over, which itself is already iterable
that makes it work, but doesn't solve the issue I mentioned with not wanting Java objects exposed to clients
still would need to map it, which is why we were talking about transduce/educe
so the goal was to do that, but more transparently
right, so when returning a lazy sequence, cons a map rendering of the object?
arguably though you can do a bit in the next operations
pretty much....turn the java objects into a map
I've messed around with all kinds of versions of that, including using transients......I think though I should just leave it as I have it and then if someone hates it, they can yell at me later
yeah, the lazy sequence solution seems nice because it's a tiny amount of code and it doesn't unnecessarily touch operations like hasNext
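The tiny version of that idea, with record->map standing in for the hypothetical Java-object-to-map conversion:

```clojure
;; Wrap the Java iterator as a (chunked) lazy seq and convert each
;; element to a Clojure map as the sequence is realized.
(defn records-seq [^Iterable iterable record->map]
  (map record->map (iterator-seq (.iterator iterable))))
```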
yeah it is easy to break the transparency
how so?
they discuss it some in the link you sent actually
and actually in past versions of what i am working with, they royally screwed up the iterator
right, that seems like the reason to deliver a narrower interface than Iterator. If all you need to provide is a rendering of a stream of Java objects as a stream of data, you actually don't want consumers to have, e.g., remove
Hi @b702730! Welcome to #clojure!
Is that a reference to Beijing Jeep model?
hugesandwich: it's a fairly common thing to do; you can implement IReduce over the iterator and let the user keep control of how they want to consume the data. squee and alia, among other libs, do that over result sets
https://github.com/mpenet/alia/blob/master/modules/alia/src/qbits/alia/codec.clj#L118-L139 https://github.com/ghadishayban/squee/blob/master/src/squee/impl/resultset.clj#L49-L82
alia for instance allows you to consume the result-set as a seq or a reducible if you pass a result-set-fn
the magic is more about IReduceInit here, result-set-fn can be done on top of it, but it's not the core of it
if you just take the result-set as a seq it's lazy (non-chunked) in alia; if you pass #(into ...) it's eager and super cheap to realize (no intermediate seq)
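A hedged sketch of that IReduceInit idea (not alia's or squee's actual code; record->map is a hypothetical conversion function):

```clojure
(defn reducible-records
  "Wrap an Iterable of Java objects so reduce, transduce and (into ...)
  can consume it directly, converting each element on the fly."
  [^Iterable iterable record->map]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (let [it (.iterator iterable)]
        (loop [acc init]
          (if (.hasNext it)
            (let [acc' (f acc (record->map (.next it)))]
              (if (reduced? acc')
                @acc'
                (recur acc')))
            acc))))))

;; (into [] (reducible-records some-java-iterable record->map)) ; eager, no intermediate seq
```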
I will bow out of this discussion as I am not really sure about the optimization; you probably know better
Hello, wondering what your opinions are on this: is it a Clojure(Script) bug? http://stackoverflow.com/q/35626096/1327651
This discussion of Alia and increasing read performance makes me wonder: what is the best approach to increasing write (insert) performance to Cassandra?
"lein deps :tree" shows a dependency of scope "provided". How come it appears in the uberjar, and is there a way to force exclusion?
@mpenet - thanks, this is what I was looking for I think. I checked some clients to some other dbs and wasn't finding it, but this is pretty much it. I couldn't remember the name of clojure.lang.IReduceInit when I was so worn out yesterday. Thanks again!
@nkraft: if it's still too slow for you you might want to look into sstableloader (out of alia's scope)
might do so, for now just trying to get some code working and relatively usable
@mpenet not sure what nkraft is working on. I'm putting together a slightly different Kafka client for Clojure. It started as just something custom to better suit the use cases of my own project, but I have 2 other projects interested in probably tearing it up some and adding/fixing what I do.
@nkraft: you can also try batching, but I doubt it'll get you better timings. worth trying tho
I had no intention of writing a new client at first, but I wasn't really able to use the existing clients to great success
did you check https://github.com/pyr/kinsky ?
also I need the admin libraries, which are mostly in Scala and a huge mess, and which none of the clients use
yup, I'm sorry to say it's not for me
I did take some inspiration from it though
but it does some things that are actually somewhat wrong; then again, as the author says, it is "opinionated"
so i am sure it works fine for him
that is what made me decide to take my own approach actually....but now that I have some people willing to perhaps help a bit, it should be alright
well the kafka api is a bit of a mess historically
and in 0.9 (latest), they changed the consumer entirely
the kinsky client doesn't implement a good chunk of the consumer and none of the admin
and all the existing clients throw around java types or do crazy things
actually in the example I gave of ConsumerRecords, the kinsky library is super inefficient because he's doing a lot of extra fetching of results and transformations that are unneeded
so that itself is a bit of a performance hit
I mean, it often turns into a mess to add tons of high-level abstractions over Java client stuff
oh I agree, and while I certainly have tons of experience with various languages including Java and Scala, less so with Clojure and even less so with interop
so I really didn't want to do this
but on the other hand it turns out it's pretty much the same as any language as I figured.....I'm getting old I guess
also I need to send results over the wire and use them with stream processors....
so turning things into maps rather than passing java types is a lot better
coordinating with Onyx for example
also the Java API is full of weird design where you can do x, y, or z, but only x then z, not x then y then z
literally if you call things in the wrong order, it won't work
and plenty of calls are broken by design
that said, not trying to fix most of those things, just provide something close to Java but that speaks Clojure, for now
one of my favorites is they have a Java class called TopicPartition and a Scala class called TopicAndPartition
and yes, they are for exactly the same thing
btw I need a Cassandra client too, so this works out well for me. I had an old one in my code base from last year; I have to migrate it after dealing with Kafka, so thank you
will do, thanks
@hugesandwich: I'm willing to decouple the opinionated bits from the plain bits if that helps
@hugesandwich: it would be great to not end up with a clutter of kafka clients
@pyr: kinsky
looks great! Is there a reason channels have a hardcoded buffer size, i.e. 10? https://github.com/pyr/kinsky/blob/master/src/kinsky/async.clj#L104
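The decoupling being asked about could look something like this sketch (not kinsky's actual code):

```clojure
(require '[clojure.core.async :as a])

;; Let callers choose the buffer instead of hardcoding (a/chan 10);
;; keep 10 as the default for backward compatibility.
(defn consumer-chan
  ([] (consumer-chan 10))
  ([buf-or-n] (a/chan buf-or-n)))
```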
@mpenet execute-async made an incredible difference. From a processing time of 4 minutes for 10,000 entities to a time of 32 seconds. That's impressive.
@mpenet I'm working on it. The Cassandra server we have isn't really a speed demon but I know I can do better than 32s.
@mpenet Question: my queries are via hayt. Prepared statements and hayt? Or should I switch to CQL and build them that way?
if you prepare the query it doesn't matter if it's from hayt or a string, but you need to reuse the prepared query instance for all execute calls
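A hedged sketch of that reuse pattern with alia (option shapes vary a bit across alia versions, and session is assumed to be an open alia session):

```clojure
(require '[qbits.alia :as alia])

(defn make-insert-user
  "Prepare the statement once, at startup, and hold on to it."
  [session]
  (alia/prepare session "INSERT INTO users (id, name) VALUES (?, ?)"))

;; Reuse the same prepared instance for every execute call:
;; (def insert-user (make-insert-user session))
;; (alia/execute session insert-user {:values [1 "ana"]})
;; (alia/execute-async session insert-user {:values [2 "bob"]})
```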
@kinsky yeah for one, I didn't like the hard buffer size, would be great to decouple
@kinsky I agree, I really didn't want to write a client but like I said, I had some specific needs ranging from admin to some producer/consumer behaviors. It'll be an ongoing thing as I do more real-world usage. I originally wanted to just submit a pull request to you to be honest, but after I started working more I decided to go another direction for now. The other reason is I'm trying to collaborate with the onyx guys a bit on some Kafka-related concerns and they offered some help as well.
@kinsky I also investigated the clj-kafka 0.9 branch and a pull request there, but neither suited my needs and again neither was entirely done
haha yes I think
I've been sleeping little lately
@pyr see my stupidity above
@hugesandwich: if you're willing to help with kinsky down the line, that'd be greatly appreciated, it would benefit a lot from outside help
@pyr definitely keep it in mind. For now, my priority is getting something done that works for me and can help out with a specific onyx plugin I need to work with the 0.9 client. Don't have tons of time to put into it right now, but I'll try to at least share what I have if I can.
Does anyone have an opinion on what DynamoDB interface to use in Clojure? So far I have found faraday and hildebrand. hildebrand, I'm not sure if it's still maintained; faraday seems to be less extensive… any experiences?
@jonas: I've just submitted a patch to make data.csv slightly more convenient. Let me know if there's anything further I can do to assist in getting that patch accepted or altered.