#clojure
2016-02-25
hiredman01:02:35

hugesandwich: it depends on what version of clojure you are using, 1.8 extends reduce and transduce in a lot of ways, and I think reduce may just work on an iterator
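
(A minimal sketch of this, assuming `some-java-iterable` is any object implementing `java.lang.Iterable`:)

```clojure
;; Recent Clojure versions (the thread says 1.8; CollReduce was extended
;; to java.lang.Iterable around 1.7) let reduce run over an Iterable
;; directly, no wrapping needed. `some-java-iterable` is a stand-in.
(reduce conj [] some-java-iterable)
```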

hiredman01:02:17

1.8 also adds some new java interfaces for things that are reducible

hiredman01:02:26

there is also the clojure.core.protocols/CollReduce protocol which you can extend to your types
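
(A minimal sketch of that approach, assuming the wrapped source is a `java.lang.Iterable` and `->clj` is a hypothetical converter from each Java item to a Clojure map:)

```clojure
(require '[clojure.core.protocols :as p])

;; Hypothetical converter from the Java item type to a Clojure map.
(defn ->clj [obj] {:value obj})

(deftype Results [^Iterable src]
  p/CollReduce
  (coll-reduce [_ f]
    (reduce f (map ->clj (iterator-seq (.iterator src)))))
  (coll-reduce [_ f init]
    (reduce f init (map ->clj (iterator-seq (.iterator src))))))

;; (into [] (->Results some-iterable)) now yields Clojure maps.
```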

hugesandwich01:02:38

the iterator itself works fine and i think i can make it work with reduce/transduce, but I'm more looking for an ideal way to wrap the underlying iterable java object which I am closing over when I reify

hugesandwich01:02:02

so then i can transparently consume the reified object with map, reduce, etc.

hugesandwich01:02:25

really it's just sugar, but the reason I want it is that it's a result set from a java api

hiredman01:02:28

why are you wrapping the java iterable object?

hugesandwich01:02:30

so more natural that way

hugesandwich01:02:41

the object itself implements iterable

hiredman01:02:05

if you don't wrap it at all, reduce will work on an Iterable

hugesandwich01:02:28

well if I don't wrap it, then I end up iterating over java objects

hiredman01:02:44

oh, you mean wrapping the contents?

hugesandwich01:02:49

I don't want any consumers of my api to deal with the java objects

hiredman01:02:52

each item in the iteration

hiredman02:02:06

you want an eduction (new in 1.8)

hiredman02:02:38

(eduction (map transform-java-object-to-whatever) some-iterable-thing) should do what you want
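
(A minimal sketch of that, where `record->map` is a hypothetical converter from the Java object to a Clojure map:)

```clojure
(defn results [^Iterable src]
  ;; Pair the map transducer with the source; items are converted
  ;; only when the eduction is reduced or iterated. record->map is
  ;; a hypothetical Java-object-to-map converter.
  (eduction (map record->map) src))

;; e.g. (into [] (results some-iterable)) => vector of Clojure maps
```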

hugesandwich02:02:55

yeah I tried something like that...it works fine, my problem is more finding what to implement when reifying so i don't need an extra function

hiredman02:02:21

why are you reifying anything though?

hugesandwich02:02:59

because the Java object has some very specific behavior that some users of my api may need

hugesandwich02:02:05

normally I just map the results directly

hugesandwich02:02:57

but I want to offer something closer to the intended usage because having full access and being decently performant is important, i.e. I can't just pollute the map with extra data the user doesn't want if they are processing huge amounts of data per second

hiredman02:02:23

let's step back

hugesandwich02:02:28

bearing in mind that these results are most likely sent between machines, over the wire, or into a distributed stream processor

hiredman02:02:28

you have an Iterable

hugesandwich02:02:37

yes, that's not the problem

hiredman02:02:29

you want to expose it as some useful thing, and you want each item from the Iterable to be replaced with the application of some function to the item

hugesandwich02:02:47

the problem is just the recommended approach for the cleanest way to wrap it. There are lots of variations and they all work. Given it is a result set, it is very likely that iterating over all the data is the most frequent use case

hugesandwich02:02:05

90% of the time someone is going to want to map, reduce, transduce, whatever so I want to keep that api clean and make it play nice with everything rather than just sticking on some extra functions or adding helper functions

hugesandwich02:02:42

so protocol/interface probably is what i want, but not sure if there is any useful way to do it. Perhaps just an extra function is just fine.

hiredman02:02:55

are you saying you have an Iterable, and you want to support additional behavior that is not iterating over its contents?

hugesandwich02:02:05

don't worry about it, I'll figure something out. I just thought maybe someone has some better ideas than just adding Iterable and returning the iterable or using iterator-seq

hugesandwich02:02:24

thanks for your help

kul02:02:34

I think he wants to do (map (fn [x] ...) some-iterable-object)

kul02:02:29

But x is a transformed object

kul02:02:04

Not sure if it's possible unless map is redefined

hiredman02:02:38

like, that is the definition of map

hugesandwich02:02:42

yeah, exposing the raw java iterable is no good because it will return java objects. I instead convert them when the sequence is realized

hugesandwich02:02:56

but it's ugly in a way

hiredman02:02:56

which is what eduction will do

hugesandwich02:02:19

i mean as i said, it works and is buried in yet another function on the reified object

hugesandwich02:02:37

but i'd rather just do something like (into [] my-reified-object)

hiredman02:02:55

eduction will apply a transducer (in this case the map transducer) to the reducing function when the Eduction is reduced
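
(Illustrating that, with the same hypothetical `record->map` and a hypothetical `records` Iterable:)

```clojure
(def xs (eduction (map record->map) records)) ; nothing converted yet
;; The map transducer is woven into the reducing fn at reduce time;
;; note an eduction re-runs its transformation on every reduce.
(reduce (fn [acc m] (conj acc (:value m))) [] xs)
```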

hugesandwich02:02:02

but that's not the only use case because it returns 3 different types of results

hugesandwich02:02:19

and iterating like that would just be over everything, which is fine for that case, but i still need to support the others

hugesandwich02:02:10

normally I wouldn't obsess, but it's a key part of my app and probably something I'll open source, not to mention it needs to be pretty fast

hiredman02:02:03

if you want it to be fast, don't wrap it

hugesandwich02:02:20

well I have to for various reasons

hugesandwich02:02:31

there's a faster version that just maps exactly the results someone wants

hugesandwich02:02:53

but sometimes with kafka you have all kinds of things you need like access to a bunch of the information it returns

hugesandwich02:02:06

and the api is such that if you realize the sequence in the wrong way, you will lose results

hugesandwich02:02:15

you can't just keep calling back into it

kul02:02:20

Did you have a look at how java.jdbc exposes result sets?

hugesandwich02:02:41

nope, but that's a good thought

hugesandwich02:02:16

a bit different but maybe something usable based on how some of the things in clj use it

kul02:02:25

It's very simple, it just asks the user for a result-set-fn and a row-fn

kul02:02:42

In your case you can choose to make row-fn implicit

kul02:02:06

Also it is the only correct way to deal with a large result set, as only your function knows when to close things
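
(The java.jdbc shape kul is describing looks roughly like this; `db-spec` is a hypothetical connection spec, and recent java.jdbc versions take these as an options map:)

```clojure
(require '[clojure.java.jdbc :as jdbc])

;; :row-fn transforms each row while the ResultSet is still open;
;; :result-set-fn consumes the transformed rows before it is closed.
;; db-spec is a hypothetical database spec.
(jdbc/query db-spec
            ["SELECT amount FROM orders"]
            {:row-fn :amount
             :result-set-fn #(reduce + 0 %)})
```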

hugesandwich02:02:47

maybe i'm misunderstanding you, but to be clear it is a conceptual result set, not an actual resultset object

hugesandwich02:02:05

so maybe bad word choice on my part

kul02:02:22

You mean a lazy unrealized result set?

hugesandwich02:02:54

not the easiest to understand though if you don't have much knowledge of Kafka

kul02:02:49

It is iterable; what else do you care about?

hugesandwich02:02:46

you get back this object and it has a few properties that a client may or may not need

hugesandwich02:02:13

beyond that, there are a couple of methods you can call, but they influence what is returned and you cannot mix/call them again after you do

hugesandwich02:02:35

so one returns all the results for a topic, another for a topic + partition, and another returns all the results. The iterable itself does the same thing as returning all the results.

hugesandwich02:02:09

For efficiency or depending on your use case, you often don't want all the results, however commonly people do. My interest is supporting all this functionality and making it simple in clojure land

hugesandwich02:02:43

it all works as i have it, plus I have a more efficient way of doing it that supports less flexible but more performance-oriented use cases

kul02:02:01

I don't see how the same abstraction can't be used here

hugesandwich02:02:24

so I'm just trying to find a way to wrap that iterable with all the results to make the reified object act like the source iterable, but efficiently returning clojure maps instead of java objects

kul02:02:42

Have a look at java.jdbc first and see if it makes sense

hugesandwich02:02:03

alright, will do when i get a minute. about to call it a day 🙂 Thanks for the input.

hugesandwich02:02:21

@jonahbenton: yup, but there's not much of a need to do this exactly. Instead I had some code that just returned the object i closed over which itself is already iterable

hugesandwich02:02:48

that makes it work, but doesn't solve the issue i mentioned with not wanting java objects exposed to clients

hugesandwich02:02:12

still would need to map it, which is why we were talking about transduce/eduction

hugesandwich02:02:21

so the goal was to do that, but more transparently

jonahbenton02:02:19

right- so when returning a lazy sequence, cons a map rendering of the object?
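
(A minimal sketch of that lazy-seq approach, again with a hypothetical `record->map`:)

```clojure
(defn results-seq [^java.util.Iterator it]
  ;; Lazily cons a map rendering of each Java object; hasNext is only
  ;; touched as the consumer walks the seq. record->map is hypothetical.
  (lazy-seq
    (when (.hasNext it)
      (cons (record->map (.next it))
            (results-seq it)))))
```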

hugesandwich02:02:24

arguably though you can do a bit in the next operations

hugesandwich02:02:42

pretty much....turn the java objects into a map

hugesandwich02:02:17

I've messed around with all kinds of versions of that, including using transients......I think though I should just leave it as I have it and then if someone hates it, they can yell at me later

jonahbenton02:02:43

yeah, the lazy sequence solution seems nice because it's a tiny amount of code and it doesn't unnecessarily touch operations like hasNext

hugesandwich02:02:54

yeah it is easy to break the transparency

hugesandwich02:02:21

they discuss it some in the link you sent actually

hugesandwich02:02:42

and actually in past versions of what i am working with, they royally screwed up the iterator

jonahbenton02:02:15

right- that seems like the reason to deliver a narrower interface than Iterator. if all you need to provide is a rendering of a stream of java objects as a stream of data, you actually don't want consumers to have, e.g. remove

seancorfield05:02:44

Hi @b702730 ! Welcome to #clojure !

seancorfield05:02:00

Is that a reference to the Beijing Jeep model?

mpenet07:02:31

hugesandwich: it's a fairly common thing to do: you can implement IReduce over the iterator and let the user keep control of how they want to consume the data. squee and alia, among other libs, do that over result sets
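
(A sketch of the IReduceInit approach mpenet describes, with the same hypothetical `record->map`:)

```clojure
(defn reducible-results [^Iterable src]
  ;; reduce/transduce/into run straight over the iterator,
  ;; producing no intermediate seq. record->map is hypothetical.
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (let [it (.iterator src)]
        (loop [acc init]
          (if (.hasNext it)
            (let [acc (f acc (record->map (.next it)))]
              (if (reduced? acc) @acc (recur acc)))
            acc))))))

;; e.g. (into [] xform (reducible-results rs))
```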

mpenet07:02:06

alia for instance allows you to consume the result-set as a seq, or as a reducible if you pass a result-set-fn

mpenet07:02:35

so it can be lazy or not, seq-less (lightweight, faster) or not

kul07:02:45

wow so it does not close the resultset!

mpenet07:02:03

you can get super cheap application of a transducer to a result set/iterator, for instance

mpenet07:02:35

(alia/execute session query {:result-set-fn #(into [] xform %)})

kul07:02:39

I think everyone agrees on this abstraction over resultsets

kul07:02:47

:row-fn and :result-set-fn

mpenet07:02:27

the magic is more about IReduceInit here, result-set-fn can be done on top of it, but it's not the core of it

mpenet07:02:48

no lock-in on how it's done is good

kul07:02:53

is it really required to extend IReduceInit?

kul07:02:20

in java.jdbc the author simply creates a lazy seq with .next and .isEmpty

mpenet07:02:25

it gives you control of what you get

kul07:02:45

so does java.jdbc

kul07:02:01

it actually gives you a map instead of a result object

mpenet07:02:03

if you just take the result-set as a seq it's lazy (non chunked) in alia, if you pass #(into ...) it's eager and super cheap to realize (no intermediate seq)

mpenet07:02:20

it's all about making it open and customizable

kul07:02:29

how is it different from :result-set-fn doall

mpenet07:02:36

no intermediate seq

kul07:02:31

interesting thought, wondering how many nanoseconds that will shave?

mpenet07:02:37

you just tell it how to reduce over the iterator; the rest is up to the user

mpenet07:02:50

it's more about creating less garbage

kul07:02:35

i will walk out of this discussion as i am not really sure about the optimization, you must know better

mpenet07:02:47

clojure itself is full of this kind of optimization

kul07:02:00

hurmm indeed

kul07:02:16

i see abundant use of transient/persistent! in many places

nha13:02:57

Hello, wondering what your opinions on this are: is it a Clojure(Script) bug? http://stackoverflow.com/q/35626096/1327651

nkraft13:02:08

This discussion of Alia and increasing read performance makes me wonder: what is the best approach to increasing write (insert) performance to Cassandra?

nkraft13:02:19

With Alia, that is...

deas13:02:47

"lein deps :tree" shows a dependency of scope "provided". How come it appears in the uberjar, and is there a way to force exclusion?

mpenet14:02:27

@nkraft: use execute-async

mpenet14:02:00

also avoid raw statements, use prepared statements

mpenet14:02:06

it's way faster
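
(Roughly what mpenet is suggesting with alia; `session` is a hypothetical open session, and the exact option shapes should be checked against the alia docs:)

```clojure
(require '[qbits.alia :as alia])

;; Prepare once (e.g. at startup) and reuse the prepared statement
;; for every call; only the :values change per insert.
;; session is a hypothetical already-opened alia session.
(def insert-user
  (alia/prepare session "INSERT INTO users (id, name) VALUES (?, ?)"))

(alia/execute-async session insert-user {:values [1 "jane"]})
```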

hugesandwich14:02:58

@mpenet - thanks, this is what I was looking for I think. I checked some clients for some other dbs and wasn't finding it, but this is pretty much it. I couldn't remember the name of clojure.lang.IReduceInit when I was so worn out yesterday. Thanks again!

mpenet14:02:00

@nkraft: if it's still too slow for you you might want to look into sstableloader (out of alia's scope)

hugesandwich14:02:01

might do so, for now just trying to get some code working and relatively usable

mpenet14:02:27

hugesandwich: both you and nkraft work on the same thing?

hugesandwich14:02:09

@mpenet not sure what nkraft is working on 🙂 I'm putting together a slightly different kafka client for clojure. It started as just something custom to better suit the use-cases of my own project, but I have 2 other projects interested in probably tearing it up some and adding/fixing what I do.

mpenet14:02:13

@nkraft: you can also try batching, but I doubt it'll get you better timings. worth trying tho

hugesandwich14:02:25

I had no intention of writing a new client at first, but I wasn't really able to use the existing clients to great success

hugesandwich14:02:56

also I need the admin libraries which are mostly in Scala and a huge mess, which none use

hugesandwich14:02:14

yup, I'm sorry to say it's not for me

hugesandwich14:02:23

I did take some inspiration from it though

hugesandwich14:02:41

but it does some things that are actually somewhat wrong; then again, as the author says, it is "opinionated"

hugesandwich14:02:50

so i am sure it works fine for him

mpenet14:02:59

I don't use kafka myself so I couldn't say

hugesandwich14:02:15

that is what made me decide to take my own approach actually....but now that I have some people willing to perhaps help a bit, it should be alright

hugesandwich14:02:24

well the kafka api is a bit of a mess historically

hugesandwich14:02:35

and in 0.9 (latest), they changed the consumer entirely

hugesandwich14:02:49

the kinsky client doesn't implement a good chunk of the consumer and none of the admin

hugesandwich14:02:09

and all the existing clients throw around java types or do crazy things

mpenet14:02:41

well from my little experience, the less you wrap the better

hugesandwich14:02:43

actually in the example I gave of ConsumerRecords, the kinsky library is super inefficient because he's doing a lot of extra fetching of results and transformations that are unneeded

hugesandwich14:02:08

so that itself is a bit of a performance hit

mpenet14:02:08

I mean, it often turns into a mess to add tons of high-level abstractions over java client stuff

mpenet14:02:30

(not to mention perf)

hugesandwich14:02:33

oh I agree, and while I certainly have tons of experience with various languages including Java and Scala, less so with Clojure and even less so with interop

hugesandwich14:02:39

so I really didn't want to do this

hugesandwich14:02:57

but on the other hand it turns out it's pretty much the same as any language as I figured.....I'm getting old I guess

hugesandwich14:02:20

also I need to send results over the wire and use them with stream processors....

hugesandwich14:02:29

so turning things into maps rather than passing java types is a lot better

hugesandwich14:02:44

coordinating with Onyx for example

mpenet14:02:03

that's what I imagined

hugesandwich14:02:17

also the Java API is full of weird design where you can do x, y, or z, but only x then z, not x then y then z

hugesandwich14:02:27

literally if you call things in the wrong order, it won't work

hugesandwich14:02:34

and plenty of calls are broken by design

hugesandwich14:02:50

that said, I'm not trying to fix most of those things, just provide something close to java that speaks clojure for now

hugesandwich14:02:15

one of my favorites is they have a Java class called TopicPartition and a Scala class called TopicAndPartition

hugesandwich14:02:37

and yes, they are for exactly the same thing

mpenet14:02:22

fortunately I never had to deal with scala interop

hugesandwich14:02:36

btw I need a cassandra client too, so this works out well for me 🙂 I had an old one in my code base from last year that I have to migrate after dealing with kafka, so thank you

mpenet14:02:21

you're welcome, feel free to ping me if you have problems/questions

hugesandwich14:02:09

will do, thanks

pyr15:02:46

@hugesandwich: i'm willing to decouple the opinionated bits from the plain bits if that helps

pyr15:02:03

@hugesandwich: it would be great to not end up with a clutter of kafka clients

tolitius15:02:11

@pyr: kinsky looks great! is there a reason channels have a hardcoded buffer size, i.e. 10: https://github.com/pyr/kinsky/blob/master/src/kinsky/async.clj#L104 ?

tolitius15:02:37

you just find it a "sensible" buffer size? 🙂

pyr16:02:45

yeah, and factored out of internal code which could use some configurability

pyr16:02:53

which I'm totally open to 🙂

nkraft16:02:25

@mpenet execute-async made an incredible difference. From a processing time of 4 minutes for 10,000 entities to a time of 32 seconds. That's impressive.

mpenet16:02:15

32s for 10k entries sounds not great tbh, you should be able to hit c* a lot faster

mpenet16:02:44

you are using prepared statements?

nkraft16:02:33

@mpenet I'm working on it. The Cassandra server we have isn't really a speed demon but I know I can do better than 32s.

nkraft16:02:28

@mpenet Question: my queries are via hayt. Prepared statements and hayt? or should I switch to CQL and build them that way?

mpenet16:02:25

if you prepare the query it doesn't matter if it's from hayt or a string, but you need to reuse the prepared query instance for all execute calls

hugesandwich16:02:38

@kinsky yeah for one, I didn't like the hard buffer size, would be great to decouple

hugesandwich16:02:23

@kinsky I agree, I really didn't want to write a client but like I said, I had some specific needs ranging from admin to some producer/consumer behaviors. It'll be an ongoing thing as I do more real-world usage. I originally wanted to just submit a pull request to you to be honest, but after I started working more I decided to go another direction for now. The other reason is I'm trying to collaborate with the onyx guys a bit on some Kafka-related concerns and they offered some help as well.

hugesandwich16:02:00

@kinsky I also investigated the clj-kafka 0.9 branch and a pull request there, but neither suited my needs and again were not entirely done

mpenet16:02:01

I think you mean to write @pyr instead of @kinsky 😆

hugesandwich16:02:12

haha yes I think

hugesandwich16:02:20

I've been sleeping little lately

hugesandwich16:02:44

@pyr see my stupidity above

pyr17:02:34

@hugesandwich: if you're willing to help with kinsky down the line, that'd be greatly appreciated, it would benefit a lot from outside help

hugesandwich17:02:06

@pyr definitely keep it in mind. For now, my priority is getting something done that works for me and can help out with a specific onyx plugin I need to work with the 0.9 client. Don't have tons of time to put into it right now, but I'll try to at least share what I have if I can.

ericfode19:02:11

Does anyone have an opinion on what dynamo db interface to use in clojure? So far I have found faraday and hildebrand. hildebrand I'm not sure is still maintained; faraday seems to be less extensive… any experiences?

sdegutis19:02:44

@jonas: I've just submitted a patch to make data.csv slightly more convenient. Let me know if there's anything further I can do to assist in getting that patch accepted or altered.

bbrinck22:02:55

For those that use prismatic schema, do you tend to collect schemas in a common namespace, or spread them out (so that schemas live in the namespace with the relevant functions), or something else?

roberto22:02:16

I tend to put them in a models namespace

bbrinck22:02:23

@roberto: Ok, good to know I'm not alone in collecting them together

arrdem22:02:34

I'd personally split 'em up, one namespace each with the model and the relevant fns

roberto22:02:32

my apps aren’t so large that I have that many namespaces anyway