#clojure
2016-02-27
josh.freckleton00:02:30

Is Monger the best way of accessing MongoDB, or is there something more highly recommended? (I ask because it looks like the build has been failing a lot, and I can't tell, but maybe that's just for Java 7?)

arrdem00:02:37

I've used congomongo successfully in the past. No idea of the relative strength of Monger.

arrdem00:02:34

It looks like Monger's test failures are due to a JVM or platform bug.

josh.freckleton01:02:02

@arrdem: oh awesome, that hadn't even stood out to me, thanks! Is there anything you particularly liked about congomongo?

jtackett01:02:18

Anyone know how to trigger a file download in clojure? @anthgur

jtackett01:02:12

nvm just realized all I need to do is write the file….

jtackett01:02:18

wow I’ve been coding hard for too long haha

jonahbenton01:02:54

@hugesandwich: you mean your client code has to do all that work? a better pattern is to create a data-oriented api and hide the ugliness from clients so that clients just have to provide a data structure, and just get back a data structure. pedestal https://github.com/pedestal/pedestal does this well, in hiding the ugliness and types in the servlet api from client code like interceptors. the amazon aws api wrapper https://github.com/mcohen01/amazonica is also a very interesting example- it actually produces a data-oriented api based on metadata.

hugesandwich01:02:33

@jonahbenton: When I say client, I mean my code itself is a client, not the user as a client. The user only sees clojure types to/from. I'm doing the work of hiding that, just trying to make it easier.

jonahbenton01:02:32

gotcha. have you looked at amazonica? pretty interesting how it solves that problem

hugesandwich01:02:47

where specifically? I just looked, but there's a bunch of namespaces to dig through

hugesandwich01:02:48

I see a few coerce, coerce-value calls

hugesandwich01:02:29

anyway, if you already know something specific, let me know. I've got to head out for a bit, but thanks for the examples. Always the best thing if an approach is already proven.

jonahbenton02:02:23

the short answer is that the amazon api is pretty systematic, so it uses reflection to produce a data oriented clojure wrapper

hugesandwich02:02:27

Looks interesting. thanks! I thought some about doing a reflective approach up-front to avoid any per-call costs. In my case, I am trying to give people enough clojure sugar and data structures, but speed is a pretty big priority as this code will be processing huge amounts of data per second. I'm assuming this happens up front from what you describe, so no per-call costs.

jonahbenton02:02:05

well, some of it - it interns the generated symbols into the namespace, but in those functions there is dispatch logic executed at call time to determine the right native method. that's in addition to data/type encode/decode logic. in the amazonica case, of course, all calls are going over the network so optimizing the in-jvm work isn't important
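A rough sketch of the approach described here (not amazonica's actual implementation): reflect over a class once, intern one wrapper function per method name, and dispatch reflectively at call time:

```clojure
(ns example.genapi
  (:import (clojure.lang Reflector)))

(defn- method-names [^Class c]
  (distinct (map #(.getName ^java.lang.reflect.Method %) (.getMethods c))))

(defn intern-wrapper-fns!
  "Intern one var per public method of `c` into namespace `ns-sym`."
  [ns-sym ^Class c]
  (create-ns ns-sym)
  (doseq [m (method-names c)]
    (intern ns-sym (symbol m)
            (fn [target & args]
              ;; reflective dispatch happens here, at call time
              (Reflector/invokeInstanceMethod target m (object-array args))))))

;; (intern-wrapper-fns! 'example.string-api String)
;; ((ns-resolve 'example.string-api 'toUpperCase) "hello") ;=> "HELLO"
```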

hugesandwich02:02:45

ok, in my case I'm working with streaming and stream related processing

hugesandwich02:02:19

so things get called a ton and need to be relatively fast, so I might look to another approach, but use the amazonica approach in another piece of code for a different piece of my project

jonahbenton02:02:43

sure, in that case even having to do data->type conversions on every call can be really detrimental. all the extra allocations, etc.

hugesandwich02:02:35

for sure, it's just a compromise as I mentioned and still seems to be pretty fast

hugesandwich02:02:26

I use transients and transducers a bit where possible for some gains, but trying to figure out some other ways to make it clean while still squeezing some extra juice
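For readers who want a concrete picture of those two techniques, a small self-contained example (the data shapes are invented):

```clojure
;; A transducer pipeline: no intermediate sequences are built.
(def xform
  (comp (map :value)
        (filter even?)
        (map inc)))

(defn sum-events [events]
  (transduce xform + 0 events))

;; A transient accumulator inside reduce: fewer allocations when
;; building up a large map.
(defn index-by-id [records]
  (persistent!
   (reduce (fn [acc r] (assoc! acc (:id r) r))
           (transient {})
           records)))

;; (sum-events [{:value 1} {:value 2} {:value 4}]) ;=> 8
;; (index-by-id [{:id 1 :name "a"} {:id 2 :name "b"}])
```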

jonahbenton02:02:53

so the stream processing is happening in clojure, consuming java/scala types, or the reverse feeding clojure data into java/scala processing?

hugesandwich02:02:55

stream processing is happening in clojure, but could be done in java/scala and handed to clojure on the edges

hugesandwich02:02:16

there's also some reverse where i get things back over the wire and going back from java/scala to clojure

hugesandwich02:02:01

really it's just that the last steps happen in clojure for stream processing, and when receiving data it's going out again to whatever, probably clojure

hugesandwich02:02:10

I'm not really interested in java/scala users using my code

hugesandwich02:02:34

but no reason you can't roll everything to a point and hand off to my code nonetheless

hugesandwich02:02:40

it's all distributed systems anyway

arrdem05:02:01

@josh.freckleton: No. congomongo is tolerable; it falls into the one-connection/`with-connection` anti-pattern, which isn't awful. Its docs were decent. From a quick look at the issues list, it looks like the monger docs are stale and the maintainers haven't taken the time to fix them.
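For context, the connection pattern being referred to looks roughly like this (adapted from memory of the congomongo README; check the project for current details):

```clojure
(ns example.mongo
  (:require [somnium.congomongo :as m]))

(def conn
  (m/make-connection "mydb" :host "127.0.0.1" :port 27017))

;; either scope the connection to a block...
(m/with-mongo conn
  (m/insert! :users {:name "alice" :age 30})
  (m/fetch :users :where {:age {:$gt 21}}))

;; ...or set it once, globally (the single-connection style arrdem mentions)
(m/set-connection! conn)
```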

arrdem05:02:44

@josh.freckleton: obligatory "rethinkdb > mongodb for pretty much everything" comment and plug for the excellent clj-rethinkdb driver by @danielcompton
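The clj-rethinkdb driver being plugged here composes queries as data and runs them against a connection, roughly like this (from memory of its README; verify against the project):

```clojure
(ns example.rethink
  (:require [rethinkdb.query :as r]))

(with-open [conn (r/connect :host "127.0.0.1" :port 28015 :db "test")]
  ;; insert a document
  (-> (r/table "users")
      (r/insert [{:name "alice" :age 30}])
      (r/run conn))
  ;; read the table back
  (-> (r/table "users")
      (r/run conn)))
```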

shanekilkelly15:02:27

does anyone else find the clojure.jdbc stuff weirdly painful to work with? I’ve been getting really frustrated with the amount of boilerplate java-munging code needed to get it to behave sanely, and starting to think I may just be holding it wrong.

donaldball15:02:57

Yeah, I sometimes have issues with it as well. Mostly regarding the extremely flexible args and auto-transaction behavior, but it’s also oddly difficult to execute an insert statement that lets you get at the generated keys if you can’t use the built-in insert helper for whatever reason.
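To illustrate the generated-keys pain point (clojure.java.jdbc 0.4.x-era API; exact signatures vary between versions, and the db-spec here is made up):

```clojure
(ns example.sql
  (:require [clojure.java.jdbc :as jdbc]))

(def db-spec {:subprotocol "postgresql"
              :subname     "//localhost:5432/mydb"
              :user        "me"})

;; the built-in helper hands back the generated keys directly...
(jdbc/insert! db-spec :users {:name "alice"})
;; => ({:id 1, :name "alice"}) on PostgreSQL

;; ...but a raw statement via execute! only returns update counts, no keys:
(jdbc/execute! db-spec ["INSERT INTO users (name) VALUES (?)" "alice"])

;; one PostgreSQL-specific workaround is a RETURNING clause run as a query:
(jdbc/query db-spec ["INSERT INTO users (name) VALUES (?) RETURNING id" "alice"])
```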

donaldball16:02:51

Once you learn its quirks though, it gets the job done well enough. Certainly I’ve never taken the time to sit down and puzzle through a simpler design.

shanekilkelly16:02:29

the thing I’m having most trouble with is all the java-munging boilerplate code that needs to be copy-pasted into the project in order to have sane handling of (for example) postgres. Slightly ranty example: https://twitter.com/shanekilkelly/status/703299796476563457
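For readers who haven't hit it, the boilerplate in question is roughly these protocol extensions, so that PostgreSQL json/jsonb columns round-trip as Clojure maps (a sketch against clojure.java.jdbc 0.4.x and cheshire; details vary by version):

```clojure
(ns example.pg-json
  (:require [clojure.java.jdbc :as jdbc]
            [cheshire.core :as json])
  (:import (org.postgresql.util PGobject)))

;; Clojure map -> jsonb parameter
(extend-protocol jdbc/ISQLValue
  clojure.lang.IPersistentMap
  (sql-value [m]
    (doto (PGobject.)
      (.setType "jsonb")
      (.setValue (json/generate-string m)))))

;; json/jsonb column -> Clojure map
(extend-protocol jdbc/IResultSetReadColumn
  PGobject
  (result-set-read-column [pgobj _rsmeta _idx]
    (if (#{"json" "jsonb"} (.getType pgobj))
      (json/parse-string (.getValue pgobj) true)
      (.getValue pgobj))))
```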

shanekilkelly16:02:57

should this stuff not be in a library that can Do The Right Thing™ by default?

donaldball16:02:02

(If I did, I would require every series of statements to specify its transactional requirements. I have come to mislike anything that obscures transaction handling.)

shanekilkelly16:02:04

and that’s where I sit back and think, “hold on, maybe I’ve got this wrong”. so if anyone here is aware of a better way of doing this please speak up simple_smile

donaldball16:02:51

Eh, that stuff is postgres rich datatype specialness, right? I could go either way on having that in c.j.jdbc

donaldball16:02:22

But it should/could be wrapped up in some kind of lib

shanekilkelly16:02:34

yeah, i get that it’s specific to postgres, but in almost any other language I can name you don’t need to jump through any special hoops to get this stuff to work.

shanekilkelly16:02:58

I’ve started looking at putting it in a library today actually. the one thing I’m not sure about: if my lib just doesn’t specify a version of jdbc or of the postgres adapter, and a dev imports my lib into their project, will my lib pick up on the version of jdbc/postgres they’ve specified in the main project? I kinda don’t want to prescribe which version of jdbc/postgres should be used

shanekilkelly16:02:11

I mean, if we went up to the Python guys and said “you need about 100 lines of boilerplate code in order to talk to postgres” they’d laugh us out of the hall. simple_smile

shanekilkelly16:02:51

ok, i think I’ve got something that will work: https://github.com/ShaneKilkelly/clj-jdbc-pg-sanity

noisesmith16:02:07

shanekilkelly: lein or boot or maven even will let a dev override your versions and use the version they ask for instead

shanekilkelly16:02:41

i think that’s what I want simple_smile this lib would be useless if they weren’t already using clojure.jdbc/postgres, so i think it’ll work

noisesmith16:02:43

shanekilkelly: I think the reasonable thing is to make your stuff work with the latest stable version and let people try switching versions if they want

shanekilkelly16:02:11

yeah, was thinking the same. will use the latest version in the test profile, but not declare a dependency otherwise

noisesmith16:02:11

nice repo image, btw

noisesmith16:02:33

shanekilkelly: no I think it's best to actually declare dependencies on things you use

noisesmith16:02:54

if a dev specifies another version, they can use it, but the only reason to leave yours out entirely is enterprisey container bullshit

shanekilkelly16:02:00

oh, ok. so should I just put the latest version in my dependencies vector?

noisesmith16:02:06

which shouldn't be affecting sql stuff

shanekilkelly16:02:33

i was worried that a dev may be on an older version or something, and not wanting to force an upgrade

noisesmith16:02:45

like, servlet-related classes, yeah don't provide those - the container will do it, but that doesn't apply here

shanekilkelly16:02:11

so, if I’ve got this right, lein/maven will prefer the ‘application’ level version over the ‘library’ level version?

noisesmith16:02:18

shanekilkelly: yeah, lein and boot are smart about letting a dev specify an explicit version, that shouldn't be your problem
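In practice that looks something like the following with Leiningen: the library declares the versions it was built against, and an application's own top-level :dependencies entry wins over the transitive one (version numbers below are illustrative only):

```clojure
;; the library's project.clj
(defproject jdbc-pg-sanity "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.7.0"]
                 [org.clojure/java.jdbc "0.4.2"]
                 [org.postgresql/postgresql "9.4-1206-jdbc41"]])

;; a consuming application pinning a different JDBC wrapper version
(defproject my-app "0.1.0"
  :dependencies [[org.clojure/clojure "1.7.0"]
                 [jdbc-pg-sanity "0.1.0-SNAPSHOT"]
                 [org.clojure/java.jdbc "0.4.1"]])
```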

shanekilkelly16:02:42

I’ve updated project.clj

noisesmith16:02:35

thanks for taking care of some of that boilerplate, I might end up wanting to use that lib soon

shanekilkelly16:02:03

if you wouldn’t mind giving it a test run, that’d be great. And of course I would welcome contributions simple_smile

shanekilkelly16:02:17

I’ll chuck it up on clojars in a while, once I’ve got some tests in place

noisesmith16:02:32

I might not get a chance to use it soon, but we've got some mongo stuff that gets hung up on high write volumes and wants indexed lookups, and while we have a high performance mongo expert on our small team, I want to try out postgres json documents as a comparison to see if it handles writes a bit better

shanekilkelly16:02:45

This is turning out quite nice, now I can just do require [jdbc-pg-sanity.core] and it does the right thing.

noisesmith16:02:49

this is a long term plan but may not be something I touch this week

noisesmith16:02:02

shanekilkelly: awesome

shanekilkelly16:02:22

oooh, you may be interested in another side-project I’ve been working on for a while: http://bedquiltdb.github.io

noisesmith16:02:23

you don't even need the [] if you aren't using any :as or :import etc.
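i.e. either of these ns forms works when no :as or :refer is needed (the my-app.db namespace is made up):

```clojure
(ns my-app.db
  (:require [jdbc-pg-sanity.core]))  ; vector form

(ns my-app.db
  (:require jdbc-pg-sanity.core))    ; bare symbol, equivalent here
```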

shanekilkelly16:02:58

mongo-alike json doc store implemented on postgres, with a nice clojure driver

noisesmith16:02:38

do you know where I would go to get info about performance under write load? I mean I can try migrating a part of our system and setting up a testbed etc. but it would be cool to just see a benchmark to tell me whether it's worth my time or not simple_smile

shanekilkelly16:02:03

i wouldn’t know, i’ve only done some small performance tests.

noisesmith16:02:09

I'm no dba, but on a small team what can we do heh

noisesmith16:02:26

but there are a lot of things I like about psql

shanekilkelly16:02:17

in my experience writing to an (id, jsonb) table is slower than the equivalent writes to mongo, but it’s not a rigorous test.

noisesmith16:02:52

before selecting mongo, the team did a bunch of small read tests where mongo was taking ns and psql was taking multiple seconds, but as I mentioned there was a mongo performance expert as part of the project and no psql perf expert, and no write load testing was done...

noisesmith16:02:23

shanekilkelly: hmm, under parallel write load or just linear vs. linear?

noisesmith16:02:39

yeah, parallel write load is the thing I need to test

noisesmith16:02:58

I guess I could probably make some monstrosity that would pummel different db engines now that I think about it

shanekilkelly16:02:02

ah, sorry, no idea simple_smile there is a decent book called High Performance PostgreSQL which may be useful

noisesmith16:02:50

eg. instead of testing my app, just directly test throughput under various conditions (N concurrent clients etc.)

shanekilkelly16:02:49

yeah, shouldn’t be too hard to write a parallel test harness that will give you an idea of write throughput.
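One way such a harness might look, assuming the jsonb protocol extensions from the earlier pg-json sketch are loaded so plain maps can be written to a jsonb column (table name, connection details, and document shape are invented):

```clojure
(ns example.write-bench
  (:require [clojure.java.jdbc :as jdbc]
            [example.pg-json]))  ; loads the ISQLValue/IResultSetReadColumn extensions above

(def db-spec {:subprotocol "postgresql"
              :subname     "//localhost:5432/benchdb"
              :user        "me"})

(defn- insert-docs! [n]
  (dotimes [i n]
    (jdbc/insert! db-spec :documents
                  {:body {:n i :ts (System/currentTimeMillis)}})))

(defn bench-writes
  "Run `clients` threads, each inserting `per-client` documents;
  returns total elapsed milliseconds."
  [clients per-client]
  (let [start   (System/nanoTime)
        workers (doall (repeatedly clients #(future (insert-docs! per-client))))]
    (run! deref workers)
    (/ (- (System/nanoTime) start) 1e6)))

;; (bench-writes 8 1000) ; 8 concurrent clients, 1000 inserts each
```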

mpenet17:02:03

What kind of write volume and data volume are you expecting?

noisesmith18:02:22

mpenet: currently getting thousands per second at peak

noisesmith18:02:41

from multiple hosts

noisesmith18:02:14

@mpenet: one of my first jobs here will be getting a more precise number than that

noisesmith18:02:52

@mpenet: in each task there's a graph with ~1k nodes, and the edges are all written as separate documents. Due to the way the graph is gathered, it's guaranteed to be relatively dense (e.g. graph databases don't help us because they are optimized for sparse graphs)

noisesmith18:02:32

of course in psql I might want rows for edges and a table for the graph (though it would be useful if we just had a global "graph of all things" and used queries to generate the specific graph given a set of nodes, this is unlikely to perform optimally)
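A rough sketch of that relational layout (one row per graph, one row per edge; column names and the db-spec are invented):

```clojure
(require '[clojure.java.jdbc :as jdbc])

(def db-spec {:subprotocol "postgresql"
              :subname     "//localhost:5432/graphdb"
              :user        "me"})

(jdbc/execute! db-spec
  ["CREATE TABLE graphs (id BIGSERIAL PRIMARY KEY, task_id TEXT NOT NULL)"])

(jdbc/execute! db-spec
  ["CREATE TABLE edges (graph_id BIGINT REFERENCES graphs (id),
                        src TEXT NOT NULL,
                        dst TEXT NOT NULL,
                        PRIMARY KEY (graph_id, src, dst))"])
```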

noisesmith18:02:49

right now we manage with well tuned mongo, and the bottleneck isn't the db, it's some apis that gather data about the nodes (once we verify which ones we need based on the structure of edges)

hugesandwich19:02:53

@noisesmith: Obviously it depends a lot on what you are doing, but I wonder if you've thought about using any stream processors for reducing the bottleneck of gathering the nodes if it's something that can be done in parallel or perhaps using windows

hugesandwich19:02:16

@noisesmith: also there are some graph dbs that as far as I know do fine with dense graphs as long as you have some knowledge in advance of your use cases, example: Titan

hugesandwich19:02:40

specifically I mean things like vertex-centric indices

hugesandwich19:02:35

Also, to be honest, I wouldn't really use mongodb if you're interacting with graphs. The json storage is convenient and reasonably fast, but I don't have much faith in mongo, and its ecosystem for graph processing isn't as good as some other options that also have pretty good performance on writes or, for that matter, even reads

mikeb19:02:20

@shanekilkelly @donaldball I've been working on a lib targeting postgres, with common types pre-mapped, like timestamps, json, int arrays, string arrays, etc. Would love to hear any feedback. https://github.com/mikeball/foundation

shanekilkelly19:02:00

@mikeb: nice, will take a look :)

noisesmith19:02:52

@hugesandwich: the bottleneck is a paid api, and increasing the rate is a question of $$$

noisesmith19:02:30

@hugesandwich: I kind of suspected mongo was not ideal for this, which is why I am asking around about other options simple_smile

noisesmith19:02:42

one aspect of this is we have a true mongo wizard on the team, and part of the decision making criteria is what we specifically can accomplish (not the abstract perfect dev team for the job) so his existing skills are a factor

mpenet23:02:22

Yeah titan seems like a good candidate, possibly with a cassandra backend in your case

mpenet23:02:21

Depends on how much data too, but I guess you'd be covered on the write front at least

mpenet23:02:37

Dunno what's the status of Titan tho, seems a bit dormant since the acquisition

boorad23:02:47

small- to medium-sized graphs with mongo-ish documents are great in ArangoDB.