This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-02-27
Is Monger the best way of accessing MongoDB, or is there something more highly recommended? (I ask because it looks like the build has been failing a lot, and I can't tell, but maybe that's just for Java 7?)
I've used congomongo successfully in the past. No idea of the relative strength of Monger.
@arrdem: oh awesome, that hadn't even stood out to me, thanks! Is there anything you particularly liked about congomongo?
@hugesandwich: you mean your client code has to do all that work? a better pattern is to create a data-oriented api and hide the ugliness from clients so that clients just have to provide a data structure, and just get back a data structure. pedestal https://github.com/pedestal/pedestal does this well, in hiding the ugliness and types in the servlet api from client code like interceptors. the amazon aws api wrapper https://github.com/mcohen01/amazonica is also a very interesting example- it actually produces a data-oriented api based on metadata.
@jonahbenton: When I say client, I mean my code itself is a client, not the user as a client. The user only sees clojure types to/from. I'm doing the work of hiding that, just trying to make it easier.
gotcha. have you looked at amazonica? pretty interesting how it solves that problem
where specifically? I just looked, but there's a bunch of namespaces to dig through
I see a few coerce, coerce-value calls
anyway, if you already know something specific, let me know. I've got to head out for a bit, but thanks for the examples. Always the best thing if an approach is already proven.
the short answer is that the amazon api is pretty systematic, so it uses reflection to produce a data oriented clojure wrapper
this: https://github.com/mcohen01/amazonica/blob/master/src/amazonica/aws/ec2.clj gets you all of this: https://github.com/mcohen01/amazonica#ec2
Looks interesting. thanks! I thought some about doing a reflective approach up-front to avoid any per-call costs. In my case, I am trying to give people enough clojure sugar and data structures, but speed is a pretty big priority as this code will be processing huge amounts of data per second. I'm assuming this happens up front from what you describe, so no per-call costs.
well, some of it- it interns the generated symbols into the namespace, but in those functions there is dispatch logic executed at call time to determine the right native method. that in addition to data/type encode/decode logic. in the amazonica case, of course, all calls are going over the network so optimizing the in-jvm work isn't important
ok, in my case I'm working with streaming and stream related processing
so things get called a ton and need to be relatively fast, so I might look to another approach, but use the amazonica approach in another piece of code for a different piece of my project
sure, in that case even having to do data->type conversions on every call can be really detrimental. all the extra allocations, etc.
for sure, it's just a compromise as I mentioned and still seems to be pretty fast
I use transients and transducers a bit where possible for some gains, but trying to figure out some other ways to make it clean while still squeezing some extra juice
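A minimal sketch of the transducer and transient techniques mentioned above, not the poster's actual code. `parse-record`, `process-batch`, and `count-by` are hypothetical names made up for illustration.

```clojure
;; Hypothetical per-record step; stands in for whatever the real pipeline does.
(defn parse-record [s]
  (Long/parseLong s))

;; A transducer composes the steps without building intermediate lazy seqs.
(def xform
  (comp (map parse-record)
        (filter even?)))

;; `into` with a transducer fills the result vector through a transient internally.
(defn process-batch [records]
  (into [] xform records))

;; Explicit transient use: accumulate counts in a transient map,
;; then freeze it with persistent! before returning.
(defn count-by [f coll]
  (persistent!
    (reduce (fn [m x]
              (let [k (f x)]
                (assoc! m k (inc (get m k 0)))))
            (transient {})
            coll)))
```

The transient never escapes `count-by`, so callers still only see ordinary immutable data.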
so the stream processing is happening in clojure, consuming java/scala types, or the reverse feeding clojure data into java/scala processing?
stream processing is happening in clojure, but could be done in java/scala and handed to clojure on the edges
there's also some reverse where i get things back over the wire and going back from java/scala to clojure
really it's just the last steps happen in clojure for stream processing, and receiving data, it's going out again to whatever, probably clojure
I'm not really interested in java/scala users using my code
but no reason you can't roll everything to a point and hand off to my code nonetheless
it's all distributed systems anyway
@josh.freckleton: No. congomongo is tolerable, it falls into the one connection/`with-connection` anti-pattern which isn't awful. Its docs were decent, it looks like the monger docs are stale and the maintainers haven't taken the time to fix them from a quick look at the issues list.
@josh.freckleton: obligatory "rethinkdb > mongodb for pretty much everything" comment and plug for the excellent clj-rethinkdb driver by @danielcompton
does anyone else find the clojure.jdbc stuff weirdly painful to work with? I’ve been getting really frustrated with the amount of boilerplate java-munging code needed to get it to behave sanely, and starting to think I may just be holding it wrong.
Yeah, I sometimes have issues with it as well. Mostly regarding the extremely flexible args and auto-transaction behavior, but it’s also oddly difficult to execute an insert statement that lets you get at the generated keys if you can’t use the built-in insert helper for whatever reason.
Once you learn its quirks though, it gets the job done well enough. Certainly I’ve never taken the time to sit down and puzzle through a simpler design.
the thing I’m having most trouble with is all the java-munging boilerplate code that needs to be copy-pasted into the project in order to have sane handling of (for example) postgres. Slightly ranty example: https://twitter.com/shanekilkelly/status/703299796476563457
should this stuff not be in a library that can Do The Right Thing™ by default?
(If I did, I would enforce every series of statements to specify its transactional requirements. I have come to mislike anything that obscures transaction handling.)
and that’s where I sit back and think, “hold on, maybe I’ve got this wrong”. so if anyone here is aware of a better way of doing this please speak up
Eh, that stuff is postgres rich datatype specialness, right? I could go either way on having that in c.j.jdbc
But it should/could be wrapped up in some kind of lib
yeah, i get that it’s specific to postgres, but in almost any other language I can name you don’t need to jump through any special hoops to get this stuff to work.
I’ve started looking at putting it in a library today actually. the one thing I’m not sure about: if my lib just doesn’t specify a version of jdbc or of the postgres adapter, and a dev imports my lib into their project, will my lib pick up on the version of jdbc/postgres they’ve specified in the main project? I kinda don’t want to prescribe which version of jdbc/postgres should be used
I mean, if we went up to the Python guys and said “you need about 100 lines of boilerplate code in order to talk to postgres” they’d laugh us out of the hall.
ok, i think I’ve got something that will work: https://github.com/ShaneKilkelly/clj-jdbc-pg-sanity
shanekilkelly: lein or boot or maven even will let a dev override your versions and use the version they ask for instead
i think that’s what I want. this lib would be useless if they weren’t already using clojure.jdbc/postgres, so i think it’ll work
shanekilkelly: I think the reasonable thing is to make your stuff work with the latest stable version and let people try switching versions if they want
yeah, was thinking the same. will use the latest version in the test profile, but not declare a dependency otherwise
nice repo image, btw
ha, thanks
shanekilkelly: no I think it's best to actually declare dependencies on things you use
if a dev specifies another version, they can use it, but the only reason to leave yours out entirely is enterprisey container bullshit
oh, ok. so should I just put the latest version in my dependencies vector?
which shouldn't be affecting sql stuff
cool, got it.
i was worried that a dev may be on an older version or something, and not wanting to force an upgrade
like, servlet related classes, yeah don't provide those - the container will do it, but that doesn't apply here
so, if I’ve got this right, lein/maven will prefer the ‘application’ level version over the ‘library’ level version?
shanekilkelly: yeah, lein and boot are smart about letting a dev specify an explicit version, that shouldn't be your problem
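A rough sketch of how the advice above plays out in Leiningen. The project names and version numbers are illustrative assumptions, not taken from the actual library. The library declares what it uses, and an application that lists the same artifact in its own `:dependencies` gets its own version, because a top-level dependency takes precedence over a transitive one.

```clojure
;; --- library's project.clj (hypothetical coordinates and versions) ---
(defproject my-org/pg-helpers "0.1.0"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 ;; declare the things the library actually uses
                 [org.clojure/java.jdbc "0.4.2"]
                 [org.postgresql/postgresql "9.4.1207"]])

;; --- application's project.clj (hypothetical) ---
(defproject my-org/my-app "0.1.0"
  :dependencies [[org.clojure/clojure "1.8.0"]
                 [my-org/pg-helpers "0.1.0"]
                 ;; listed at the top level, so this version is the one
                 ;; that ends up on the classpath, not the library's
                 [org.clojure/java.jdbc "0.4.1"]])
```

Running `lein deps :tree` in the application shows which versions were actually resolved.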
awesome, I’ve updated project.clj
thanks for taking care of some of that boilerplate, I might end up wanting to use that lib soon
if you wouldn’t mind giving it a test run, that’d be great. And of course I would welcome contributions
I’ll chuck it up on clojars in a while, once I’ve got some tests in place
I might not get a chance to use it soon, but we've got some mongo stuff that gets hung up on high write volumes and wants indexed lookups, and while we have a high performance mongo expert on our small team, I want to try out postgres json documents as a comparison to see if it handles writes a bit better
This is turning out quite nice, now I can just do require `[jdbc-pg-sanity.core]` and it does the right thing.
this is a long term plan but may not be something I touch this week
shanekilkelly: awesome
oooh, you may be interested in another side-project I’ve been working on for a while: http://bedquiltdb.github.io
you don't even need the [] if you aren't using any :as or :import etc.
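To illustrate the point about brackets, here is a small sketch using standard-library namespaces as stand-ins so it runs anywhere; `example.requires` is a made-up namespace name. A library loaded without options can be named as a bare symbol, while the vector form is only needed for options such as `:as`.

```clojure
(ns example.requires
  (:require clojure.set                  ; bare symbol: just load the namespace
            [clojure.string :as str]))   ; vector form: needed for :as and friends

;; A bare require gives no alias, so the full namespace name is used.
(def merged (clojure.set/union #{1} #{2}))

;; The aliased require can be called through the short name.
(def shouted (str/upper-case "ok"))
```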
mongo-alike json doc store implemented on postgres, with a nice clojure driver
do you know where I would go to get info about performance under write load? I mean I can try migrating a part of our system and setting up a testbed etc. but it would be cool to just see a benchmark to tell me whether it's worth my time or not
i wouldn’t know, i’ve only done some small performance tests.
I'm no dba, but on a small team what can we do heh
but there are a lot of things I like about psql
in my experience writing to an (id, jsonb) table is slower than the equivalent writes to mongo, but it’s not a rigorous test.
before selecting mongo, the team did a bunch of small read tests where mongo was taking ns and psql was taking multiple seconds, but as I mentioned there was a mongo performance expert as part of the project and no psql perf expert, and no write load testing was done...
shanekilkelly: hmm, under parallel write load or just linear vs. linear?
linear
yeah, parallel write load is the thing I need to test
I guess I could probably make some monstrosity that would pummel different db engines now that I think about it
ah, sorry, no idea. there is a decent book called High Performance PostgreSQL which may be useful
eg. instead of testing my app, just directly test throughput under various conditions (N concurrent clients etc.)
yeah, shouldn’t be too hard to write a parallel test harness that will give you an idea of write throughput.
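One possible shape for such a harness, as a sketch only. `measure-throughput`, `fake-write`, and `counter` are hypothetical names, and `fake-write` is a stand-in that would be replaced by a real insert against the database being compared.

```clojure
(defn measure-throughput
  "Runs `write-fn` `writes-per-client` times on each of `n-clients` threads
  and returns the observed writes per second."
  [write-fn n-clients writes-per-client]
  (let [start   (System/nanoTime)
        ;; doall forces every future to start now instead of lazily
        workers (doall
                  (for [client (range n-clients)]
                    (future
                      (dotimes [i writes-per-client]
                        (write-fn client i)))))]
    (run! deref workers)                 ; block until every client finishes
    (let [seconds (/ (- (System/nanoTime) start) 1e9)
          total   (* n-clients writes-per-client)]
      {:clients           n-clients
       :writes            total
       :seconds           seconds
       :writes-per-second (/ total seconds)})))

;; Stand-in for a real insert; swap in a call to the database under test.
(def counter (atom 0))

(defn fake-write [_client _i]
  (swap! counter inc))
```

Calling `(measure-throughput fake-write 1 1000)` and then `(measure-throughput fake-write 8 1000)` gives a crude linear versus parallel comparison. In a standalone script, `(shutdown-agents)` at the end lets the JVM exit promptly, since `future` runs on the agent thread pool.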
mpenet: currently getting thousands per second at peak
from multiple hosts
@mpenet: one of my first jobs here will be getting a more precise number than that
@mpenet: in each task there's a graph with ~1k nodes, and the edges are all written as separate documents, due to the way the graph is gathered, it's guaranteed to be relatively dense (eg. graph databases don't help us because they are optimized for sparse graphs)
of course in psql I might want rows for edges and a table for the graph, (though it would be useful if we just had a global "graph of all things" and used queries to generate the specific graph given a set of nodes, this is unlikely to perform optimally)
right now we manage with well tuned mongo, and the bottleneck isn't the db, it's some apis that gather data about the nodes (once we verify which ones we need based on the structure of edges)
@noisesmith: Obviously it depends a lot on what you are doing, but I wonder if you've thought about using any stream processors for reducing the bottleneck of gathering the nodes if it's something that can be done in parallel or perhaps using windows
@noisesmith: also there are some graph dbs that as far as I know do fine with dense graphs as long as you have some knowledge in advance of your use cases, example: Titan
specifically I mean things like vertex centric indices
Also to be honest, I wouldn't really use mongodb if you're interacting with graphs. The json storage is convenient and reasonably fast, but I don't have much faith in mongo and the ecosystem for graph processing isn't as good as some other options with pretty good performance on writes, or for that matter, even reads
@shanekilkelly @donaldball I've been working on a lib targeting postgres, with common types pre-mapped, like timestamps, json, int arrays, string arrays, etc. Would love to hear any feedback. https://github.com/mikeball/foundation
@mikeb: nice, will take a look :)
@hugesandwich: the bottleneck is a paid api, and increasing the rate is a question of $$$
@hugesandwich: I kind of suspected mongo was not ideal for this, which is why I am asking around about other options
one aspect of this is we have a true mongo wizard on the team, and part of the decision making criteria is what we specifically can accomplish (not the abstract perfect dev team for the job) so his existing skills are a factor
Yeah titan seems like a good candidate, possibly with a cassandra backend in your case