This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-01-06
@henrique.alves : auto differentiation is really nice, but TF really shines at GPU, multi-GPU, built-in optimizers (SGD, Adam, RMSProp), etc ...
TF is actually a lot of things (including a client-server model for distributed computing), but one of those "things" is just a way to declare a computational graph and compute gradients w/ https://en.wikipedia.org/wiki/Automatic_differentiation#Reverse_accumulation
after that, plugging into a quasi-newton optimizer is a separate thing
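To make the "declare a computational graph and compute gradients" point concrete, here is a minimal sketch of reverse accumulation in plain Python. The Var class and backward function are hypothetical names made up for illustration; this is not how TF implements it, only the same idea at toy scale.

```python
class Var:
    """A node in a tiny computational graph (hypothetical, not the TF API)."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # pairs of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def backward(output):
    """Reverse accumulation: walk the graph in reverse topological order."""
    order, seen = [], set()

    def visit(node):
        if id(node) not in seen:
            seen.add(id(node))
            for parent, _ in node.parents:
                visit(parent)
            order.append(node)

    visit(output)
    output.grad = 1.0
    for node in reversed(order):
        for parent, local in node.parents:
            # chain rule: accumulate the contribution of this path
            parent.grad += local * node.grad


x = Var(3.0)
y = Var(4.0)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
backward(z)
```

After backward(z), x.grad and y.grad hold the partial derivatives; handing those gradients to an optimizer is the separate step mentioned above.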
that said, the TF API is really sad w/ its variables and placeholders. manipulating the formula symbolically in lisp would allow some nice things (including composition, which falls apart on TF)
@henrique.alves : completely agree on the variable / placeholder / formula manipulation
@henrique.alves : unfortunately, I feel that keras, in doing away with the vars, goes too far
what I really want is to say: everything is a float32 or an int, infer it from type checking
there's no reason why TF code should be so much longer than the LaTeX of the math formulas
keras is an API for building models more than an API to define a function to be optimized
it's already at a higher-level
in my limited experience with keras, it feels like the main op I could do was "add layers"
yes. keras is the equivalent of scikit-learn's pipelining/ensembling methods
difference is that instead of pipelining some estimator, you add a "layer" which is some activation function to be evaluated on TF
in the end a NN and a bunch of complex estimators piped together are not very different, besides the fact you can back-propagate on the first one (currently)
I've worked through quite a few TF examples; what can we realistically do about "clj DSL for describing functions for TF to optimize" ?
I'd love to be able to describe the computational graph in a CLJ DSL, then hit some key, and have it translated into Python/TF and optimized on my GPU
is anyone working on this bc it sounds awesome?
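A rough sketch of what the "DSL -> Python source" translation step could look like, assuming the s-expressions are modelled as nested Python tuples. The emit function and the INFIX table are hypothetical, and the output is plain arithmetic rather than real TF calls.

```python
# Hypothetical mapping from DSL operators to Python infix operators.
INFIX = {"+": "+", "-": "-", "*": "*", "/": "/"}


def emit(expr):
    """Translate a nested-tuple expression into a Python source string."""
    if isinstance(expr, (int, float)):
        return repr(expr)
    if isinstance(expr, str):
        return expr  # a variable name
    op, *args = expr
    if op in INFIX:
        return "(" + f" {INFIX[op]} ".join(emit(a) for a in args) + ")"
    # anything else is emitted as a plain function call
    return f"{op}({', '.join(emit(a) for a in args)})"


# (+ (* w x) b) written as nested tuples
model = ("+", ("*", "w", "x"), "b")
source = emit(model)

# Evaluate the generated source with concrete values, just to check it.
value = eval(source, {}, {"w": 2.0, "x": 3.0, "b": 1.0})
```

A real translator would emit graph-building calls instead of arithmetic, but the recursive walk over the expression tree would look much the same.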
outputting python code would be a waste since it's possible to use https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/OperationBuilder.java directly
1. that looks very cool 2. what about things that are not TF though, for example, many TF examples use numpy to generate data; and the MNIST example calls python-learning-libraries to get the data
it seems like a java->tf approach loses a lot of the libraries that numpy + python/learning provides
TF doesn't use numpy arrays natively
I know what you mean... but I'm saying TF is not coupled to numpy, it has its own type representation for matrices anyway https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/src/main/java/org/tensorflow/Tensor.java#L36
the Python API deals w/ casting from numpy -> tensor
this is not my code; just a result from google: https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/2_BasicModels/linear_regression.py
it seems that for practical ML, having numpy / other learning libraries around is very useful
these numpy.asarray literals are copied to another object before evaluation anyway, that's what I mean
or rather, "unpacked" from the PyObj
to the native C++ TF type
sure; but say you're doing a practical ML problem, under the "dsl->python tf" approach, it is: 1. write some data input code in python 2. write some clj dsl for describing computation graph / how to optimize 3. run compiler, which does "dsl -> python tf" and merges the two codes, and runs it on the GPU It's not clear how doing "clj -> tensorflow via java" simplifies any of this
1. in python, write code to input data 2. in clojure dsl, write code to describe computation graph / which optimizer to use 3. run some wrangler, which compiles the clojure dsl to python tf code, merges the code together, and runs it on a GPU
well, w/ the Java API you don't really need Python, not even to ingest data
true; but you would have to rewrite (1) any input routines and (2) any numpy routines you wanted to use
(PS, I would love to be proved wrong and use a pure java solution -- I don't love python -- I just don't see a way to get rid of it)
data ingestion is just vectorizing some arbitrary stream into an efficient dense/sparse matrix format... there's nothing special about this
for one example
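As a sketch of the "vectorize a stream into a dense/sparse matrix" claim, using only the Python standard library: to_dense and to_sparse are hypothetical helper names, and the hard-coded stream stands in for whatever record source you actually have.

```python
from array import array


def to_dense(rows, n_cols):
    """Pack an iterable of rows into one flat row-major float32 buffer."""
    buf = array("f")
    n_rows = 0
    for row in rows:
        if len(row) != n_cols:
            raise ValueError("ragged row")
        buf.extend(float(v) for v in row)
        n_rows += 1
    return buf, (n_rows, n_cols)


def to_sparse(rows):
    """Keep only non-zero entries as (row, col, value) triples (COO layout)."""
    return [(i, j, float(v))
            for i, row in enumerate(rows)
            for j, v in enumerate(row)
            if v != 0]


stream = iter([[0, 1, 0], [2, 0, 0]])  # stand-in for any record stream
rows = list(stream)
dense, shape = to_dense(rows, 3)
sparse = to_sparse(rows)
```

The same two layouts can be built on the JVM; nothing here depends on numpy.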
true; but reading MNIST / CIFAR-10 data, in python, is just "import this learning library; use 2 lines of code" -- I'm not sure if Java learning libraries are that good yet
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/fully_connected_feed.py#L53-L55 <-- this is reading in MNIST data in python
there's nothing special about this besides the fact they package the MNIST dataset inside this example package
name any standard library dataset, it's probably packaged in python; name any standard library, there's probably a pylearn / numpy implementation
you don't need to lecture me - I work w/ financial models in Python (and R) all day
and no, Python doesn't have it all, sometimes we have to fall back to R
I mean no offense; I'm trying to understand why you are recommending using Java instead of leveraging all the libraries Python supports.
my company already relies on Clojure/JVM heavily, so that would be nice, for one. second, Python packaging/deploy story is awful compared to shipping JARs. being able to use TF and side-stepping this would also be nice
there's not much to leverage from the Python ecosystem when you're using only TF though - usually your entire model will be defined in it instead of mixing-matching w/ sklearn, and numpy doesn't matter
the same is not true if you rely on sklearn which is really much better than the rest (Weka, etc)
Spark has http://spark.apache.org/docs/latest/ml-guide.html which can be interesting, although some models benchmark poorly against the equivalent on sklearn
Spark can be useful for massaging data, even if you run w/ one worker
and having many CPUs is orthogonal to GPU
It does sound like in your case, pure Java makes sense, if all your inputting is via Spark, rather than python, and you want to deploy on machines that already have JVM.
https://www.packtpub.com/big-data-and-business-intelligence/clojure-data-science
all in all, Python feels like a liability nowadays (unless you're Google), so being able to leverage TF from other platforms would be good
and on top of Clojure, even better
the quality of some (heavily used) Python libraries is dubious too. after getting a bug on how numpy computed the median of a vector I don't take the environment as a clear winner anymore
Though as a hobbyist, a "sorta working library" is often good enough, since real $$$ isn't relying on it.
as an anecdote, the median bug could easily cost some millions and shut down the company, since imputing a missing variable w/ its median is a common imputation method for financial models
that could skew the distribution and make "bad" clients look like "good" clients and go unnoticed, easily
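For readers unfamiliar with the technique, a minimal sketch of median imputation using the standard library's statistics.median. The impute_median helper and the income figures are made up for illustration.

```python
from statistics import median


def impute_median(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    m = median(observed)
    return [m if v is None else v for v in values]


# Made-up data: two missing values and one outlier.
incomes = [30.0, None, 50.0, 40.0, None, 1000.0]
filled = impute_median(incomes)
```

Every missing entry receives the same computed value, so a median that is computed wrongly gets copied into every imputed row at once.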
these things are tricky
credit
if I was in the finance industry, I'd probably also be really pissed off at hobbyist code that works 90% of the time, and the other 10% of the time results in losing millions of dollars
I now see why you so heavily value TF and place so little value on "existing library support"
there is some talk (I now forget from which conference) where the speaker asked the audience "How many of you have built at least one predictive model?", and the entire room raised their hands
then he asked "How many of you have used any in production", and only a couple raised their hands
there's a big gap between having something w/ predictive power and having something w/ predictive power that is correct, tested, monitored and scales
it's a space that's ripe for some actual software eng. discipline - most progress so far has been based on marginal improvements on R^2 or some error metric at the expense of everything else. researchers don't have to make trade-offs to publish a paper
good night!
In a new project, how much time do you invest in planning and conception? Do you tinker with prototypes and convert them into a finished product, or do you use them only for planning? Which tools can you recommend for the conception process?
Can someone request this feed? It says I’m blocked, but I think it’s global: http://www.health.harvard.edu/blog/feed
@gklijs Are you my old Finalist colleague btw? Nice to see you are also interested in Clojure 🙂
yes, read something about it in the summer, have a clojure/clojurescript application to play around with, but for work it’s hippo atm
@gklijs If you want, register for https://www.eventbrite.com/e/dutch-clojure-day-2017-tickets-30113550440
@borkdude any idea who is organising dutch clojure day. I submitted a talk but have no idea on its status 😕
@wallydrag I’m not involved, but you could try #clojure-nl
hello guys, where should I ask questions about software architecture ?
@baptiste-from-paris I believe this channel is ok, or you could try #architecture
let’s try here ^^
So I have to build a messaging system used for dispatching requests on different servers (using specific rules)
My question : does someone have readings or recommendations on good resources for choosing a specific queue service (RabbitMQ, HornetQ, 0MQ …)
One of the things you might want to take into account is if you want to have multiple consumers of the messages and/or if you want to know if a message has been consumed. Kafka is great for scalability, and decoupling consumers and producers. But because of this you have to do a lot of work if a producer wants to know if a message has been consumed yet.
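To illustrate the "know if a message has been consumed" concern, here is a small in-memory sketch of a queue with explicit acknowledgements. AckQueue is a hypothetical class, not the API of RabbitMQ, Kafka, or any other broker.

```python
import queue


class AckQueue:
    """In-memory stand-in for a broker with explicit acknowledgements."""

    def __init__(self):
        self._pending = queue.Queue()
        self._unacked = {}   # delivered but not yet acknowledged
        self._acked = set()  # ids the consumer has confirmed
        self._next_id = 0

    def publish(self, body):
        self._next_id += 1
        self._pending.put((self._next_id, body))
        return self._next_id

    def consume(self):
        msg_id, body = self._pending.get_nowait()
        self._unacked[msg_id] = body
        return msg_id, body

    def ack(self, msg_id):
        del self._unacked[msg_id]
        self._acked.add(msg_id)

    def is_acked(self, msg_id):
        """Lets the producer ask whether its message was processed."""
        return msg_id in self._acked


q = AckQueue()
sent_id = q.publish("dispatch to server A")
before = q.is_acked(sent_id)
got_id, body = q.consume()
q.ack(got_id)
after = q.is_acked(sent_id)
```

With a log-style system the producer has no such per-message view by default, which is the extra work mentioned above.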
ok, and something smaller maybe ?
it’s a simple archi
I only have experience with Kafka, and a bit with IBM MQ, so can’t really give good recommendations on something smaller, but I heard good things about RabbitMQ, and you also mention it as the first one, so maybe you can do a poc with it, to see if it covers your needs?
ok ok, need to find a comparison somewhere
between all of the vendors
rabbit is good and more functional than any of the others I've come across, if your throughput needs are relatively small. I'd avoid 0MQ just because it's very low level and not persistent. You use 0MQ for ephemeral things like RPC and things where you really want to be light. It's basically point to point. Kafka is more about huge volume: it's relatively slow on a message-by-message basis but it can handle trillions per second if set up right. ActiveMQ is another option, again with lots of options, but I'd say rabbit is strictly superior: it's faster, easier to manage, more reliable and can handle more volume. I mostly tend to go for kafka + rabbit, as between the two you can do most patterns well.
There are more vendor specific ones too but I discount those out of hand for binding me to a vendor.
http://queues.io/ has a good list of most and some comparisons
thx a lot @dexter
any good lib with clojure ?
i used http://clojurerabbitmq.info/ and have been quite happy with it
Kafka has a nice clojure lib too, but I'm not sure that's the right tech choice for you
thx a lot for your help
i’ll take a look at everything
have you heard of this one ?
@baptiste-from-paris I haven’t used langohr, but I’ve used Monger a bit and it seemed to have really high quality
ok ok 🙂
@baptiste-from-paris that is the one I linked 😛
Odd ideas - generating data from an Avro schema? Would an integration with Avro be appropriate for clojure.spec? Humm. Would have to look around to see if the intended use for each lines up.
can any of the admins please enable the ALL UNREADS preference?
https://get.slack.help/hc/en-us/articles/226410907-View-all-your-unread-messages
@ejelome The option is on your own preferences
What would you call it if you wanted to put a bug bounty on an issue but it's not really a bug it's just research/poc? feature bounty doesn't sound as cool. Research reward?
Hi, my name is Ari and I run a newsletter called Developer to CTO. It's a weekly newsletter that provides career advice to software developers. You can check it out at http://clktech.io.