2017-08-24
Channels
- # architecture (4)
- # aws (1)
- # beginners (76)
- # boot (172)
- # cider (17)
- # cljs-dev (10)
- # cljs-experience (24)
- # cljsrn (45)
- # clojure (129)
- # clojure-berlin (1)
- # clojure-finland (1)
- # clojure-italy (8)
- # clojure-seattle-old (1)
- # clojure-sg (1)
- # clojure-spec (31)
- # clojure-uk (28)
- # clojurescript (88)
- # cursive (11)
- # data-science (1)
- # datomic (44)
- # fulcro (48)
- # hoplon (5)
- # jobs (3)
- # jobs-discuss (1)
- # leiningen (6)
- # luminus (42)
- # lumo (17)
- # off-topic (9)
- # om (29)
- # onyx (15)
- # pedestal (7)
- # protorepl (20)
- # re-frame (24)
- # reagent (46)
- # ring-swagger (2)
- # specter (2)
- # sql (3)
- # uncomplicate (58)
- # unrepl (29)
- # yada (5)
@blueberry I guess you don't want to announce the MC method you use for bayadera publicly? It is mentioned neither in the slides nor in the source code. I am implementing SGHMC atm., which does not need branching but rather works like momentum SGD. I implement it with pytorch's autograd. I think something like it would fly on neanderthal + autograd.
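Roughly, the update step I have in mind is just momentum SGD with friction and injected noise (a minimal sketch of the Chen et al. 2014 SGHMC update; `neg_log_post` and the hyperparameters are placeholders, not my actual code):

```python
import torch

def sghmc_step(theta, v, neg_log_post, lr=1e-4, alpha=0.05):
    """One SGHMC update (Chen et al. 2014): momentum SGD with friction plus injected noise.
    neg_log_post is assumed to return a minibatch estimate of the negative log posterior,
    rescaled to the full data set."""
    theta = theta.detach().requires_grad_(True)
    u = neg_log_post(theta)                      # stochastic estimate of the potential U(theta)
    grad, = torch.autograd.grad(u, theta)
    noise = torch.randn_like(v) * (2.0 * alpha * lr) ** 0.5
    v = (1.0 - alpha) * v - lr * grad + noise    # friction term, gradient term, injected noise
    return (theta + v).detach(), v

# hypothetical usage: theta, v = sghmc_step(theta, v, lambda t: -log_posterior(t, minibatch))
```

Everything in it is one gradient call plus plain vector arithmetic, which is why I think it would map naturally onto neanderthal plus an autograd layer.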
It works for very high-dimensional problems like NNs over natural images: https://arxiv.org/abs/1705.09558
The paper has some. I will let you know once I can reproduce the results. What particular numbers are you interested in?
I am mentioning it because I think autograd functionality would be very helpful as an intermediary abstraction on top of neanderthal for building Bayesian statistics toolboxes.
I don't have as much time as I would like to have, but I am exploring some clj-autodiff stuff atm. in https://github.com/log0ymxm/clj-auto-diff
I've skimmed through the paper, but this is the problem: most of those papers require familiarity with concrete problems (vision/classification in this case) to judge the results. There is no anchor I can use to judge this paper. I cannot see the simplest data that I'm interested in when I hear about ANY MCMC implementation: how many steps it needs to converge for some problems that are easy to understand and compare, and how much time one step takes. For easy problems. If it works poorly for easy problems, I cannot see how it can work great for harder problems. If it works ok for easy problems, then I can look at harder ones and see how it does there. There is so much hand-waving in general that it usually does not surprise me that 99% of those are vaporware.
I was quite surprised when I saw the Anglican MCMC hello world at EuroClojure. The most basic thing you can use MCMC for took 239 SECONDS for the world's simplest beta-binomial.
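For scale, the whole model is a Beta prior on the success probability and a binomial likelihood; a throwaway random-walk Metropolis for it (a plain numpy sketch, nothing to do with Anglican's internals) fits in a dozen lines:

```python
import numpy as np

def beta_binomial_metropolis(k, n, a=1.0, b=1.0, steps=10_000, scale=0.1, seed=0):
    """Random-walk Metropolis for p ~ Beta(a, b), k ~ Binomial(n, p)."""
    rng = np.random.default_rng(seed)

    def log_post(p):
        if not 0.0 < p < 1.0:
            return -np.inf
        # unnormalized log posterior: p^(a-1+k) * (1-p)^(b-1+n-k)
        return (a - 1 + k) * np.log(p) + (b - 1 + n - k) * np.log(1 - p)

    p, samples = 0.5, []
    for _ in range(steps):
        prop = p + scale * rng.normal()
        if np.log(rng.uniform()) < log_post(prop) - log_post(p):
            p = prop
        samples.append(p)
    return np.array(samples)

# the posterior is conjugate, Beta(a + k, b + n - k), so the chain is easy to sanity-check
```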
Generating such artificial images is very, very difficult, even if they are far from perfect. So you definitely need a good method to get there. The original SGHMC paper has more explanations, but there is a whole string of literature related to this.
I mention it to you, because I can imagine that you don't have the time to dive into it.
I understand your concerns. I, on the other hand, am interested in using Clojure again at some point for my optimization stuff. Esp. since Anglican represents fairly sophisticated tooling compared to Python's libraries on top of tensorflow or theano.
That might be very useful for that particular problem (or not; I don't know, since I don't do computer vision), but it does not tell me whether SGHMC is worth exploring for the general MCMC that I'm interested in.
That is a very high-dimensional problem. In MCMC there are in general only convergence proofs for toy problems, so I cannot tell you how well it explores the distribution.
Unlike stochastic gradient descent, the routine does not just find an optimum but samples from the posterior of the weight matrices given the data.
I'm not aware of ANY method that guarantees MCMC convergence, toy or non-toy! That is the trickiest part of MCMC.
How does their method compare to the state of the art? You know, the models that Google/Facebook/DeepMind or whoever else is the leader publishes?
They subsample from the chain, but I don't know much about these specifics yet. The sample probably takes a few megabytes.
In this paper they managed to compete with deep learning GANs and exploit Bayesian features like multiple samples from the posterior.
Sure. I thought you might be interested in scalable high-dimensional sampling methods. So far I just wanted to talk a bit about it with you.
Do they compete on results only, or also on speed? Because getting the same results in 1000x the time is a bit underwhelming (if that is the case, of course).
Estimating a full distribution vs. a MAP or MLE estimate is a lot more expensive in general.
I understand that it can be quite challenging in machine vision because the DL people really pushed the state of the art in the last decade.
I agree. I think a statistics approach can help you make informed optimization decisions though, even if you go towards an MLE estimate in the end.
But if you can incorporate the strengths of deep nets, then you can also improve statistical methods. This is what a lot of people try to do atm.
They use neural networks to approximate internal functions to make their samplers faster, or to get a better variational approximation.
I agree with your emphasis on performance. I explore it atm. because I am doing research. In a practical project I probably would stick to a CNN for these problems.
Yes, I have this internal struggle with the Bayesian approach. But so far it helps me to stretch in this direction.
Because that is really strong in Python, esp. with Pytorch. I am really happy, despite Python being slow and a big mess under the hood.
cortex builds directly on layers and I don't understand why, except for business reasons. autograd has a very strong history in Lisp, and it is ironic that Python is so much better at it than Clojure.
I'll add something more pragmatic (and effective IMO): vectorized gradients for all standard math functions on vector/matrix/tensor structures in neanderthal, but no general clojure code gradients.
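Something like this, sketched in numpy only to show the shape of the idea (the names are invented, and it says nothing about how neanderthal would actually expose it): each elementwise primitive carries its derivative, and a chain applies both in bulk over the whole structure.

```python
import numpy as np

# each primitive paired with its elementwise derivative (hypothetical registry)
PRIMS = {
    "exp": (np.exp, np.exp),
    "sin": (np.sin, np.cos),
    "sqr": (lambda x: x * x, lambda x: 2.0 * x),
}

def value_and_grad_chain(ops, x):
    """Evaluate a chain of elementwise ops over a whole vector/matrix and
    accumulate the gradient with the chain rule, one bulk operation per step."""
    y, grad = x, np.ones_like(x)
    for name in ops:
        f, df = PRIMS[name]
        grad = grad * df(y)   # chain rule, vectorized over the structure
        y = f(y)
    return y, grad

x = np.linspace(0.0, 1.0, 5)
y, g = value_and_grad_chain(["sqr", "sin"], x)   # d/dx sin(x^2) = 2x cos(x^2)
assert np.allclose(g, 2.0 * x * np.cos(x * x))
```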
That is probably sufficient. I agree that general autograd might be too much, but there is a very rich literature and there are implementations in Scheme, and they have tried hard to make it efficient.
For reference https://alexey.radul.name/ideas/2013/introduction-to-automatic-differentiation/
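The core trick from that introduction, forward mode via dual numbers, is tiny; here is a minimal Python sketch (illustrative only, nowhere near the efficiency of the Scheme implementations):

```python
import math

class Dual:
    """Dual number (value, derivative) for forward-mode automatic differentiation."""
    def __init__(self, val, der=0.0):
        self.val, self.der = val, der

    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.der + o.der)
    __radd__ = __add__

    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val, self.der * o.val + self.val * o.der)
    __rmul__ = __mul__

def sin(x):
    # propagate the derivative through sin via the chain rule
    return Dual(math.sin(x.val), math.cos(x.val) * x.der) if isinstance(x, Dual) else math.sin(x)

def deriv(f, x):
    """Derivative of f at x: seed the dual part with 1 and read it back out."""
    return f(Dual(x, 1.0)).der

# d/dx (x * sin(x) + 3x) at x = 2.0, i.e. sin(2) + 2*cos(2) + 3
print(deriv(lambda x: x * sin(x) + 3 * x, 2.0))
```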