#uncomplicate
2017-06-08
blueberry08:06:51

@whilo I'd need to see at least a hello-world-ish code of those ideas to be able to form an opinion.

whilo20:06:54

what high dimensional problems can you tackle?

whilo20:06:09

i would like to embed vision problems, but this seems fairly ambitious

blueberry20:06:10

I can see that the example takes 20,000 samples of a one-dimensional distribution. That's a fairly trivial problem. See the Doing Bayesian Data Analysis book - it shows many practical hierarchical models, and many examples go into 50+ dimensions. All the examples compute in a fraction of a second with bayadera (the same examples run for minutes with Stan, which does all the fancy variational + Hamiltonian MCMC etc. in C++ called from R). Anyway, for vision problems, which is perception, I do not see how anything can beat deep neural nets...

whilo21:06:01

Right. GANs should be embeddable in generative models.

whilo21:06:45

You asked for a hello world; there are a lot more worksheets.

whilo21:06:49

🙂

whilo21:06:53

But you are right that bayadera is much more focused on performance. For vision this might be very helpful.

whilo21:06:47

I just want to point out that the work around anglican is fairly broad and innovative in regard to language embedding and composability.

whilo21:06:47

I can totally imagine a highly optimized bayadera model being part of it. As far as I understand, the models are composable at the boundaries in general.

blueberry21:06:21

the problem is that those computations are so demanding that performance is THE metric. It doesn't matter if anglican can create more complex models if those models run in days. BTW, can you please do a quick microbenchmark and tell me how long it takes anglican to take those 20,000 samples (do not forget to unroll the sequence with doall)
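
(For reference, a minimal sketch of such a microbenchmark, assuming Anglican's standard defquery/doquery API; the toy one-dimensional model, names, and numbers are illustrative, not from the chat:)

(ns bench.anglican-hello
  (:require [anglican.core :refer [doquery]]
            [anglican.emit :refer [defquery]]
            [anglican.runtime :refer [normal]]))

;; toy one-dimensional model: infer the mean of a Gaussian from one observation
(defquery gaussian-mean [y]
  (let [mu (sample (normal 0 1))]
    (observe (normal mu 1) y)
    mu))

;; draw 20,000 samples with lightweight Metropolis-Hastings and force the
;; lazy sequence with doall so the whole run is included in the timing
(time
 (doall
  (map :result (take 20000 (doquery :lmh gaussian-mean [2.0])))))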

blueberry21:06:44

another thing I do not like about anglican's approach is that it does not support regular clojure functions, but does some sort of model compilation that requires the model code to be somehow special (i.e. not clojure)

blueberry21:06:49

when I talked about hello world I meant the demonstration of those fancy ideas (variational inference, NN, etc.) and the comparison with some baseline wrt. some metrics

whilo21:06:02

It does allow passing in normal clojure functions through with-primitive-procedures
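
(A minimal sketch of that pattern, assuming the usual anglican.emit API; the square function is just an illustrative placeholder:)

(require '[anglican.emit :refer [defquery with-primitive-procedures]]
         '[anglican.runtime :refer [normal]])

;; square is an ordinary Clojure function...
(defn square [x] (* x x))

;; ...wrapped so it can be called inside the probabilistic program
(with-primitive-procedures [square]
  (defquery squared-mean [y]
    (let [mu (sample (normal 0 1))]
      (observe (normal (square mu) 1) y)
      (square mu))))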

whilo21:06:33

Right, I get your performance point.

whilo21:06:58

Btw. what do you think of edward?

blueberry21:06:00

I'm not that much into probabilistic programming; I am more interested in probabilistic data analysis and probabilistic machine learning. These things are based on the same theory, but are not the same.

blueberry21:06:27

moreover, I look at practicality.

blueberry21:06:41

not so much after pure research of interesting things any more

whilo21:06:51

The problem is not so much whether to do it in anglican or in bayadera, but whether to build on Clojure at all.

whilo21:06:05

I agree about the programming aspect, although I think it is possible to have fast probabilistic machine learning in such a language and optimize inference with it through "compilation".

blueberry21:06:20

I mean, I am after interesting things, but I set a higher bar 🙂

whilo21:06:26

Practically speaking, machine learning has not entered the programmer's toolbox yet.

blueberry21:06:28

it also has to solve real problems

blueberry21:06:45

most of those research projects demonstrate toy problems

whilo21:06:48

From that direction, anglican might be much more approachable for embedding in some small problems than going full machine learning.

whilo21:06:17

But I am in a group that does heavy vision problems, and there it is the opposite.

whilo21:06:30

Probabilistic programming doesn't cut it for these problems.

blueberry21:06:34

yep, and I want to create such a toolbox

whilo21:06:53

Or more precisely a bayesian approach.

blueberry21:06:54

of course, because vision is about perception

blueberry21:06:01

and not about logic

whilo21:06:14

I meant the cost to compute uncertainties.

blueberry21:06:46

probabilistic techniques might be an interesting next layer that could do some reasoning on the output of the vision layer

whilo21:06:03

Yes, that is what most people do nowadays.

whilo21:06:13

They use some CNN and just use it as a feature extractor.

whilo21:06:51

(and it works very very well) 🙂

whilo21:06:08

For the GMM, the 20,000 samples took 100 secs on my laptop (unsurprisingly).

whilo21:06:40

MC methods are inefficient in general and not respected much in machine learning (at least in my environment)

blueberry21:06:29

And, that is the Gaussian distribution, which is the easiest distribution to sample from (after the Uniform)

whilo21:06:47

I know. But I don't see why bayadera couldn't be described by the anglican syntax and framework. I think the big success of NNs, besides the initial breakthroughs, is mostly due to the fact that modern toolkits allow easy composition and modelling.

blueberry21:06:48

Now, Bayadera can take much, much more (I forgot how many) samples from a Gaussian in a few dozen milliseconds, and most of this time is the communication with the GPU.

blueberry21:06:20

But consider this: 100 secs for the simplest hello world that you could find

blueberry21:06:25

how useful is that?

blueberry21:06:34

no matter what features are there?

whilo21:06:54

It was the GMM on iris, not the worksheet I sent you. But it is still sampling.

blueberry21:06:40

Bayadera gives you ordinary clojure functions. Why wouldn't you be able to compose them?

whilo21:06:04

I see. The problem is that I can barely convince anybody to use Clojure. For Bayadera to be attractive, it would help if it were part of a bigger community. Clojure in machine learning is still a very hard sell. I can probably use something on my own, but it will be difficult to attract colleagues. Anglican is not much better in that regard, but the few really nice projects that are out there feel very isolated and fragmented. My colleague would like to go with edward, I guess, since it is sponsored by OpenAI and built on top of tensorflow (although he doesn't like tensorflow in particular).

whilo21:06:00

What I like about Anglican is that I can see people using it for small data problems and inference. If this is possible with Bayadera as well, I am totally fine with it.

whilo21:06:49

By people I mean everyday Clojure developers without a GPU or a background in data science.

whilo21:06:04

This would allow a community to grow.

blueberry21:06:51

That's why I don't like to push people to use my (or other) tools. I'm OK with the competition using inferior tools 🙂 OTOH, the best way to convince people to use some technology is to build useful stuff with it. When they see the result of your work, they'll ask you to tell them what you did (provided that you did something awesome, of course).

blueberry21:06:56

Now, it is difficult to convince people to use Clojure for ML when there is lots of pie-in-the-sky talk, but Clojure tools like Incanter are a joke.

blueberry21:06:52

I think that the GPU is essential for ML

blueberry21:06:12

For most methods, at least

blueberry21:06:40

And the theory has to be learned to some degree

whilo21:06:09

Hmm, yes. You are right about proving with results and the GPU. Just to point out, Anglican can leverage the GPU by embedding LSTMs or VAEs for its proposal distribution.

blueberry21:06:37

Whoever hopes that they will be able to do ML with the level of knowledge required for Web apps and no maths will spend years circling around, punching through other people's tutorials

blueberry21:06:06

I'd like to see some benchmarks

blueberry21:06:23

BTW Bayadera IS useful for small data problems, and I doubt it is useful for big data problems. That goes for Bayesian methods in general.

blueberry21:06:39

But, small data usually means big computation

blueberry21:06:48

and Bayadera is all about that 🙂

whilo21:06:54

https://arxiv.org/pdf/1705.10306.pdf Section 5.4 has a 4096 dimensional sampling space.

whilo21:06:19

But no statement about training time or inference.

whilo21:06:24

Just model quality.

blueberry21:06:00

that's the problem with most papers. They count the number of steps, without regard for how much one step costs wrt computation, memory access, and parallelization

whilo21:06:25

I agree that you need maths. But doing maths before seeing what you can do with machine learning can turn many people off. Esp. when you do probabilistic modelling, you need to do a lot of math, much more than for NNs.

whilo21:06:40

Yes, right.

whilo21:06:16

But I think combining MC samplers with NNs that way might be a very good idea.

whilo21:06:47

Inference will become a lot cheaper once the NN is trained (they call it "compiled").

blueberry21:06:08

I don't think so - NNs also require maths to understand what you are doing; it's only that there are popular frameworks with ready-made recipes that work well for a handful of problems (mostly vision and NLP). However, what if you do not work with vision and NLP?

whilo21:06:16

What do you plan to show off with Bayadera? 😛

whilo21:06:48

Yes, I like Bayesian approaches and generative models.

whilo21:06:55

I think they generalize a lot better.

whilo21:06:08

NNs just need you to understand gradients.

whilo21:06:20

No statistics.

blueberry21:06:42

Well, I already have (for more than a year) a large speedup over Stan, but I am not even eager to publish that. I plan to work on some commercial applications, so I do not even want to show off the technology itself, but the product.

whilo21:06:59

And you are right, that is also my problem. As long as the toolboxes for black-box bayesian inference are complicated, they will never be as popular.

whilo21:06:23

I understand. Might be a good plan.

blueberry21:06:15

The biggest "doh" is that bayesian stuff isn't even that complicated, especially when you use the tools that someone other made.

whilo21:06:25

Why did you present at bobkonf then?

blueberry21:06:29

Stan is used by social scientists and biologists

whilo21:06:18

Yes, that is why I like the blackbox approaches where you can specify a generative process and then have reasonable inference by default.

whilo21:06:49

I think machine learning should stop being a niche thing for highly skilled specialists.

blueberry21:06:28

The problem with programmers is that sometimes they are... well, almost like physicists. But, I do not even care. I open-sourced the technology because at least some people can "get it" (and some really do get it and use it), and I might get valuable feedback and even some contributions. However, I do not want to beg anyone to use it, and why would I?

whilo21:06:29

I was motivated by LDA 4 years ago; it was a very intuitive generative model and easy to understand even without heavy math knowledge back then.

blueberry21:06:56

Did it solve practical problems?

whilo21:06:08

You are right. I am thankful that you take the time to argue with me.

whilo21:06:30

Definitely

whilo21:06:06

You probably know it, it is known as topic modelling to the industry.

blueberry21:06:35

I don't use that. Is it used in NLP?

blueberry21:06:51

I don't do NLP, that's the thing 🙂

whilo21:06:13

You can get the topics that generate a corpus of texts, and the distribution of topics per document, in an unsupervised fashion from it.

whilo21:06:18

It is really handy.

whilo21:06:34

(even if just to browse your local paper collection 🙂 )
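
(For readers who don't know LDA, this is the standard generative process whilo is describing, in the notation of Blei et al. 2003: per-topic word distributions, per-document topic mixtures, and words drawn through latent topic assignments:)

\varphi_k \sim \mathrm{Dirichlet}(\beta) \quad \text{(word distribution of topic } k\text{)}
\theta_d \sim \mathrm{Dirichlet}(\alpha) \quad \text{(topic mixture of document } d\text{)}
z_{d,n} \sim \mathrm{Categorical}(\theta_d) \quad \text{(topic of word } n \text{ in document } d\text{)}
w_{d,n} \sim \mathrm{Categorical}(\varphi_{z_{d,n}}) \quad \text{(the observed word)}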

blueberry21:06:55

Did you have any application/business ideas around that, or was it more research curiosity?

whilo21:06:07

It was at university.

whilo21:06:39

It definitely has business value and is heavily marketed already.

whilo21:06:48

The original paper is from Blei et al. 2003

whilo21:06:09

Something like topic modelling for vision would be nice.

blueberry21:06:41

Are you still looking for the thesis topic, or are you set on something?

whilo21:06:18

I am trying to be set. It is frustrating. I would like to at least work probabilistically and not just throw NNs at something.

whilo21:06:34

I was with my supervisor today.

whilo21:06:50

They are very focused on vision.

whilo21:06:57

I don't want to just do some topic they throw at me. I like to be motivated by myself, but this is not working well this time.

blueberry21:06:26

Did you see the (old-ish) book by Bishop called Pattern recognition and machine learning?

blueberry21:06:37

It's from the pre-DL era (2006)

whilo21:06:39

Yes, I worked through parts of it.

whilo21:06:56

Mostly first 100 pages, EM and variational inference.

blueberry21:06:08

but the book is probabilistically-oriented, and he discusses the probabilistic perspective of NNs

blueberry21:06:18

and similarity to bayesian nets

blueberry21:06:59

but the problem is that DL is so successful with vision and perception in general that there is only a slim chance that you'll get somewhere with bayesian methods

whilo21:06:16

It is also not clear what the uncertainty buys you.

blueberry21:06:16

they are simply the hammer for another kind of nails

whilo21:06:22

It definitely costs computation.

blueberry21:06:40

it does not buy you anything if you have enough data

blueberry21:06:51

but in vision you usually have lots of data 🙂

whilo21:06:02

Well, the paper I first posted is from a guy (Yarin Gal) who showed that dropout regularization turns the neural net into a bayesian neural net (or a GP).

blueberry21:06:18

maybe bayesian techniques could be useful when you have extremely scarce visual information

blueberry21:06:24

but is there such domain?

whilo21:06:26

So you can use the uncertainty on the output by running the inference several times in a meaningful way.
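
(Roughly, what Gal's MC dropout amounts to: keep dropout active at test time, make T stochastic forward passes f_{\hat{W}_t} with independently sampled dropout masks, and read the predictive mean and uncertainty off their spread; the notation, with \tau the model precision, is assumed here, not taken from the chat:)

\mathbb{E}[y \mid x] \approx \frac{1}{T} \sum_{t=1}^{T} f_{\hat{W}_t}(x), \qquad
\mathrm{Var}[y \mid x] \approx \tau^{-1} I + \frac{1}{T} \sum_{t=1}^{T} f_{\hat{W}_t}(x)\, f_{\hat{W}_t}(x)^{\top} - \mathbb{E}[y \mid x]\, \mathbb{E}[y \mid x]^{\top}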

blueberry21:06:54

what is your prior there

whilo22:06:15

An example from Anglican was to have a generative model for captchas, and theirs was state of the art, cracking all of them.

whilo22:06:16

Your prior is a GP prior. That corresponds to a normal distribution over the weight matrices.

blueberry22:06:54

wait, wait. I meant, you, as a human, set some prior that describes your current (pre-data) knowledge. How do you decide on that?

blueberry22:06:24

That article looks interesting

blueberry22:06:33

I put it in my bookmarks

whilo22:06:47

It depends on whether you parametrize the normal distribution again.

blueberry22:06:59

Hope I'll have time to go through it more attentively in several months 🙂

blueberry22:06:31

Ah, but why does it have to be the Normal? 🙂

blueberry22:06:54

There are many kinds of random processes, and many kinds of distributions 🙂

blueberry22:06:15

I understand that in vision, Gaussian might be the thing.

blueberry22:06:54

But generally, let's say I am trying to estimate when people call the call center, or something like that

blueberry22:06:27

Or the risk of giving out loans

whilo22:06:41

You pick a kernel function to calculate the covariance matrix K of the Gaussian

whilo22:06:50

and then the prior is GP(·|0, K)
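
(Spelled out: the kernel k induces the covariance matrix over the training inputs, and the prior over the corresponding function values is a zero-mean multivariate Gaussian:)

K_{ij} = k(x_i, x_j), \qquad \big(f(x_1), \ldots, f(x_n)\big)^{\top} \sim \mathcal{N}(0, K)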

blueberry22:06:55

Or some general risk -> Bayesian methods are a really good match for measuring risk

whilo22:06:13

Hmm, no that makes no sense

whilo22:06:43

The prior knowledge flows into the kernel function.

whilo22:06:49

for the covariance matrix

whilo22:06:30

from that you generate a function that has some prior set on the uncertainty of the measurement.

whilo22:06:58

All distributions are Gaussian, hence the whole thing is Gaussian again.
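
(Which is why the posterior stays in closed form: with Gaussian observation noise \sigma^2 and k_* = (k(x_*, x_1), \ldots, k(x_*, x_n))^{\top}, conditioning on the observed targets y gives, for a test input x_*, another Gaussian:)

\mu_* = k_*^{\top} (K + \sigma^2 I)^{-1} y, \qquad
\sigma_*^2 = k(x_*, x_*) - k_*^{\top} (K + \sigma^2 I)^{-1} k_*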

whilo22:06:28

But the kernel function can do arbitrarily complicated things; it can be a NN.

blueberry22:06:34

Note that the model is pretty sure about the parameters, and they are not very probabilistic 🙂

blueberry22:06:51

I get that, but, given an unknown problem,

whilo22:06:06

I know. I don't like kernel methods very much.

blueberry22:06:16

how do you decide it is a good fit to be described by Gaussian likelihood,

blueberry22:06:36

and how do you transfer your prior knowledge to the parameters?

blueberry22:06:54

Yep, many "bayesian" methods are not that much bayesian

blueberry22:06:03

which doesn't mean they are not useful

whilo22:06:30

There are deep GPs now, btw, where the joint probability is not Gaussian any more.

whilo22:06:38

That is what the blog post talks about as well.

blueberry22:06:57

Anyway, this doesn't help you in choosing the topic 🙂

blueberry22:06:17

What outcome is expected of you?

blueberry22:06:37

A number of published papers, or something else?

blueberry22:06:22

Does it have to be a part of a narrow EU/DE/industry-backed project, or are you more free in choosing the area as long as it is vision?

whilo22:06:49

Solving a good vision problem, something practical. Not just doing maths or playing around.

whilo22:06:01

Although I want to improve my math skill still.

whilo22:06:18

I think targeting a good paper as a result would be reasonable.

whilo22:06:23

But not required.

blueberry22:06:30

Solving to be the best in the world or just good enough?

blueberry22:06:55

Really? The paper is not required? You have it easy 🙂

whilo22:06:04

I think good enough would be ok.

whilo22:06:10

I am not sure. I want the paper anyway.

blueberry22:06:16

Good enough is ok? Even better 🙂

blueberry22:06:25

Why worry then?

whilo22:06:04

It would be good if we could show that bayesian modelling can work for vision problems; atm weakly supervised problems are interesting.

whilo22:06:12

(to the group)

blueberry22:06:31

Do you have some problem where data is really, really poor and DL does not work well?

whilo22:06:56

I would like to do something good and know that I can work well with the methods.

blueberry22:06:16

That's a Herculean task...

whilo22:06:39

I am also thinking about doing a PhD.

blueberry22:06:52

The DL (and most of the ML) field seems to me like many soothsayers throwing bones around and reading star constellations

blueberry22:06:10

You are not already on that path?

blueberry22:06:20

Are you in MSc?

whilo22:06:35

This is my master thesis.

blueberry22:06:58

Ah, in that case, why are you taking on such a heavy task?

blueberry22:06:32

Isn't it more appropriate for that level to take something that has been researched well and make a good implementation?

whilo22:06:47

Good question. Maybe I have stayed too long in the math department and have complexes now.

blueberry22:06:03

That's why I asked you about the paper.

whilo22:06:04

I have done this in my bachelor thesis already.

blueberry22:06:19

What you are talking about here is too much for a master's thesis.

blueberry22:06:25

Especially wrt time.

whilo22:06:25

Well, I mostly see papers and theses from state-of-the-art people in the field, and they are really good.

blueberry22:06:47

But these people are doing post-doc research full time.

blueberry22:06:07

And often have worked on those problems for years or even decades.

blueberry22:06:23

DL people have been doing that thing since the '80s

blueberry22:06:33

when it was not cool

blueberry22:06:37

and not very useful 🙂

blueberry22:06:07

So you can try to do the same for master's if you have 10 years for that 🙂

whilo22:06:25

I don't. I need to break the problem down and do something focused.

blueberry22:06:32

but it is better to do some cool exploratory implementation as an introduction to a PhD

blueberry22:06:46

or that is what I'd advise my students 🙂

blueberry22:06:39

although they are all working programmers, since in Serbia there is little opportunity to fund students to work on research, so they all work full-time in industry cranking code and do their studies part-time

whilo22:06:42

That is reasonable advice.

whilo22:06:05

That is tough.

whilo22:06:24

I have worked part-time with Clojure over the last half year, but I was lucky to do so with friends.

whilo22:06:33

It didn't pay very well, but it was ok.

blueberry22:06:51

cool. what are you making?

whilo22:06:08

it was an app prototype for a client who wants something like yelp.

whilo22:06:18

although i think he doesn't know exactly what he wants.

whilo22:06:49

but we did datomic with full-text search and google places integration in the backend, plus two single-page apps and an android and ios app.

blueberry22:06:58

well, at least you polished your Clojure and were paid for that 🙂

whilo22:06:00

it is not finished yet, but clojure was reasonable.

whilo22:06:19

we also stayed within our deadlines, which was good to see.

whilo22:06:50

i need to take the dog for a walk

blueberry22:06:01

enjoy 🙂

whilo22:06:12

really nice talking to you, hopefully we can continue this soon

whilo22:06:20

can bayadera run without a GPU atm?

whilo22:06:29

i just have this laptop right now

whilo22:06:46

no problem

whilo22:06:49

i can get a gpu

blueberry22:06:25

atm it requires an AMD GPU that supports OpenCL 2.0, but I'm also probably giving it a CUDA backend soon-ish

blueberry22:06:36

and MAYBE a CPU backend

blueberry22:06:52

But that depends on inspiration 🙂

whilo22:06:00

no pressure. does nvidia have reasonable opencl support?

whilo22:06:33

i follow linus' comments... 😉

blueberry22:06:50

although my card, R9 290X, which is quite a beast, probably costs something like 100 EUR in Germany now (second-hand)

blueberry22:06:17

Because it's a couple of generations old, but it was top of the line 3 years ago.

blueberry22:06:29

So it might be a modest investment

blueberry22:06:56

While my Nvidia 1080 cost around 1000 EUR here less than a year ago

blueberry22:06:07

and is only 30% faster

whilo22:06:38

Hmm, interesting.

whilo22:06:48

Yes, nvidia was always fairly expensive.

whilo22:06:03

At least when I last had a look at perf 5 years ago or so.

blueberry22:06:39

Nvidia's main strength is its suite of hand-tuned libraries, cuBLAS, cuDNN etc.

whilo22:06:52

Have you integrated NNs (e.g. pretrained) in your pipeline yet?

whilo22:06:03

Yes, I think so too.

blueberry22:06:05

no. I'm not that into nns

blueberry22:06:27

Everybody's doing that, and they have pretty much optimized the field

whilo22:06:35

I see. There are tons of pretrained models out there though, which could be helpful for industry applications.

whilo22:06:39

E.g. pretrained CNNs.

blueberry22:06:44

I want to work in an uncluttered niche

whilo22:06:57

I understand that very well.

whilo22:06:50

Well, this should be mostly plumbing, nothing critical. pretrained models can be applied to data separately anyway.

whilo22:06:55

What do you use to store tensors?

whilo22:06:03

I used hdf5 in my bachelor thesis.

whilo22:06:09

Clojure support is a bit weak.

whilo22:06:21

I had to tweak the library for multiple dims.

blueberry22:06:23

Nothing now. I don't need them for bayesian stuff (yet).

blueberry22:06:45

When I need them, I'll add them to neanderthal...

blueberry22:06:17

With an Intel-based native backend, and probably a cuDNN GPU backend

blueberry22:06:47

as for storing, I leave that to the users for now. They can implement their own transfer methods to/from storage

whilo22:06:02

Yes, I wouldn't pack that into the libraries.

blueberry22:06:07

your dog's getting impatient 🙂

whilo22:06:15

It really sucks that everyone in python uses numpy pickling.

whilo22:06:28

That is right 🙂