#data-science
2019-11-21
val_waeselynck15:11:48

Does anyone here know of a tool for plotting 'confidence regions' of 2D probability distributions?

val_waeselynck15:11:53

More precisely, I'd like to draw (posterior) probability densities as 2D heat maps, with 'contour lines' delineating regions of probability mass 95%, 99% etc.

val_waeselynck15:11:54

Does that make sense, and does it have a name?

ben15:11:27

I think you can achieve something similar with ggplot2: https://ggplot2.tidyverse.org/reference/geom_contour.html

ben15:11:17

Might need to do something with `stat_contour` to get the specific regions you’re interested in. No idea about clj, I’m afraid

val_waeselynck16:11:57

Thanks. Thinking out loud, I guess I could also find the appropriate density levels, either by numerical integration + dichotomic search, or by filling a 2D array with densities, sorting the values and searching for quantiles. Then draw the contours at the appropriate level lines.
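
A minimal sketch of the second idea above (fill a grid with densities, sort, accumulate), in plain Python. The function name `density_levels`, the grid bounds and the standard bivariate normal used as a test density are assumptions for illustration, not something from this thread. It also assumes the grid covers essentially all of the probability mass.

```python
import math

def density_levels(pdf, xs, ys, masses):
    """Map each target mass to the density level whose contour encloses it.

    Assumes xs and ys are evenly spaced and cover (almost) all of the mass.
    """
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
    # Evaluate the density on the grid, highest values first.
    values = sorted((pdf(x, y) for x in xs for y in ys), reverse=True)
    total = sum(values) * cell  # close to 1 when the grid covers the support
    targets = sorted(masses)
    levels = {}
    cum = 0.0
    i = 0
    for v in values:
        cum += v * cell / total  # mass of the region {pdf >= v}
        while i < len(targets) and cum >= targets[i]:
            levels[targets[i]] = v
            i += 1
        if i == len(targets):
            break
    return levels

# Hypothetical test density: standard bivariate normal.
def std_normal_2d(x, y):
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

grid = [-6.0 + 0.05 * k for k in range(241)]
levels = density_levels(std_normal_2d, grid, grid, [0.5, 0.95, 0.99])
```

For this test density the 95% level should land near 0.05 / (2π) ≈ 0.00796, which gives a quick sanity check. The resulting levels can then be handed to whatever contour plotter is in use.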

val_waeselynck16:11:45

I'm also wondering about the relevance of this approach for data analysis - are there alternative approaches to choosing / viewing 2D confidence regions that make this one uninteresting?

val_waeselynck18:11:18

Yes, but the point is not to just plot contour lines, rather precisely those contour lines which delineate regions of prescribed probability mass (90%, 95%, 99%, etc.). Finding which density levels correspond to those regions may not be trivial!

genmeblog19:11:41

This is not trivial. Contours are usually computed from a kernel density estimate, which is typically just a Gaussian blur (in 2D) or a specific kernel function (in 1D). I don't see an easy way to estimate the inverse CDF with such an approach.

val_waeselynck17:11:13

@U1EP3BZ3Q In this case, I can evaluate the density at any point, so it seems doable: https://clojurians.slack.com/archives/C0BQDEJ8M/p1574352117165200

genmeblog17:11:04

Still, integrating over an area is much trickier than over a 1D range for a symmetric distribution.

genmeblog10:11:36

@val_waeselynck > by filling a 2D array with densities, sorting the values and searching for quantiles

genmeblog10:11:58

To find quantiles you want to use the icdf (inverse cumulative distribution function), not the pdf (density). In 2D you want to find the area that covers, say, 95% of the total volume under the density surface.
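
The "numerical integration + dichotomic search" variant mentioned earlier in the thread can be sketched as a bisection on the density level: the mass of the region where the density exceeds a level shrinks as the level rises, so the level for a given mass can be bracketed. This is only a sketch under assumptions: `level_by_bisection`, the crude grid sum used as the integrator, and the test density are all hypothetical choices.

```python
import math

def level_by_bisection(pdf, xs, ys, mass, iters=40):
    """Find the density level whose super-level set holds about `mass`."""
    cell = (xs[1] - xs[0]) * (ys[1] - ys[0])
    values = [pdf(x, y) for x in xs for y in ys]

    def enclosed(level):
        # Grid approximation of the integral of pdf over {pdf >= level}.
        return sum(v for v in values if v >= level) * cell

    lo, hi = 0.0, max(values)
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if enclosed(mid) >= mass:
            lo = mid  # region still holds enough mass, so the level can rise
        else:
            hi = mid
    return lo

# Hypothetical test density: standard bivariate normal.
def std_normal_2d(x, y):
    return math.exp(-(x * x + y * y) / 2.0) / (2.0 * math.pi)

grid = [-6.0 + 0.1 * k for k in range(121)]
level_95 = level_by_bisection(std_normal_2d, grid, grid, 0.95)
```

With a better integrator in place of the grid sum, the same bisection loop would apply unchanged.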

genmeblog10:11:05

For distributions like the multivariate normal some numerical algorithms exist, but I suppose they can't be applied to the general case of an arbitrary distribution (especially a multidimensional empirical one).

val_waeselynck16:11:01

> to find quantiles you want to use the icdf (inverse cumulative distribution function), not the pdf (density).

Yes of course, just forgot to mention it :)

val_waeselynck16:11:29

> For distributions like the multivariate normal some numerical algorithms exist, but I suppose they can't be applied to the general case of an arbitrary distribution (especially a multidimensional empirical one).

Yes, for 2D Gaussians this can be solved analytically: once you have an eigendecomposition of the covariance matrix you're good, and even that may not be mandatory.
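
For the 2D Gaussian case, a short sketch of the closed form: the squared Mahalanobis distance follows a chi-square distribution with 2 degrees of freedom, so the region of mass p is the ellipse with squared distance at most -2 ln(1 - p), and the density on its boundary is (1 - p) / (2π sqrt(det Σ)). Only the determinant is needed for the level, which matches the remark that the eigendecomposition may not be mandatory. The function names below are hypothetical.

```python
import math

def gaussian_region_radius2(mass):
    """Squared Mahalanobis radius of the ellipse holding `mass` (2D only)."""
    return -2.0 * math.log(1.0 - mass)

def gaussian_region_level(cov, mass):
    """Density value on the contour enclosing `mass` for a 2D Gaussian.

    `cov` is a 2x2 covariance matrix given as nested sequences.
    """
    (a, b), (c, d) = cov
    det = a * d - b * c
    return (1.0 - mass) / (2.0 * math.pi * math.sqrt(det))

# Example with an assumed covariance matrix.
cov = [[2.0, 0.5], [0.5, 1.0]]
r2_95 = gaussian_region_radius2(0.95)
level_95 = gaussian_region_level(cov, 0.95)
```

The eigendecomposition is still useful for drawing the ellipse itself (axes and orientation), but not for finding the level.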
