data-science 2022-11-03 | Slack Archive

Joe 13:11:14

I have a general question about linear regression models, hope that's allowed here. I'm struggling with the concept of simulations for linear regression models. I have a simple linear model for weight (W) on height (H), with 352 observations in the dataset.

W_i ~ N(μ_i, σ)
μ_i = α + β(H_i - Hbar)
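These two lines describe a generative model: compute μ_i from H_i, then draw W_i around it. Here is a minimal stdlib-Python sketch that simulates weights from the model. The values of α, β, σ and Hbar below are hypothetical, chosen only for illustration. They are not estimates from the 352-observation dataset.

```python
import random

def simulate_weights(heights, alpha, beta, sigma, h_bar, rng):
    """Draw one weight per height from W_i ~ N(alpha + beta*(H_i - h_bar), sigma)."""
    weights = []
    for h in heights:
        mu = alpha + beta * (h - h_bar)       # deterministic linear predictor
        weights.append(rng.gauss(mu, sigma))  # Gaussian noise with sd sigma
    return weights

# Hypothetical parameter values, for illustration only.
ALPHA, BETA, SIGMA, H_BAR = 45.0, 0.9, 4.0, 154.6
heights = [151.76, 139.70]
print(simulate_weights(heights, ALPHA, BETA, SIGMA, H_BAR, random.Random(0)))
```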

(α and β are stochastic.) I run the model, conditioned on the dataset, and generate traces of the posterior. In the trace for μ, I get n samples for every value of H in the dataset. Say the first two values of H in the dataset are 151.76 and 139.70.

i →                    | 0       | 1      | ...
Height →               | 151.76  | 139.70 | ...
-----------------------------------------------
Posterior samples of μ | 43.53   | 36.23  | ...
  ↓                    | 42.84   | 34.88  | ...
                       | ...     | ...    | ...
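The code that produced this trace isn't shown, so the following is an assumption about the usual setup. Under that assumption, each row of the table comes from one joint posterior draw of (α, β), pushed through μ_i = α + β(H_i - Hbar) for every height. A minimal sketch of that computation follows. The draws and Hbar below are hypothetical, and `mu_trace` is a hypothetical helper name.

```python
def mu_trace(alpha_draws, beta_draws, heights, h_bar):
    """Return one row per posterior draw s and one column per height i:
    mu[s][i] = alpha[s] + beta[s] * (heights[i] - h_bar)."""
    return [
        [a + b * (h - h_bar) for h in heights]
        for a, b in zip(alpha_draws, beta_draws)
    ]

# Hypothetical posterior draws and mean height, for illustration only.
alpha_draws = [45.1, 44.8]
beta_draws = [0.92, 0.88]
table = mu_trace(alpha_draws, beta_draws, [151.76, 139.70], h_bar=154.6)
for row in table:
    print([round(m, 2) for m in row])
```

Nothing in this function restricts `heights` to values from the dataset. Each μ column is simply a deterministic function of the α and β draws.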

So now I can simulate weights for any value of H that is in the dataset: I randomly pick a value of μ from the trace for that particular H (together with σ) and do w = normal(μ_i, σ).sample(). But how do I generalize that into a simulation model for arbitrary H, i.e. for values of H that are not in the dataset? Thanks
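The approach described here can be sketched in stdlib Python. The μ values are the ones from the table above. The σ draws are hypothetical. One design choice in the sketch: μ and σ are taken from the same posterior draw index, so they stay jointly consistent. The sketch also shows the limitation raised in the question, since a height with no column in the trace has nothing to look up.

```python
import random

# Posterior draws of mu for the first two heights (values from the table above)
# and hypothetical sigma draws belonging to the same two posterior samples.
MU_TRACE = {151.76: [43.53, 42.84], 139.70: [36.23, 34.88]}
SIGMA_DRAWS = [4.2, 4.0]  # hypothetical

def simulate_observed(h, rng):
    """Simulate one weight for a height that appears in the dataset."""
    s = rng.randrange(len(SIGMA_DRAWS))  # pick one posterior draw
    mu = MU_TRACE[h][s]                  # mu and sigma from the same draw
    return rng.gauss(mu, SIGMA_DRAWS[s])

rng = random.Random(42)
print(simulate_observed(151.76, rng))
# simulate_observed(160.0, rng) raises KeyError: 160.0 has no column in the trace
```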

Rupert (All Street) 23:11:14

Linear regression lets you find the straight line that best fits your data (y = m * x + c), where m is the slope and c is the y-intercept. So once you have m and c from linear regression, you can put any value of x into the formula to predict y.

Joe 11:11:38

Oh, I get it: μ is deterministic, so to get W for arbitrary H I need

w ~ N(alpha + beta*(H - Hbar), sigma)

So I just need to sample alpha, beta and sigma (together, from the same posterior draw) and plug in H. Thanks!
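This recipe amounts to a posterior predictive simulation, and it can be sketched in stdlib Python. The `POSTERIOR` draws and `H_BAR` below are hypothetical placeholders. In practice they would come from the fitted model's trace and from the training data's mean height.

```python
import random

# Hypothetical joint posterior draws (alpha, beta, sigma); in practice these
# would come from the fitted model's trace, one tuple per posterior sample.
POSTERIOR = [(45.1, 0.92, 4.2), (44.8, 0.88, 4.0), (45.0, 0.90, 4.1)]
H_BAR = 154.6  # hypothetical mean height of the training data

def simulate_weight(h, rng):
    """Posterior predictive draw of W for any height h, in or out of the data."""
    alpha, beta, sigma = rng.choice(POSTERIOR)  # keep one draw's values together
    mu = alpha + beta * (h - H_BAR)             # same centering as in the fit
    return rng.gauss(mu, sigma)

rng = random.Random(1)
sims = [simulate_weight(160.0, rng) for _ in range(1000)]
print(sum(sims) / len(sims))
```

The centering has to match the fitted model. If you drop Hbar here, every prediction shifts by roughly β·Hbar.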


