- [[today's posterior is tomorrow's prior]], [[Bayes theorem examples]], [[derive Bayes rule]], [[marginal probability]]

# Idea

Bayes' theorem allows us to move from "**what we know**" (the likelihood and the prior) to **what we infer** (the posterior).

- what we know: $p(fever \mid covid)$, the probability of having a fever given that I have COVID
- what we want to infer: $p(covid \mid fever)$, the probability of having COVID given that I have a fever

Suppose 90% of people who have COVID have a fever, 3% of people have COVID, and 5% of people have a fever. If I have a fever, what is the probability that I have COVID?

Applying Bayes' theorem to get from **what we know** to **what we infer**:

$p(covid \mid fever) = \frac{p(fever \mid covid) \, p(covid)}{p(fever)} = \frac{0.90 \times 0.03}{0.05} = 0.54$

So observing a fever raises the probability of COVID from the 3% base rate to 54%.

More generally, we use Bayes' rule (or theorem) to combine the information in the observed data (the [[likelihood]], $p(D \mid \theta)$) with the information in the prior distribution, $p(\theta)$, to arrive at the posterior distribution, $p(\theta \mid D)$:

$p(\theta \mid D) = \frac{p(D \mid \theta) \, p(\theta)}{p(D)}$

$\text{posterior} = \frac{\text{likelihood} \times \text{prior}}{\text{marginal likelihood}}$

## From prior to posterior

### Scenario 1

Alex and Brenda both go to the office regularly. One day we see someone in the office but cannot quite tell whether it is Alex or Brenda. Without any other information, our prior is that it is 50% likely to be Alex and 50% likely to be Brenda.

Prior:

$P(\text{Alex}) = 0.5 \quad P(\text{Brenda}) = 0.5$

But we have information (features of the two) that can inform our guess: the person is wearing a red sweater. We know Alex wears red 2 days out of a 5-day work week ($2/5 = 0.4$), and Brenda wears red 3 days out of 5 ($3/5 = 0.6$). Applying Bayes' rule and normalizing:

$P(\text{Alex} \mid \text{red}) = \frac{0.4 \times 0.5}{0.4 \times 0.5 + 0.6 \times 0.5} = 0.4$

Posterior:

$P(\text{Alex}) = 0.4 \quad P(\text{Brenda}) = 0.6$

![[s20230304_164958.png]]

## Marginal likelihood $p(D)$

The denominator $p(D)$ is the [[marginal likelihood]]: the probability of the observed data. It does not involve the parameter $\theta$ and is **a single number** that ensures the area under the [[posterior|posterior distribution]] equals 1. Because $p(D)$ is just a normalizing constant, the equation above is often written as:

$p(\theta \mid D) \propto p(D \mid \theta) \, p(\theta)$

That is, the [[posterior]] is **proportional to** (denoted by $\propto$) the likelihood times the prior (see the code sketch at the end of this note). [[Lee 2014 - chapter 1 - basics of Bayesian analysis]]

![[Pasted image 20210303103407.png|800]]

Note that with a [[uniform prior]], the mode of the [[posterior|posterior distribution]] coincides with the classical [[maximum likelihood estimate]]. [[Bayesian credible intervals]] capture the spread of the [[posterior|posterior distribution]].

We can [[obtain analytical solutions for the posterior distribution in conjugate cases]].

# References
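
# Code sketch

To make the normalization by $p(D)$ concrete, here is a minimal Python sketch of the discrete prior-to-posterior update used in the two examples above. It is an illustration, not something from the cited sources: the helper `posterior` is made up for this note, and $p(fever \mid no\ covid)$ is back-derived so that $p(fever) = 0.05$.

```python
def posterior(priors, likelihoods):
    """Discrete Bayes update: posterior = likelihood * prior, normalized to sum to 1."""
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    marginal = sum(unnormalized.values())  # p(D), the single scaling number
    return {h: p / marginal for h, p in unnormalized.items()}


# Scenario 1: who is in the office, given a red sweater?
prior = {"Alex": 0.5, "Brenda": 0.5}
likelihood_red = {"Alex": 2 / 5, "Brenda": 3 / 5}  # p(red sweater | person)
print(posterior(prior, likelihood_red))            # {'Alex': 0.4, 'Brenda': 0.6}

# COVID example: prior 3%, p(fever | covid) = 0.90, overall p(fever) = 0.05
prior = {"covid": 0.03, "no covid": 0.97}
p_fever_no_covid = (0.05 - 0.90 * 0.03) / 0.97     # implied by p(fever) = 0.05 (illustrative)
likelihood_fever = {"covid": 0.90, "no covid": p_fever_no_covid}
print(posterior(prior, likelihood_fever)["covid"])  # ~ 0.54
```

The same pattern extends to any finite set of hypotheses: summing likelihood times prior over the hypotheses is exactly what makes $p(D)$ a single scaling number.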