STATS) Basic Stats for AI
1. Statistics in AI

The goal of AI is to imitate a function that exists in the real world. If an AI model works well, it can predict a deterministic target. However, the real world is governed by probability densities, so there is not always a single deterministic value to predict. The true goal of AI is therefore to imitate the probability density function of the real world. A neural network can model a probability density function $P(\mathrm{y} \mid \mathrm{x})$ that approximates the real-world one.
2. Probability distribution


| Notation | Meaning |
| --- | --- |
| $P(\mathrm{x}=x)$, $P(x)$ | Probability that the random variable $\mathrm{x}$ takes the value $x$ |
| $P(\mathrm{x})$ | Probability distribution (function) |
| $P(\mathrm{x}, \mathrm{y})$ | Joint probability distribution |

- Discrete Probability Distribution
  - Every possible value $x$ is discrete.
  - $\sum_x{P(\mathrm{x}=x)}=1, \; \text{where} \; 0 \leq P(\mathrm{x} = x) \leq 1, \; \forall{x} \in \chi$
  - Described by a Probability Mass Function (PMF) (fig 4. left plot)
- Continuous Probability Distribution
  - $\int{p(x)}dx=1, \; \text{where} \; p(x) \geq 0, \; \forall{x} \in \mathbb{R}$
  - Described by a Probability Density Function (PDF) (fig 4. right plot)
  - Unlike the discrete case, the probability of a single given sample cannot be read off directly. $p(x)$ is a density, the probability of any exact value is 0, and a probability is obtained by integrating $p(x)$ over an interval.
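The difference between a mass function and a density can be checked numerically. The sketch below is only an illustration: the fair die, the normal density, and the helper names `normal_pdf` and `integrate` are assumptions chosen for the example, not part of the original notes.

```python
import math

# Discrete case: a fair six-sided die (hypothetical example).
pmf = {face: 1 / 6 for face in range(1, 7)}
pmf_total = sum(pmf.values())   # the PMF sums to 1 (up to rounding)
p_three = pmf[3]                # a single outcome has non-zero probability


# Continuous case: a normal density (illustrative choice).
def normal_pdf(x, mu=0.0, sigma=1.0):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def integrate(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))


pdf_total = integrate(normal_pdf, -8.0, 8.0)    # close to 1
p_interval = integrate(normal_pdf, -1.0, 1.0)   # P(-1 <= x <= 1)
p_point = integrate(normal_pdf, 0.5, 0.5)       # zero-width interval gives 0.0

# A density value is not a probability: it can exceed 1.
narrow_peak = normal_pdf(0.0, sigma=0.1)

print(pmf_total, p_three)
print(pdf_total, p_interval, p_point, narrow_peak)
```

The zero-width integral is what "no probability for a single sample" means in practice: only intervals carry probability mass in the continuous case.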
3. Conditional Probability
\[P(\mathrm{y} \mid \mathrm{x})=\frac{P(\mathrm{x},\mathrm{y})}{P(\mathrm{x})}\]
\[P(\mathrm{x},\mathrm{y})=P(\mathrm{y} \mid \mathrm{x})P(\mathrm{x})\]
\[P(\mathrm{x},\mathrm{y})=P(\mathrm{x} \mid \mathrm{y})P(\mathrm{y})\]
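As a quick check of these identities, the following sketch computes conditionals from a small joint table. The table values and the function names are hypothetical, made up purely for illustration; the denominator $P(\mathrm{x})$ is obtained by summing the joint over the other variable.

```python
# Hypothetical joint distribution P(x, y): x is the weather, y is the activity.
joint = {
    ("sunny", "walk"): 0.40,
    ("sunny", "read"): 0.10,
    ("rainy", "walk"): 0.05,
    ("rainy", "read"): 0.45,
}


def marginal_x(joint, x):
    """P(x): sum of P(x, y) over all y."""
    return sum(p for (xi, _), p in joint.items() if xi == x)


def marginal_y(joint, y):
    """P(y): sum of P(x, y) over all x."""
    return sum(p for (_, yi), p in joint.items() if yi == y)


def y_given_x(joint, y, x):
    """P(y | x) = P(x, y) / P(x)."""
    return joint[(x, y)] / marginal_x(joint, x)


def x_given_y(joint, x, y):
    """P(x | y) = P(x, y) / P(y)."""
    return joint[(x, y)] / marginal_y(joint, y)


p_walk_given_sunny = y_given_x(joint, "walk", "sunny")

# Both factorizations recover the same joint probability P(sunny, walk).
via_x = p_walk_given_sunny * marginal_x(joint, "sunny")
via_y = x_given_y(joint, "sunny", "walk") * marginal_y(joint, "walk")

print(p_walk_given_sunny, via_x, via_y)
```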
Bayes' Theorem
- Gives the probability of a hypothesis $h$ when a dataset $D$ is given.
- Derivation
\[P(h \mid D)=\frac{P(h, D)}{P(D)}\]
\[P(h \mid D)P(D)=P(h, D)=P(D \mid h)P(h)\]
\[P(h \mid D)=\frac{P(D \mid h)P(h)}{P(D)}\]
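A small numeric sketch may make the roles of prior, likelihood, and evidence concrete. Everything specific here is an assumption for illustration: two hypothetical coins (the hypotheses $h$), a made-up flip sequence (the dataset $D$), and the helper names `likelihood` and `posterior`.

```python
def likelihood(data, p_heads):
    """P(D | h) for independent coin flips with heads probability p_heads."""
    result = 1.0
    for flip in data:
        result *= p_heads if flip == "H" else 1.0 - p_heads
    return result


def posterior(data, prior, p_heads_by_h):
    """P(h | D) = P(D | h) P(h) / P(D), with P(D) = sum over h of P(D | h) P(h)."""
    unnormalized = {
        h: likelihood(data, p_heads_by_h[h]) * prior[h] for h in prior
    }
    evidence = sum(unnormalized.values())  # P(D)
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical setup: one fair coin and one coin biased towards heads.
p_heads_by_h = {"fair": 0.5, "biased": 0.8}
prior = {"fair": 0.5, "biased": 0.5}
data = "HHTH"

post = posterior(data, prior, p_heads_by_h)
print(post)
```

With three heads in four flips, the posterior shifts towards the biased coin even though both hypotheses start with the same prior.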
4. Marginal Distribution
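- From a joint probability distribution, obtain the probability function of a single random variable by integrating out the other variable.
Use the product rule $P(x, z) = P(z \mid x)P(x)$ from the conditional probability section.
\[\begin{aligned} \int{P(x, z)}dz &= \int{P(z \mid x)P(x)}dz \end{aligned}\]
$P(x)$ does not depend on $z$, so it can be moved outside the integral. The remaining integral is over $z$, and a conditional distribution integrates to 1.
\[\begin{aligned} \int{P(x, z)}dz &= P(x)\int{P(z \mid x)}dz \\ &= P(x) \end{aligned}\]
5. Expectation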
\[\mathbb{E}_{\mathrm{x}\sim P(\mathrm{x})}[f(x)]\]
$\mathrm{x}\sim P(\mathrm{x})$: $x$ is sampled from $P(\mathrm{x})$
\[\mathbb{E}_{\mathrm{x}\sim P(\mathrm{x})}[f(x)] =\sum _{x \in \chi} P(x) f(x)\]
- In marginal distribution
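Monte Carlo Method
- Draw samples from the probability distribution and average $f$ over them. Values with higher probability are sampled more often, so the plain sample mean approximates the probability-weighted average of $f$.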
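The relation between the exact expectation and its Monte Carlo estimate can be sketched as follows. The loaded die, the function `f`, the sample size, and the seed are all assumptions picked for illustration; only the Python standard library is used.

```python
import random

# Hypothetical discrete distribution: a die loaded towards 6.
values = [1, 2, 3, 4, 5, 6]
probs = [0.1, 0.1, 0.1, 0.1, 0.1, 0.5]


def f(x):
    return x * x


# Exact expectation: sum over x of P(x) * f(x).
exact = sum(p * f(x) for x, p in zip(values, probs))


def monte_carlo(f, values, probs, n, rng):
    """Estimate E[f(x)] as the plain mean of f over n samples x ~ P(x)."""
    samples = rng.choices(values, weights=probs, k=n)
    return sum(f(x) for x in samples) / n


rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = monte_carlo(f, values, probs, 100_000, rng)

print(exact, estimate)
```

The estimate needs no explicit weights: sampling from $P(\mathrm{x})$ already makes likely values appear more often, and the error shrinks as the number of samples grows.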