STATS) Basic Stats for AI

1. Statistics in AI

fig 1. Probability density function imitation

The goal of AI imitates a function in real world. If an AI model works well, it can predicts deterministic target. However, real world consists probability density. So, There is not always deterministic value. The true goal of AI is imitates probability density function in real world. Neural networks can model a virtual probability density function $P(\mathrm{y}| \mathrm{x})$, and approximate a probability density function of real world.

2. Probability distribution

fig 2. Probability distribution

fig 3. Joint probability distribution (ref. Wikipedia)

$P(\mathrm{x}=x)$, $P(x)$	Probability that variable $\mathrm{x}$ is $x$
$P(\mathrm{x})$	Probability distribution (function)
$P(\mathrm{x}, \mathrm{y})$	Joint probability distribution

fig 4. Two kinds of Probability distribution

Discrete Probability Distribution
- All case $x$ is discrete.
- $\Sigma_x{P(\mathrm{x}=x)}=1, \; where\; 0 \leq P(\mathrm{x} = x) \leq 1, \; \forall{x} \in \chi$
- Probability Mass Function(PMF) (fig 4. left plot)

Continuous Probability Distribution
- $\int{p(x)}dx=1, \; where\; p(x) \geq 0, \forall{x} \in \mathbb{R}$
- Probability Density Function (PDF) (fig 4. right plot)
- Contrary to discrete probability, It is not possible to find probability about a given sample.

3. Conditional Probability

$P(\mathrm{y}\|\mathrm{x})=\frac{P(\mathrm{x},\mathrm{y})}{P(\mathrm{x})}$ $P(\mathrm{x},\mathrm{y})=P(\mathrm{y}\|\mathrm{x})P(\mathrm{x})$ $P(\mathrm{x},\mathrm{y})=P(\mathrm{x}\|\mathrm{y})P(\mathrm{y})$

Bayes Theorem

When dataset $D$ is given, the probability of hypothesis $h$.

\[P(h\|D)=\frac{P(D, h)P(h)}{P(D)}\]

induction $P(h\|D)=\frac{P(h, D)}{P(D)}$ $P(h\|D)P(D)=P(h, D)=P(D\|h)P(h)$ $P(h\|D)=\frac{P(D, h)P(h)}{P(D)}$

4. Marginal Distribution

From joint probability distribution, get a probability function of a probability variable.

\[\begin{aligned} P(x) &= \int{P(x, z)}dz \\ &= \int{P(x\|z)P(z)}dz \end{aligned}\]

Use Bayes Theorem.

\[\;\;\;\; = \int{P(z\|x)P(x)}dz\]

$P(x)$ is constant. The integral is about $z$, so the result of the integral is 1.

\[\begin{aligned} &= P(x)\int{P(z\|x)}dz \\ &= P(x) \end{aligned}\]

5. Expectation

\[\mathbb{E}_{\mathrm{x}\sim P(\mathrm{x})}{f(x)}\]

$\mathrm{x}\sim P(\mathrm{x})$: $x$ sampled from $P(\mathrm{x})$

\[\mathbb{E}_{\mathrm{x}\sim P(\mathrm{x})}{f(x)} =\sum _{x \in \chi} P(x)*f(x)\]

In marginal distribution

\[\begin{aligned} P(x) &= \int{P(x,z)}dz \\ &= \int{P(x\|z)P(z)}dz \\ &= \mathbb{E}_{\mathrm{z}\sim P(\mathrm{z})}[P(x\| \mathrm{z})] \end{aligned}\]

Monte-Carlro Method

Sampling from probability distribution, get weighted average of $f$

\[\mathbb{E}_{\mathrm{x} \sim P(\mathrm{x})}[f(x)] \approx \frac{1}{n} \sum _{i=1}^{n} f(x_i), \; where \; x_i \sim P(\mathrm{x}))\]

Refers

Wikipedia, “Joint probability distribution”