Statistics for Data Science (I)

Data science rests on two pillars: computer science and statistics. Computer science provides the computing platform, while statistics provides the theory and the models behind the analysis in every data science project.

Discrete Random Variables

  • boolean algebra
  • set notation
  • probability
  • variables
  • discrete univariate
  • discrete multivariate
  • continuous univariate
  • continuous multivariate
  • probability distribution
  • PMF (Probability Mass Function): a function that gives the probability that a discrete random variable is exactly equal to some value.
  • CDF (Cumulative Distribution Function): defines the probability that X will take a value less than or equal to x.
  • properties of a distribution: mean, median, mode, variance, standard deviation, quantiles, entropy
  • entropy measures the “level of uncertainty” of the outcome. When there is no randomness, the entropy is zero; when the outcome is highly random, the entropy is high.
  • linearity of expectations: expectation is a linear operator, which means:
    *E[aX] = aE[X]*
    *E[X + Y] = E[X] + E[Y]*
    
  • conditional probability: *P(A|B) = P(A and B)/P(B)* (see the sketch after this list)
  • conditional expectations
  • conditional distribution
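
A minimal sketch of several of these ideas (PMF, CDF, expectation, entropy, conditional probability) for a fair six-sided die; the die itself is just an illustrative assumption:

```python
import numpy as np

# PMF of a fair six-sided die: P(X = k) = 1/6 for k = 1..6
values = np.arange(1, 7)
pmf = np.full(6, 1 / 6)

mean = np.sum(values * pmf)                    # E[X] = 3.5
variance = np.sum((values - mean) ** 2 * pmf)  # Var[X] ≈ 2.92
cdf = np.cumsum(pmf)                           # CDF: P(X <= k)
entropy = -np.sum(pmf * np.log2(pmf))          # log2(6) ≈ 2.58 bits, the maximum for 6 outcomes

# Conditional probability: P(X = 6 | X is even) = P(X = 6 and X is even) / P(X is even)
p_even = pmf[values % 2 == 0].sum()                  # 0.5
p_six_given_even = pmf[values == 6].sum() / p_even   # 1/3

print(mean, variance, entropy, p_six_given_even)
```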

Well-Known Discrete Distributions

  • Bernoulli distribution: A variable that is either 1 (with probability p) or 0 (with probability 1-p). The mean is also p.
  • Binomial distribution: The sum of n independent Bernoulli trials. Parameters: n and p.
  • Categorical distribution: Generalized Bernoulli to more than 2 values.
  • Uniform distribution: A special case of categorical (sort of), in which p_i=1/K for all i=1,…,K.
  • Multinomial distribution: Generalized binomial to more than 2 values per trial. Parameters: n, p_i for i=1,…,K.
  • Geometric distribution: The number of attempts needed to get the first “success” in a sequence of independent trials. Parameter p is the probability of success.
  • Poisson distribution: Has only one parameter, the mean, typically called λ (see the example below).
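
A short sketch evaluating a few of these PMFs with scipy.stats; the parameter values here are arbitrary illustrations:

```python
from scipy.stats import bernoulli, binom, geom, poisson

p, n, lam = 0.3, 10, 4.0   # illustrative parameters

print(bernoulli.pmf(1, p))      # P(X = 1) = p = 0.3
print(binom.pmf(3, n, p))       # P(exactly 3 successes in 10 trials)
print(geom.pmf(5, p))           # P(first success happens on the 5th attempt)
print(poisson.pmf(2, lam))      # P(X = 2) when the mean is lambda = 4
print(binom.mean(n, p), binom.var(n, p))   # n*p and n*p*(1-p)
```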

Discrete Multivariate Distribution

Joint distribution:

  • P(X=x, Y=y, Z=z)
  • when we say random variables X and Y are independent of each other, it means:
    P(X, Y) = P(X)P(Y)
    P(X|Y) = P(X)
    E[XY] = E[X]E[Y] (this last identity follows from independence but does not, by itself, imply it; a quick numeric check follows)
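
A small check of these identities for two hypothetical independent variables, a fair coin X and a fair die Y (chosen purely for illustration):

```python
import numpy as np

# Joint PMF of an independent fair coin X (values 0/1) and fair die Y (values 1..6)
px = np.array([0.5, 0.5])
py = np.full(6, 1 / 6)
joint = np.outer(px, py)             # P(X = x, Y = y) = P(X = x) P(Y = y)

# P(X | Y = 1) equals P(X)
cond = joint[:, 0] / joint[:, 0].sum()
print(np.allclose(cond, px))         # True

# E[XY] = E[X] E[Y]
x_vals, y_vals = np.array([0, 1]), np.arange(1, 7)
e_xy = np.sum(joint * np.outer(x_vals, y_vals))
print(np.isclose(e_xy, (px @ x_vals) * (py @ y_vals)))   # True
```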
    

Marginal distribution:
assume X and Y have a joint PMF P(x, y):

  • P(X = x_i) is a marginal probability
  • the marginal distribution of X is P(X = x) = Σ_y P(X = x, Y = y), obtained by summing the joint PMF over all values of Y (sketch below)
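
A small sketch of marginalizing a joint PMF; the 2×3 table of probabilities below is made up for illustration:

```python
import numpy as np

# Hypothetical joint PMF P(X = x, Y = y): rows index x, columns index y
joint = np.array([[0.10, 0.20, 0.10],
                  [0.25, 0.15, 0.20]])

marginal_x = joint.sum(axis=1)   # P(X = x): sum over y
marginal_y = joint.sum(axis=0)   # P(Y = y): sum over x

print(marginal_x)   # [0.4 0.6]
print(marginal_y)   # [0.35 0.35 0.3]
```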

Properties of a multivariate distribution

  • Covariance: Cov(X, Y) = E[(X - E[X])(Y - E[Y])]
  • Correlation: Corr(X, Y) = Cov(X, Y) / (σ_X σ_Y), which always lies between -1 and 1 (sample-based sketch below)
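
A brief sketch of estimating covariance and correlation from data; the data here are simulated, purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
y = 2 * x + rng.normal(size=1000)   # y depends on x, so the covariance is nonzero

cov_matrix = np.cov(x, y)           # 2x2 sample covariance matrix
corr_matrix = np.corrcoef(x, y)     # 2x2 correlation matrix, diagonal entries are 1

print(cov_matrix[0, 1])    # close to 2, since Cov[X, 2X + noise] = 2 Var[X]
print(corr_matrix[0, 1])   # close to 2/sqrt(5) ≈ 0.89
```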

Bayes’ Theorem: P(A|B) = P(B|A) P(A) / P(B)

Law of total probability: P(B) = P(B|A_1) P(A_1) + … + P(B|A_K) P(A_K), where the events A_1, …, A_K partition the sample space (worked example below).
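
A classic worked example combining Bayes' theorem with the law of total probability; the 1% prevalence and the test accuracies are made-up numbers, used only to show the mechanics:

```python
# Hypothetical diagnostic test:
# P(disease) = 0.01, P(positive | disease) = 0.95, P(positive | no disease) = 0.05
p_d = 0.01
p_pos_given_d = 0.95
p_pos_given_not_d = 0.05

# Law of total probability over the partition {disease, no disease}
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)

# Bayes' theorem: P(disease | positive) = P(positive | disease) P(disease) / P(positive)
p_d_given_pos = p_pos_given_d * p_d / p_pos
print(p_d_given_pos)   # ≈ 0.161, much lower than the test's 95% sensitivity
```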

Continuous Random Variables

  • PDF (Probability Density Function): defines the distribution of a continuous random variable.
  • a PDF value is not a probability, it is a density
  • the probability of any single exact value is zero; probabilities can only be computed for an interval (a range) of values (see the sketch following this list)
  • CDF (Cumulative Distribution Function): defines the probability that X will take a value less than or equal to x.
  • Expected values
  • Expected values of functions
  • joint PDF: p(x, y), the density of two or more continuous variables considered together; probabilities come from integrating it over a region
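
A small sketch with scipy.stats showing that a density value is not a probability, while areas under the PDF (given by the CDF) are:

```python
from scipy.stats import norm

# Standard normal distribution
print(norm.pdf(0))                      # ≈ 0.3989: a density, not a probability (P(X = 0) is 0)
print(norm.cdf(1) - norm.cdf(-1))       # P(-1 < X < 1) ≈ 0.6827: a genuine probability
print(norm.expect(lambda x: x ** 2))    # E[X^2] = 1: expected value of a function of X
```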

Well-Known Continuous Distributions

  • Uniform distribution: p(x) = 1/(b-a) for a<x<b
  • Gaussian distribution (aka “Normal distribution”): Parameters: mean, standard deviation
  • Exponential distribution
  • Beta distribution (example below)
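
A quick sketch evaluating these densities with scipy.stats; the parameters are arbitrary:

```python
from scipy.stats import uniform, norm, expon, beta

print(uniform.pdf(0.5, loc=0, scale=2))   # Uniform(0, 2): density is 1/(b - a) = 0.5
print(norm.pdf(1.0, loc=0, scale=1))      # Gaussian with mean 0 and standard deviation 1
print(expon.pdf(1.0, scale=2))            # Exponential with mean 2 (rate 1/2)
print(beta.pdf(0.3, a=2, b=5))            # Beta(2, 5), supported on (0, 1)
```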

Relevant R Functions

  • The uniform distribution: dunif(), punif(), qunif(), runif()
  • The normal (Gaussian) distribution: dnorm(), pnorm(), qnorm(), rnorm()
  • The exponential distribution: dexp(), pexp(), qexp(), rexp()

Matrices

  • Random vector
  • multivariate Gaussian covariance matrix
  • how many parameters? a d-dimensional Gaussian has d mean parameters plus d(d+1)/2 distinct entries in its symmetric covariance matrix (see the sketch below)
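
A small numpy sketch of a random vector drawn from a multivariate Gaussian, together with its covariance matrix; the mean vector and covariance matrix below are made up:

```python
import numpy as np

mean = np.array([0.0, 1.0, 2.0])     # d = 3 mean parameters
cov = np.array([[1.0, 0.5, 0.2],
                [0.5, 2.0, 0.3],
                [0.2, 0.3, 1.5]])    # symmetric, so d*(d+1)/2 = 6 distinct entries

rng = np.random.default_rng(0)
samples = rng.multivariate_normal(mean, cov, size=10_000)

print(samples.mean(axis=0))    # close to the mean vector
print(np.cov(samples.T))       # close to the covariance matrix
```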

Relevant Python Functions

  • random numbers (from numpy.random): randn(), rand(), randint()
  • probability functions (from scipy.stats): binom.pmf() (usage sketch below)
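
A short usage sketch of the functions listed above:

```python
import numpy as np
from scipy.stats import binom

np.random.seed(0)
print(np.random.randn(3))           # 3 draws from a standard normal
print(np.random.rand(3))            # 3 draws from Uniform(0, 1)
print(np.random.randint(1, 7, 5))   # 5 rolls of a six-sided die (values 1..6)

print(binom.pmf(3, n=10, p=0.5))    # P(exactly 3 heads in 10 fair coin flips)
```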

Maximum Likelihood Estimation (MLE)

  • MLE is NOT a probability distribution; it is a method for estimating parameters by choosing the values that maximize the likelihood of the observed data (a small sketch follows)
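
A minimal MLE sketch: for a Poisson model the MLE of λ is the sample mean, which we can confirm by maximizing the log-likelihood numerically; the simulated data and the true λ = 4 are assumptions made only for this demo:

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

rng = np.random.default_rng(0)
data = rng.poisson(lam=4.0, size=500)   # simulated observations

def neg_log_likelihood(lam):
    return -np.sum(poisson.logpmf(data, lam))

result = minimize_scalar(neg_log_likelihood, bounds=(0.1, 20), method="bounded")
print(result.x)        # numerical MLE of lambda
print(data.mean())     # analytic MLE (the sample mean); the two should agree
```
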
Written on December 19, 2018