part I
NBIS, SciLifeLab
April 24, 2023
Probability describes how likely an event, \(E\), is to happen.
A probability is always between 0 and 1, where 1 means that the event always happens, and 0 that it never happens.
The total probability of all possible events is always 1.
The sample space, \(S\), is the set of all possible outcomes.
The probability that either of two disjoint (non-overlapping) events occurs is the sum of the probabilities of the individual events.
Based on the axioms the following rules of probability can be proved.
By drawing balls from the urn with (or without) replacement, probabilities and other properties of the model can be inferred.
A random variable describes the outcome of a random experiment.
Random variables: \(X, Y, Z, \dots\), in general denoted by a capital letter.
Probability: \(P(X=5)\), \(P(Z>0.34)\), \(P(W \geq 3.5 | S = 1)\)
Observations of the random variable, \(x, y, z, \dots\)
The sample space is the collection of all possible observation values.
The population is the collection of all possible observations.
A sample is a subset of the population.
A categorical random variable has nominal or ordinal outcomes, such as {red, blue, green} or {tiny, small, average, large, huge}.
A discrete random variable has a countable number of outcome values, such as {1,2,3,4,5,6}, {0,2,4,6,8} or all integers.
A discrete or categorical random variable can be described by its probability mass function (PMF).
The probability that the random variable, \(X\), takes the value \(x\) is denoted \(P(X=x) = p(x)\).
Possible outcomes: \(\{1, 2, 3, 4, 5, 6\}\)
The probability mass function:
x | 1 | 2 | 3 | 4 | 5 | 6 |
p(x) | 0.167 | 0.167 | 0.167 | 0.167 | 0.167 | 0.167 |
x | A | C | T | G |
p(x) | 0.4 | 0.2 | 0.1 | 0.3 |
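A PMF like the nucleotide table above can be represented in R as a named numeric vector, and sample can then draw outcomes according to it. This is only a sketch: the names pmf and draws, the seed and the number of draws are arbitrary choices made for the illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible

# PMF of the nucleotide example as a named vector
pmf <- c(A = 0.4, C = 0.2, T = 0.1, G = 0.3)

# A valid PMF has non-negative probabilities that sum to 1
stopifnot(all(pmf >= 0), isTRUE(all.equal(sum(pmf), 1)))

# Draw 1000 nucleotides according to the PMF (illustrative sample size)
draws <- sample(names(pmf), size = 1000, replace = TRUE, prob = pmf)

# Observed relative frequencies, to compare with pmf
table(draws) / length(draws)
```

With more draws the observed frequencies should move closer to the probabilities in the table.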
The expected value is the average outcome of a random variable over many trials and is denoted \(E[X]\) or \(\mu\).
When the probability mass function is known, \(E[X]\) can be computed as follows:
\[E[X] = \mu = \sum_{i=1}^n x_i p(x_i),\] where \(n\) is the number of outcomes.
Alternatively, \(E[X]\) can be computed as the population mean, by summing over all \(N\) objects in the population:
\[E[X] = \mu = \frac{1}{N}\sum_{i=1}^N x_i\]
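The two formulas can be checked against each other in R for the fair die. The sketch below is illustrative only: the variable names are made up, and the population is assumed to consist of 600 objects, 100 for each outcome.

```r
# Outcomes and PMF of a fair six-sided die
x <- 1:6
p <- rep(1/6, 6)

# Expected value as a probability-weighted sum over the n = 6 outcomes
mu_pmf <- sum(x * p)

# Expected value as a population mean; the population is a hypothetical
# collection of N = 600 objects with 100 objects for each outcome
population <- rep(x, each = 100)
mu_pop <- mean(population)

mu_pmf  # 3.5
mu_pop  # 3.5
```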
The variance is a measure of spread and is defined as the expected value of the squared distance from the population mean:
\[var(X) = \sigma^2 = E[(X-\mu)^2] = \sum_{i=1}^n (x_i-\mu)^2 p(x_i)\]
The standard deviation is the square root of the variance and is usually denoted \(\sigma\):
\[\sigma = \sqrt{E[(X-\mu)^2]} = \sqrt{\sum_{i=1}^n (x_i-\mu)^2 p(x_i)}\]
or, by summing over all objects in the population:
\[\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^N (x_i-\mu)^2}\]
The standard deviation is never negative and is on the same scale as the outcome values.
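As a worked example, the variance and standard deviation of a fair die can be computed directly from the PMF. The variable names are chosen for this sketch only.

```r
# Outcomes and PMF of a fair six-sided die
x <- 1:6
p <- rep(1/6, 6)

# Population mean
mu <- sum(x * p)

# Variance: expected squared distance from the mean
sigma2 <- sum((x - mu)^2 * p)

# Standard deviation: square root of the variance
sigma <- sqrt(sigma2)

sigma2  # 35/12, approximately 2.917
sigma   # approximately 1.708
```

Note that R's built-in var and sd divide by \(n-1\) rather than \(N\); they estimate the variance from a sample and will not reproduce the population formulas exactly.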
Once a random variable's probability distribution is known, properties of interest can be computed, such as;
If the distribution is not known, simulation might be the solution.
When rolling a single die, the probability of a six is 1/6.
When rolling 20 dice, what is the probability of at least 15 sixes?
The outcome of a single die roll is a random variable, \(X\), that can be described using an urn model.
Simulation in R!
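One possible way to set up such a simulation is sketched below. The function name count_sixes, the seed and the number of repetitions are assumptions made for this illustration, not part of the original material.

```r
set.seed(1)  # arbitrary seed, only for reproducibility

# Roll n_dice fair dice and count the sixes; repeat this n_rep times.
# sample() with replace = TRUE acts as the urn model for a single die.
count_sixes <- function(n_dice, n_rep) {
  replicate(n_rep, sum(sample(1:6, n_dice, replace = TRUE) == 6))
}

sixes <- count_sixes(n_dice = 20, n_rep = 10000)

# Estimated probabilities: fraction of repetitions that meet the condition
p_at_least_5  <- mean(sixes >= 5)
p_at_least_15 <- mean(sixes >= 15)

p_at_least_5
p_at_least_15
```

At least 15 sixes in 20 rolls is so rare that the estimate will most likely be 0 with this number of repetitions. Very small probabilities need far more repetitions, or an exact calculation from a known distribution.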
In a uniform distribution every possible outcome has the same probability.
With \(n\) different outcomes, the probability for each outcome is \(1/n\).
A Bernoulli trial is a random experiment with two outcomes: success (1) and failure (0).
The outcome of a Bernoulli trial is a discrete random variable, \(X\).
\[P(X=x) = p(x) = \left\{ \begin{array}{ll} p & \mathrm{if}\ x=1\ \mathrm{(success)}\\ 1-p & \mathrm{if}\ x=0\ \mathrm{(failure)} \end{array} \right.\]
Using the definitions of expected value and variance, it can be shown that
\[E[X] = p\\ var(X) = p(1-p)\]
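Both results follow in a few steps from the definitions, since the Bernoulli PMF has only the two outcomes 0 and 1:

\[E[X] = \sum_{x} x\,p(x) = 0 \cdot (1-p) + 1 \cdot p = p\]
\[var(X) = \sum_{x} (x-p)^2\,p(x) = (0-p)^2(1-p) + (1-p)^2 p = p(1-p)\bigl(p + (1-p)\bigr) = p(1-p)\]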
The number of successes in a series of \(n\) independent and identical Bernoulli trials (\(Z_i\), with probability \(p\) for success) is a discrete random variable, \(X\).
\[X = \sum_{i=1}^n Z_i\]
The probability mass function of \(X\), called the binomial distribution, is
\[P(X=k) = \binom{n}{k} p^k (1-p)^{n-k}\]
The expected value and variance:
\[E[X] = np\\ var(X) = np(1-p)\]
In R: pbinom to compute \(P(X \leq k)\) and dbinom to compute the pmf \(P(X=k)\).
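As an illustration, these functions give an exact answer to the question about at least 15 sixes when rolling 20 dice. The variable names are chosen for this sketch only.

```r
n <- 20    # number of dice (trials)
p <- 1/6   # probability of a six (success)

# P(X = 15): probability of exactly 15 sixes
p_exactly_15 <- dbinom(15, size = n, prob = p)

# P(X >= 15) = 1 - P(X <= 14); lower.tail = FALSE returns P(X > 14) directly
p_at_least_15 <- pbinom(14, size = n, prob = p, lower.tail = FALSE)

# The same probability obtained by summing the PMF over k = 15, ..., 20
p_at_least_15_sum <- sum(dbinom(15:20, size = n, prob = p))

p_exactly_15
p_at_least_15
```

Using lower.tail = FALSE rather than 1 - pbinom(...) avoids losing precision when the upper tail probability is very small, as it is here.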
The hypergeometric distribution describes the number of successes in a series of \(n\) draws without replacement, from a population of size \(N\) with \(Np\) objects of interest (successes).
The probability mass function:
\[P(X=k) = \frac{\binom{Np}{k}\binom{N-Np}{n-k}}{\binom{N}{n}}\]
In R: phyper to compute \(P(X \leq k)\) and dhyper to compute the pmf \(P(X=k)\).
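R parameterises the hypergeometric distribution differently from the formula above: dhyper(x, m, n, k) takes the number of successes in the population (m, here \(Np\)), the number of failures (n, here \(N - Np\)) and the number of draws (k, here \(n\)). The sketch below uses invented numbers (an urn with 20 balls, 5 of them of interest, and 4 draws) to show the mapping.

```r
N_pop     <- 20  # population size, N (illustrative value)
successes <- 5   # objects of interest, Np (illustrative value)
n_draws   <- 4   # draws without replacement, n (illustrative value)
k         <- 2   # number of successes among the draws

# PMF via dhyper: m = Np, n = N - Np, k = number of draws
p_dhyper <- dhyper(k, m = successes, n = N_pop - successes, k = n_draws)

# The same probability computed directly from the formula
p_formula <- choose(successes, k) * choose(N_pop - successes, n_draws - k) /
  choose(N_pop, n_draws)

# P(X <= 2) via the cumulative distribution function
p_le_2 <- phyper(k, m = successes, n = N_pop - successes, k = n_draws)
```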
The Poisson distribution describes the number of times a rare event (probability \(p\)) occurs in a large number (\(n\)) of trials.
The probability mass function:
\[P(X=k) = \frac{\lambda^k}{k!}e^{-\lambda},\]
where \(\lambda\) is the expected number of events. The expected value and variance:
\[E[X] = var(X) = \lambda = n p\]
The Poisson distribution can approximate the binomial distribution if \(n\) is large and \(p\) is small (\(n>10, p < 0.1\)).
In R: ppois to compute \(P(X \leq k)\) and dpois to compute the pmf \(P(X=k)\).
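How well the Poisson distribution approximates the binomial can be checked numerically. The values of the number of trials and the success probability below are arbitrary choices for this sketch, picked so that the number of trials is large and the success probability is small.

```r
n <- 1000    # many trials (illustrative value)
p <- 0.002   # rare event (illustrative value)
lambda <- n * p

k <- 0:10
binom_pmf <- dbinom(k, size = n, prob = p)
pois_pmf  <- dpois(k, lambda = lambda)

# Side-by-side comparison of the two PMFs
round(cbind(k, binom_pmf, pois_pmf), 4)

# Largest absolute difference between the two PMFs over k = 0, ..., 10
max_diff <- max(abs(binom_pmf - pois_pmf))
max_diff
```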
A negative binomial distribution describes the number of failures that occur before a specified number of successes (\(r\)) has occurred, in a sequence of independent and identically distributed Bernoulli trials.
\(r\) is also called the dispersion parameter.
In R: dnbinom, pnbinom, qnbinom
The geometric distribution is a special case of the negative binomial distribution, where \(r=1\).
In R: dgeom, pgeom, qgeom
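The relation between the two distributions can be verified in R. In this sketch the success probability and the values of \(r\) are arbitrary, and the closed-form expression used for comparison is the standard negative binomial PMF, which is not written out in the text above.

```r
p <- 0.3   # success probability (illustrative value)
k <- 0:5   # number of failures before the r-th success

# Negative binomial PMF for r = 3 successes, via dnbinom and via the
# standard formula P(X = k) = choose(k + r - 1, k) * p^r * (1 - p)^k
r <- 3
nb_r       <- dnbinom(k, size = r, prob = p)
nb_formula <- choose(k + r - 1, k) * p^r * (1 - p)^k

# With r = 1 the negative binomial reduces to the geometric distribution
nb_1 <- dnbinom(k, size = 1, prob = p)
geo  <- dgeom(k, prob = p)

all.equal(nb_r, nb_formula)
all.equal(nb_1, geo)
```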
Probability mass functions, \(P(X=x)\): dbinom, dhyper, dpois, dnbinom and dgeom.
Cumulative distribution functions, \(P(X \leq x)\): pbinom, phyper, ppois, pnbinom and pgeom.
Also, functions for computing an \(x\) such that \(P(X \leq x) = q\), where \(q\) is a probability of interest, are available: qbinom, qhyper, qpois, qnbinom and qgeom.
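For a discrete random variable an \(x\) with \(P(X \leq x)\) exactly equal to \(q\) often does not exist. The quantile functions in R then return the smallest \(x\) for which \(P(X \leq x) \geq q\). A small sketch using the 20 dice example, with variable names chosen for illustration:

```r
n <- 20    # number of dice
p <- 1/6   # probability of a six
q <- 0.95  # probability of interest (illustrative value)

# Smallest x such that P(X <= x) >= q
x_q <- qbinom(q, size = n, prob = p)

# Check with the cumulative distribution function
cdf_at_x_q    <- pbinom(x_q, size = n, prob = p)      # at least q
cdf_below_x_q <- pbinom(x_q - 1, size = n, prob = p)  # below q

x_q
cdf_at_x_q
cdf_below_x_q
```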