Journal: Statistical Models

What is Exponential Distribution?

The exponential distribution is commonly used to model the waiting time until a given event occurs. The waiting time is treated as a random variable with an exponential distribution. The key assumption is that the probability of the event occurring during a very short time interval is proportional to the length of that interval. This assumption is a reasonable approximation for many real-world processes, which is why the exponential distribution is widely used to model waiting times.

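Let X be an absolutely continuous random variable that has an exponential distribution with rate parameter λ > 0. The distribution function of the exponential random variable X is:

$$F_X(x) = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

The expected value of an exponential random variable X is:

$$E[X] = \frac{1}{\lambda}$$

The variance of an exponential random variable X is:

$$\operatorname{Var}[X] = \frac{1}{\lambda^2}$$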
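The three formulas above can be sanity-checked by simulation. The sketch below is a standard-library illustration, not the author's code; the helper names `expon_cdf`, `expon_mean` and `expon_var` are hypothetical, and it assumes `random.expovariate`, which takes the rate λ (not the scale 1/λ).

```python
import math
import random
import statistics


def expon_cdf(x: float, lam: float) -> float:
    """Distribution function F(x) = 1 - exp(-lam * x) for x >= 0, and 0 otherwise."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def expon_mean(lam: float) -> float:
    """E[X] = 1 / lam."""
    return 1.0 / lam


def expon_var(lam: float) -> float:
    """Var[X] = 1 / lam**2."""
    return 1.0 / lam ** 2


# Monte Carlo check: random.expovariate is parameterised by the rate lam.
rng = random.Random(0)
lam = 2.0
samples = [rng.expovariate(lam) for _ in range(50_000)]

sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)
frac_below_1 = sum(s <= 1.0 for s in samples) / len(samples)

print(f"mean: {sample_mean:.4f} vs 1/lam   = {expon_mean(lam):.4f}")
print(f"var:  {sample_var:.4f} vs 1/lam^2 = {expon_var(lam):.4f}")
print(f"F(1): {frac_below_1:.4f} vs formula = {expon_cdf(1.0, lam):.4f}")
```

The simulated values should land close to, but not exactly on, the closed-form ones; the gap shrinks as the number of samples grows.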

The Memoryless Property

An important property of the exponential distribution is the memoryless property:

$$P(X \le x + y \mid X > x) = P(X \le y), \quad x, y \ge 0$$

Roughly speaking, the probability that the event happens during a time interval of length y is independent of how much time has already elapsed (x) without the event happening.
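One way to see the memoryless property is through the survival function S(t) = P(X > t) = e^{-λt}: the conditional probability of waiting at least y more, given that x has already passed, is S(x + y) / S(x) = S(y), which is the complement of the statement above. The sketch below is an illustration with arbitrarily chosen values λ = 0.5, x = 3 and y = 2; the helper `survival` is hypothetical.

```python
import math
import random

lam, x, y = 0.5, 3.0, 2.0


def survival(t: float) -> float:
    """P(X > t) = exp(-lam * t) for an exponential with rate lam."""
    return math.exp(-lam * t)


# Exact check: P(X > x + y | X > x) = S(x + y) / S(x) should equal S(y).
conditional = survival(x + y) / survival(x)
print(conditional, survival(y))

# Simulation: keep only the waits that already exceeded x ...
rng = random.Random(1)
samples = [rng.expovariate(lam) for _ in range(200_000)]
survivors = [s for s in samples if s > x]
# ... and measure how often they last at least y more time units.
empirical = sum(s > x + y for s in survivors) / len(survivors)
print(f"empirical {empirical:.3f} vs P(X > y) = {survival(y):.3f}")
```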

Code Snippet: Exponential Distribution*

from scipy.stats import expon

x = np.linspace(0, 4, 100)
colors = sns.color_palette()
lambda_ = [0.5, 1, 2, 4]
plt.figure(figsize=(12, 4))
for l, c in zip(lambda_, colors):
    # scipy parameterises the exponential by scale = 1 / lambda
    plt.plot(x, expon.pdf(x, scale=1./l), lw=2,
             color=c, label=r"$\lambda = %.1f$" % l)
    plt.fill_between(x, expon.pdf(x, scale=1./l), color=c, alpha=.33)
plt.legend()
plt.ylabel("PDF at $x$")
plt.xlabel("$x$")
plt.title("Probability density function of an Exponential random variable; "
          r"differing $\lambda$");


What is Poisson Distribution?

The Poisson distribution can be described through a statistical experiment with the following properties:

• The experiment results in outcomes that can be classified as successes or failures
• The average number of successes that occur in a specified region is known
• The probability that a success will occur is proportional to the size of the region
• The probability that a success will occur in an extremely small region is virtually zero

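In a Poisson experiment, it is possible to count how many events have occurred (successes), but it is meaningless to ask how many such events have not occurred (failures). The Poisson distribution can therefore be viewed as a limiting case of the binomial distribution in which the number of trials is very large and the probability of success on each trial is very small, so that the number of failures, and with it the probability of failure Q, is no longer a meaningful quantity. Note that the Poisson model is most often invoked for rare events within a given time interval.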
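The binomial connection can be checked numerically. The standard-library sketch below is an illustration rather than the author's code: it holds the mean n·p fixed at µ = 3 while the number of trials n grows, and records the largest gap between binomial and Poisson probabilities over k = 0, ..., 10. The helper names `binom_pmf` and `poisson_pmf` are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of exactly k successes in n trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def poisson_pmf(k: int, mu: float) -> float:
    """Poisson probability of exactly k events when the mean is mu."""
    return math.exp(-mu) * mu ** k / math.factorial(k)


mu = 3.0
max_diffs = {}
for n in (10, 100, 10_000):
    p = mu / n  # keep the mean n * p fixed at mu
    max_diffs[n] = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, mu)) for k in range(11))
    print(f"n = {n:>6}: max |binomial - Poisson| = {max_diffs[n]:.2e}")
```

The gap should shrink steadily as n grows, which is the sense in which the Poisson distribution describes "many trials, each with a tiny success probability".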

Poisson Distribution

Assume the following:

• µ: the mean number of successes that occur in a specified region
• x: the actual number of successes that occur in a specified region

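The Poisson probability that exactly x successes occur in a Poisson experiment is:

$$P(X = x) = \frac{e^{-\mu}\,\mu^{x}}{x!}, \quad x = 0, 1, 2, \ldots$$

The mean of the probability distribution is equal to µ:

$$E[X] = \mu$$

The variance is also equal to µ:

$$\operatorname{Var}[X] = \mu$$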
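A quick way to confirm that the mean and the variance are both µ is to sum over the probability mass function directly. The sketch below assumes µ = 4 and truncates the infinite support at x = 99, since the remaining tail is negligible for that mean; `poisson_pmf` is a hypothetical helper.

```python
import math


def poisson_pmf(x: int, mu: float) -> float:
    """P(X = x) = exp(-mu) * mu**x / x!"""
    return math.exp(-mu) * mu ** x / math.factorial(x)


mu = 4.0
support = range(0, 100)  # truncated support; the tail beyond 99 is negligible for mu = 4
probs = [poisson_pmf(x, mu) for x in support]

total = sum(probs)
mean = sum(x * p for x, p in zip(support, probs))
variance = sum(x * x * p for x, p in zip(support, probs)) - mean ** 2
print(f"sum = {total:.6f}, mean = {mean:.6f}, variance = {variance:.6f}")
```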

Cumulative Distribution Function

The Poisson cumulative distribution function gives the probability of at most x successes:

$$P(X \le x) = \sum_{k=0}^{x} \frac{e^{-\mu}\,\mu^{k}}{k!}$$
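The cumulative probability is just a running sum of the PMF. As a small worked example, with µ = 1 the probability of at most two successes is e^{-1}(1 + 1 + 1/2) = 2.5/e. The helpers below are hypothetical and use only the standard library.

```python
import math


def poisson_pmf(k: int, mu: float) -> float:
    """P(X = k) = exp(-mu) * mu**k / k!"""
    return math.exp(-mu) * mu ** k / math.factorial(k)


def poisson_cdf(x: int, mu: float) -> float:
    """P(X <= x): sum of the PMF over k = 0, 1, ..., x."""
    return sum(poisson_pmf(k, mu) for k in range(x + 1))


# With mu = 1, P(X <= 2) = e^-1 * (1 + 1 + 1/2) = 2.5 / e
print(poisson_cdf(2, 1.0), 2.5 / math.e)
```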

Code Snippet: Poisson Distribution*

from scipy.stats import poisson

colors = sns.color_palette()
k = np.arange(15)
plt.figure(figsize=(12, 8))
for i, lambda_ in enumerate([1, 2, 4, 6]):
    # lambda_ plays the role of the mean µ in the formulas above
    plt.plot(k, poisson.pmf(k, lambda_), '-o', label=lambda_, color=colors[i])
    plt.fill_between(k, poisson.pmf(k, lambda_), color=colors[i], alpha=0.5)
plt.legend()
plt.title("Poisson distribution")
plt.ylabel("PMF at $k$")
plt.xlabel("$k$");


What is Normal Distribution?

Normal distribution, in layman’s terms, is the “bell curve.”

Some characteristics of the normal distribution:

• Mean = median = mode
• Standard deviation and proportion of population:
  • About 68% of values are within 1 standard deviation of the mean
  • About 95% of values are within 2 standard deviations of the mean
  • About 99.7% of values are within 3 standard deviations of the mean
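The percentages in the list above are rounded. For any normal variable, P(|X − µ| ≤ kσ) = erf(k/√2), so they can be reproduced with the error function from Python's standard library; `within_k_sd` is a hypothetical helper name.

```python
import math


def within_k_sd(k: float) -> float:
    """P(|X - mu| <= k * sigma) for any normal X, i.e. erf(k / sqrt(2))."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {within_k_sd(k):.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```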

Standard Normal Distribution

To convert a normal distribution to the standard normal distribution, standardize each value by computing its z-score:

$$z = \frac{x - \mu}{\sigma}$$

where µ is the mean and σ is the standard deviation.
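A minimal standardization sketch on a small, made-up data set is shown below. It assumes the population standard deviation (`statistics.pstdev`); for a sample estimate one would use `statistics.stdev` instead. After standardizing, the values have mean 0 and standard deviation 1.

```python
import statistics


def z_scores(values):
    """Standardize values: z = (x - mean) / standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population SD; statistics.stdev gives the sample SD
    return [(v - mu) / sigma for v in values]


heights = [160.0, 165.0, 170.0, 175.0, 180.0]  # hypothetical data
z = z_scores(heights)
print(z)
```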

Normal (Gaussian) Distribution

A continuous random variable Z is said to be a standard normal (Gaussian) random variable, written Z ~ N(0, 1), if its PDF is given by

$$f_Z(z) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{z^2}{2}\right), \quad z \in \mathbb{R}$$

If Z is a standard normal random variable and

$$X = \sigma Z + \mu, \quad \sigma > 0,$$

then X is a normal random variable with mean µ and variance σ², written X ~ N(µ, σ²).
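The transformation X = σZ + µ can be checked by simulation: draw standard normal values, shift and scale them, and compare the sample mean and variance with µ and σ². The target parameters µ = 10 and σ = 3 below are arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(7)
mu, sigma = 10.0, 3.0  # hypothetical target parameters

z = [rng.gauss(0.0, 1.0) for _ in range(100_000)]  # Z ~ N(0, 1)
xs = [sigma * zi + mu for zi in z]                 # X = sigma * Z + mu

x_mean = statistics.fmean(xs)
x_var = statistics.variance(xs)
print(f"mean {x_mean:.3f} (mu = {mu}), variance {x_var:.3f} (sigma^2 = {sigma ** 2})")
```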

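Hence, the PDF of X is:

$$f_X(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)$$

and its CDF is:

$$F_X(x) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{x - \mu}{\sigma\sqrt{2}}\right)\right]$$

where erf is the error function, $\operatorname{erf}(t) = \frac{2}{\sqrt{\pi}} \int_0^t e^{-s^2}\,ds$.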

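From the above, the probability that a sample from a Gaussian distribution exceeds a threshold z can be found using the CDF:

$$P(X > z) = 1 - P(X \le z) = \frac{1}{2}\left[1 - \operatorname{erf}\left(\frac{z - \mu}{\sigma\sqrt{2}}\right)\right]$$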
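A minimal sketch of this calculation with Python's `math` module follows. The helper names `normal_cdf` and `exceedance` are hypothetical. The tail probability is computed with `math.erfc`, since 1 − erf(t) = erfc(t); this gives the same value as 1 − F(z) but avoids losing precision through subtraction when z lies far in the tail.

```python
import math


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """F(x) = 1/2 * [1 + erf((x - mu) / (sigma * sqrt(2)))]"""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def exceedance(z: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X > z) = 1/2 * erfc((z - mu) / (sigma * sqrt(2))), i.e. 1 - F(z)."""
    return 0.5 * math.erfc((z - mu) / (sigma * math.sqrt(2.0)))


print(exceedance(1.96))              # about 0.025 for a standard normal
print(exceedance(130.0, 100.0, 15.0))  # P(X > mu + 2 sigma), about 0.0228
```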

The Central Limit Theorem

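The central limit theorem (CLT) states that if X_1, X_2, ..., X_N are independent and identically distributed with mean µ and finite variance σ², then their sample mean is approximately normally distributed,

$$\bar{X}_N \approx N\left(\mu, \frac{\sigma^2}{N}\right),$$

whatever the shape of the underlying distribution. As a rule of thumb, the approximation starts to hold at around N ≈ 30.

The CLT is fundamental to statistical inference because it lets us draw conclusions about estimators and their sampling distributions. Refer here to read more about estimators and sampling distributions.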
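The CLT can be illustrated by repeatedly averaging N = 30 draws from a skewed distribution. The sketch below assumes an exponential population with rate 1, so µ = σ = 1; the sample means should centre on 1 with a spread close to 1/√30, and roughly 95% of them should fall within two standard errors of the true mean, as a normal distribution predicts.

```python
import math
import random
import statistics

rng = random.Random(42)
lam = 1.0            # exponential population: mean 1, standard deviation 1
n = 30               # sample size, the rule-of-thumb value from the text
num_samples = 20_000

# Draw many samples of size n and record each sample mean.
means = [statistics.fmean(rng.expovariate(lam) for _ in range(n)) for _ in range(num_samples)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
se = 1 / math.sqrt(n)  # theoretical standard error sigma / sqrt(n)
print(f"mean of sample means: {mean_of_means:.3f} (theory 1.000)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory {se:.3f})")

# Fraction of sample means within 2 standard errors of the true mean.
coverage = sum(abs(m - 1.0) <= 2 * se for m in means) / num_samples
print(f"fraction within 2 SE: {coverage:.3f}")
```

Because the exponential is strongly skewed, the coverage at N = 30 is close to, but not exactly, the 95.45% of a true normal; increasing `n` brings it closer.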

Code Snippet: Normal Distribution*

from scipy.stats import norm

colors = sns.color_palette()
x = np.linspace(-5, 5, num=200)

fig = plt.figure(figsize=(12, 6))
for mu, sigma, c in zip([0.5]*3, [0.2, 0.5, 0.8], colors):
    # same mean, increasing standard deviation
    plt.plot(x, norm.pdf(x, mu, sigma), lw=2,
             c=c, label=r"$\mu = {0:.1f}, \sigma={1:.1f}$".format(mu, sigma))
    plt.fill_between(x, norm.pdf(x, mu, sigma), color=c, alpha=.4)

plt.xlim([-5, 5])
plt.legend(loc=0)
plt.ylabel("PDF at $x$")
plt.xlabel("$x$")


What is Binomial Distribution?

The binomial distribution can be described through a statistical experiment with the following properties:

• the experiment consists of n repeated trials
• each trial results in two possible outcomes: success and failure
• the probability of success, P, is the same on every trial
• the trials are independent

Binomial Distribution

Assume the following:

• x: the number of successes that result from the binomial experiment
• n: the number of trials in the binomial experiment
• P: the probability of success on an individual trial
• A binomial random variable is the number of successes x in n repeated trials of a binomial experiment.

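The mean of the distribution of X is:

$$\mu = nP$$

The variance of the distribution of X is:

$$\sigma^2 = nP(1 - P)$$

The binomial probability for x successes is:

$$b(x; n, P) = \binom{n}{x} P^{x} (1 - P)^{n - x}, \quad \binom{n}{x} = \frac{n!}{x!\,(n - x)!}$$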
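The formulas above can be checked directly for a small case. The sketch below assumes n = 10 and P = 0.3, builds the full probability mass function with `math.comb`, and recovers the mean nP and the variance nP(1 − P) from it; `binom_pmf` is a hypothetical helper.

```python
import math


def binom_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p) = C(n, x) * p**x * (1 - p)**(n - x)"""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3
probs = [binom_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * q for x, q in enumerate(probs))
variance = sum(x * x * q for x, q in enumerate(probs)) - mean ** 2

print(f"P(X = 3) = {binom_pmf(3, n, p):.4f}")
print(f"mean = {mean:.4f} (nP = {n * p:.4f})")
print(f"variance = {variance:.4f} (nP(1 - P) = {n * p * (1 - p):.4f})")
```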

Binomial Distribution in the large n, large k limit

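Consider the binomial distribution with n trials, k successes and success probability p, in the limit of large n. By the CLT, at large n the binomial distribution can be replaced by a Gaussian in which k is treated as a continuous variable, with the same mean np and variance np(1 − p) as the binomial:

$$P(k) \approx \frac{1}{\sqrt{2\pi np(1 - p)}} \exp\left(-\frac{(k - np)^2}{2np(1 - p)}\right)$$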
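The quality of the Gaussian approximation can be measured by comparing it with the exact binomial probabilities. The sketch below assumes n = 1000 and p = 0.3 and computes the binomial PMF in log space with `math.lgamma`, which avoids overflow and underflow for large n; both helper names are hypothetical.

```python
import math


def binom_pmf(k: int, n: int, p: float) -> float:
    """Exact binomial probability, computed in log space for numerical safety."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)


def gaussian_approx(k: int, n: int, p: float) -> float:
    """Normal density with mean np and variance np(1 - p), evaluated at k."""
    mu, var = n * p, n * p * (1 - p)
    return math.exp(-(k - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


n, p = 1000, 0.3
worst = max(abs(binom_pmf(k, n, p) - gaussian_approx(k, n, p)) for k in range(n + 1))
peak = binom_pmf(300, n, p)
print(f"peak binomial probability {peak:.5f}, largest absolute error {worst:.2e}")
```

The largest error is small compared with the peak probability; it comes mainly from the binomial's slight skew, which fades as n grows.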

Bernoulli Distribution

The Bernoulli distribution is the special case of the binomial distribution with n = 1, that is, a single trial.

Code Snippet: Binomial Distribution*

from scipy.stats import binom

colors = sns.color_palette()
plt.figure(figsize=(12, 6))
k = np.arange(0, 201)  # k = 0, 1, ..., 200 successes out of n = 200 trials
for p, color in zip([0.1, 0.3, 0.5, 0.7, 0.9], colors):
    rv = binom(200, p)
    plt.plot(k, rv.pmf(k), '.', color=color, label=p)
    plt.fill_between(k, rv.pmf(k), color=color, alpha=0.5)
q = plt.legend()
plt.title("Binomial distribution")
q = plt.ylabel("PMF at $k$")
q = plt.xlabel("$k$")
plt.tight_layout()



*Import the following packages for the statistical models and visualisation:
%matplotlib inline
import numpy as np
import scipy as sp
import matplotlib as mpl
import matplotlib.cm as cm
import matplotlib.pyplot as plt
import pandas as pd
import time
pd.set_option('display.width', 500)
pd.set_option('display.max_columns', 100)
pd.set_option('display.notebook_repr_html', True)
import seaborn as sns
sns.set_style("whitegrid")
sns.set_context("poster")