
The Normal Distribution in Python

Manogane Sydwell

Updated: Jan 17, 2021

Data can take on a variety of forms. In some instances it is skewed towards the left, and in others towards the right. Sometimes, however, the data clusters symmetrically around a central value in a bell shape, with most observations close to that value and fewer further away. In that case, we say that the data has a normal distribution.


Before continuing with this article, it is important to define what we mean by data. By data, we mean statistical data: a collection of numerical observations. Now that we're on the same page regarding what data is, let's actually play with some data to get a better understanding of how it can be distributed. To do so, we will make use of a Quincunx!


A Quincunx, or Galton Board, is a triangular array of pegs. Balls are dropped onto the topmost peg, bounce their way downwards, and are collected in a row of bins at the bottom. When one first uses a Galton Board, the outcomes may seem random. But depending on the settings chosen, after a hundred or so balls it becomes apparent that the bins fill up either in the shape of a normal distribution or in a shape skewed to the left or the right. The two videos below illustrate this.
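The behaviour of the board can also be imitated in code. The following is a minimal sketch, not the simulator used in the videos: it assumes every ball bounces right with a fixed probability at each row of pegs, and the function name `simulate_galton_board`, the 12 rows and the 0.8 "skewed" setting are all illustrative choices.

```python
import numpy as np

def simulate_galton_board(n_balls, n_rows, p_right=0.5, seed=0):
    """Return how many balls land in each of the n_rows + 1 bins."""
    rng = np.random.default_rng(seed)
    # Each ball bounces right (True) or left (False) at every row of pegs.
    bounces = rng.random((n_balls, n_rows)) < p_right
    # The bin a ball lands in equals its number of rightward bounces.
    bins = bounces.sum(axis=1)
    return np.bincount(bins, minlength=n_rows + 1)

# A fair board (50/50 bounces) and a board tilted towards the right.
fair_counts = simulate_galton_board(n_balls=10_000, n_rows=12)
skewed_counts = simulate_galton_board(n_balls=10_000, n_rows=12, p_right=0.8)

# Print a rough text histogram: one '#' per 100 balls in a bin.
for label, counts in [("fair", fair_counts), ("skewed", skewed_counts)]:
    print(label)
    for bin_index, count in enumerate(counts):
        print(f"{bin_index:2d} {'#' * (count // 100)}")
```

With fair bounces the bins fill up in a bell shape around the middle bin, while the tilted board piles the balls up towards one side.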



Don’t let me have all the fun with the Quincunx; take your turn with it and have some fun!



A more formal definition of the normal distribution goes as follows:

The normal distribution is a continuous probability distribution that is symmetrical about the mean, so the side to the right of the mean is a mirror image of the side to the left of the mean.
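To make the symmetry concrete, the density can be written out from its standard formula and evaluated at equal distances on either side of the mean. This is a small sketch: the helper `normal_pdf` and the chosen mean, standard deviation and offsets are illustrative, and SciPy's `norm.pdf` is used only as a cross-check.

```python
import numpy as np
from scipy.stats import norm

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution, written out from its formula."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

mu, sigma = 0.0, 1.0                  # illustrative mean and standard deviation
offsets = np.array([0.5, 1.0, 2.0])   # distances from the mean

left = normal_pdf(mu - offsets, mu, sigma)
right = normal_pdf(mu + offsets, mu, sigma)

# The density is the same at equal distances to the left and right of the mean.
print(np.allclose(left, right))
# The hand-written formula agrees with SciPy's implementation.
print(np.allclose(right, norm.pdf(mu + offsets, mu, sigma)))
```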




A lot of tests and methods with important use cases in statistics, and therefore in the world of finance, are based on the assumption that the data being worked with follows a normal distribution. In modern portfolio theory, the returns of a stock are presumed to follow a normal distribution.
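Since so much rests on this assumption, it can be worth checking it on the data at hand. The sketch below uses `scipy.stats.normaltest` as one possible check. The "returns" are simulated, not real market data, and the sample sizes and parameters are made up for illustration.

```python
import numpy as np
from scipy.stats import normaltest

rng = np.random.default_rng(42)

# Hypothetical daily returns: one sample drawn from a normal distribution and,
# for contrast, one drawn from a clearly skewed (exponential) distribution.
normal_returns = rng.normal(loc=0.0005, scale=0.01, size=5_000)
skewed_returns = rng.exponential(scale=0.01, size=5_000)

for label, sample in [("normal", normal_returns), ("skewed", skewed_returns)]:
    # The null hypothesis is that the sample comes from a normal distribution.
    statistic, p_value = normaltest(sample)
    print(f"{label}: statistic={statistic:.2f}, p-value={p_value:.4f}")
```

A very small p-value, as the skewed sample should produce, suggests that the normality assumption does not hold for that data.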


The plot shown above will be reproduced in Python using the libraries Matplotlib, SciPy and NumPy. However, plotting the normal distribution without context does not help our understanding of what it actually is. Therefore, the plot will be given in context. Let's consider the following case.


According to Dr. Cook, the height of adult males is normally distributed, a fact that is also generally well known. Now let us assume that the average male is 70 inches tall, with a standard deviation of 4 inches. With this information we will be able to produce a plot of a normal distribution.
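Before plotting anything, these two numbers can be given some meaning by simulating heights from them. The following is a rough sketch: the sample size and seed are arbitrary, and the heights are randomly generated rather than measured.

```python
import numpy as np

mu = 70      # mean height in inches (from the example above)
sigma = 4    # spread of 4 inches (from the example above)

rng = np.random.default_rng(0)
heights = rng.normal(loc=mu, scale=sigma, size=100_000)  # simulated heights

# Share of simulated heights within one and two standard deviations of the mean.
within_one_sd = np.mean(np.abs(heights - mu) <= sigma)
within_two_sd = np.mean(np.abs(heights - mu) <= 2 * sigma)

print(f"sample mean: {heights.mean():.2f} inches")
print(f"sample standard deviation: {heights.std():.2f} inches")
print(f"share within 66 to 74 inches: {within_one_sd:.3f}")
print(f"share within 62 to 78 inches: {within_two_sd:.3f}")
```

For a normal distribution, roughly 68% of values fall within one standard deviation of the mean and roughly 95% within two, which the simulated shares should approximate.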


First, we import the libraries required to produce the plot.

#import the required libraries
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

Thereafter we need to define the parameters that the plot will be based on. The mean is 70, the standard deviation is 4, the lower bound is 40 and the upper bound is 90.


# define constants (all values in inches)
mu = 70      # mean height
sigma = 4    # standard deviation
x1 = 40      # lower bound
x2 = 90      # upper bound

Now we calculate the z-scores of the two bounds defined above. A z-score expresses a value as the number of standard deviations it lies from the mean.

# calculate the z-scores of the bounds
z1 = ( x1 - mu ) / sigma
z2 = ( x2 - mu ) / sigma
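Standardized bounds like these (often called z-scores) can be turned into a probability with the cumulative distribution function `norm.cdf`. The sketch below uses a narrower pair of bounds, 66 and 74 inches, which are hypothetical values chosen so that the z-scores come out as whole numbers; the mean of 70 and the spread of 4 inches come from the height example.

```python
from scipy.stats import norm

mu = 70       # mean height in inches
sigma = 4     # standard deviation in inches, as assumed in the height example
lower = 66    # hypothetical lower bound in inches
upper = 74    # hypothetical upper bound in inches

# Standardize both bounds: how many standard deviations from the mean?
z_lower = (lower - mu) / sigma
z_upper = (upper - mu) / sigma

# Probability that a height falls between the bounds, via the standard normal.
probability = norm.cdf(z_upper) - norm.cdf(z_lower)

# The same probability computed directly in inches, without standardizing.
same_probability = norm.cdf(upper, loc=mu, scale=sigma) - norm.cdf(lower, loc=mu, scale=sigma)

print(z_lower, z_upper)
print(round(probability, 4), round(same_probability, 4))
```

Both routes give the same answer, which is the point of standardizing: any normal distribution can be handled with the standard normal curve.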

After the bounds are standardized, we calculate the corresponding values of the probability density function with SciPy.


x = np.arange(z1, z2, 0.001) # standardized values between the two bounds
x_all = np.arange(-10, 10, 0.001) # full range of standardized values for the curve
# mean = 0, stddev = 1, since the bounds were standardized
y = norm.pdf(x,0,1)
y2 = norm.pdf(x_all,0,1)
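Evaluating the density with mean 0 and standard deviation 1 gives the same curve as working in inches, up to a rescaling by the standard deviation. A short sketch of that relationship follows, assuming the mean of 70 and spread of 4 inches from the height example; the particular heights are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mu = 70      # mean height in inches
sigma = 4    # standard deviation in inches, as assumed in the height example

heights = np.linspace(58, 82, 7)   # a few heights in inches
z = (heights - mu) / sigma         # the same heights, standardized

density_in_inches = norm.pdf(heights, loc=mu, scale=sigma)
density_standardized = norm.pdf(z)  # defaults to mean 0, standard deviation 1

# The two densities differ only by a factor of sigma.
print(np.allclose(density_in_inches, density_standardized / sigma))
```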

Now we are able to build the plot, which is done with Matplotlib.

# apply the style before creating the figure so that it takes effect
plt.style.use('fivethirtyeight')
fig, ax = plt.subplots(figsize=(9,6))
ax.plot(x_all,y2)

ax.fill_between(x,y,0, alpha=0.3, color='b')
ax.fill_between(x_all,y2,0, alpha=0.1)
ax.set_xlim([-4,4])
ax.set_xlabel('# of Standard Deviations Outside the Mean')
ax.set_yticklabels([])
ax.set_title('Normal Gaussian Curve')

plt.savefig('normal_curve.png', dpi=72, bbox_inches='tight')
plt.show()

The code above produces the following output.
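As a variation, the curve can also be drawn directly in inches rather than in standard deviations, which keeps the height context visible on the axis. This is only a sketch: the function `plot_height_curve`, the shaded bounds of 66 and 74 inches and the output file name are illustrative choices, while the mean of 70 and the spread of 4 inches come from the height example.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

def plot_height_curve(mu=70, sigma=4, lower=66, upper=74):
    """Plot the height density in inches and shade the area between two bounds."""
    heights = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 500)  # full curve
    shaded = np.linspace(lower, upper, 200)                     # region to shade

    fig, ax = plt.subplots(figsize=(9, 6))
    ax.plot(heights, norm.pdf(heights, mu, sigma))
    ax.fill_between(shaded, norm.pdf(shaded, mu, sigma), 0, alpha=0.3, color='b')
    ax.set_xlabel('Height (inches)')
    ax.set_ylabel('Probability density')
    ax.set_title('Distribution of Adult Male Height')
    return fig, ax

fig, ax = plot_height_curve()
fig.savefig('height_curve.png', dpi=72, bbox_inches='tight')
```

The shaded area between the two bounds corresponds to the probability that a height falls between them.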


And there you have it. This article gave a brief overview of what the normal distribution is, used the Quincunx to build an intuition for it, and then applied that intuition to a worked example of adult male heights in Python. A follow-up to this article will use the normal distribution to simulate the returns of a fictitious asset. Complement this article with the following video.



References

Harry Markowitz's Modern Portfolio Theory: The Efficient Frontier, guidedchoice.com

Bean Machine, Wikipedia.com

Random Variables, Quantopian.com

Normal Distribution, mathisfun.com

Quincunx Explain, mathisfun.com

Quincunx, mathisfun.com

Introduction to the Normal Distribution, simplypsychology.com

Desus and Mero, giphy.com

Distribution of adult heights, johndcook.com

Plotting a Gaussian normal curve with Python and Matplotlib, pythonforundergradengineers.com





©2020 by creativeAfricanProjects.
