How to Plot a Normal Distribution with Matplotlib in Python
Plotting a normal distribution with Matplotlib in Python is an essential skill for data visualization and statistical analysis. This guide walks through several ways to create normal distribution plots using Matplotlib, one of the most popular plotting libraries in Python. It covers everything from basic concepts to advanced customization options, so that you can effectively visualize normal distributions in your data science projects.
Understanding Normal Distribution and Its Importance
Before diving into the plotting techniques, it’s crucial to understand what a normal distribution is and why it’s important in data analysis. A normal distribution, also known as a Gaussian distribution, is a symmetric probability distribution that follows a bell-shaped curve. It’s characterized by its mean (μ) and standard deviation (σ), which determine the center and spread of the distribution, respectively.
Normal distributions are ubiquitous in nature and play a significant role in various fields, including statistics, physics, and social sciences. They’re often used to model real-world phenomena and are the foundation for many statistical tests and analyses.
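To make the roles of μ and σ concrete, the density can be written as a small function. The following is a minimal sketch; normal_pdf is a hypothetical helper name chosen for illustration, not a function from NumPy or Matplotlib.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma.

    normal_pdf is a hypothetical helper name used only for illustration.
    """
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

# The curve peaks at the mean, and the peak gets lower as sigma grows
peak_narrow = normal_pdf(0.0, mu=0.0, sigma=1.0)
peak_wide = normal_pdf(0.0, mu=0.0, sigma=2.0)

# The density is symmetric around the mean
left, right = normal_pdf(-1.5), normal_pdf(1.5)

# Numerically, the area under the curve is very close to 1
grid = np.linspace(-8, 8, 4001)
area = np.sum(normal_pdf(grid)) * (grid[1] - grid[0])
print(peak_narrow, peak_wide, area)
```

Changing mu shifts the curve left or right without changing its shape, while changing sigma stretches or squeezes it around the mean.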
Setting Up Your Python Environment
To plot a normal distribution with Matplotlib in Python, you’ll need to have the following libraries installed:
- Matplotlib
- NumPy
- SciPy (optional, but useful for generating normal distribution data)
You can install these libraries using pip:
pip install matplotlib numpy scipy
Once you have the necessary libraries installed, you’re ready to start plotting normal distributions with Matplotlib.
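A quick way to confirm that the installation worked is to import each library and print its version. This is only a sanity check; the examples in this guide are not tied to specific version numbers.

```python
import matplotlib
import numpy
import scipy

# Print the installed version of each library
for module in (matplotlib, numpy, scipy):
    print(f"{module.__name__}: {module.__version__}")
```

If any of the imports fails with ModuleNotFoundError, rerun the pip command for that library.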
Basic Normal Distribution Plot
Let’s begin with a simple example of how to plot a normal distribution with Matplotlib in Python. We’ll use NumPy to generate the data and Matplotlib to create the plot.
import numpy as np
import matplotlib.pyplot as plt
# Generate data for the normal distribution
mu, sigma = 0, 1 # mean and standard deviation
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
y = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
# Create the plot
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Normal Distribution')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python')
plt.xlabel('X-axis')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True)
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
In this example, we first generate the data for a standard normal distribution (mean = 0, standard deviation = 1), using NumPy’s linspace function to create evenly spaced x-values and the probability density function formula to calculate the corresponding y-values. Then, we use Matplotlib’s plot function to create the line plot, add labels, a title, and a legend, and finally display the plot using plt.show().
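If SciPy is installed, the hand-written formula can be replaced by stats.norm.pdf. The sketch below checks that both approaches give the same y-values for the same parameters.

```python
import numpy as np
from scipy import stats

mu, sigma = 0, 1
x = np.linspace(mu - 3 * sigma, mu + 3 * sigma, 100)

# Hand-written density, as in the basic example
y_manual = 1 / (sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu) ** 2 / (2 * sigma ** 2))

# Same density from SciPy: loc is the mean, scale is the standard deviation
y_scipy = stats.norm.pdf(x, loc=mu, scale=sigma)

print(np.allclose(y_manual, y_scipy))
```

Using stats.norm.pdf keeps the plotting code shorter and avoids typing the formula by hand; the manual version is still useful when you want to avoid the SciPy dependency.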
Histogram with Normal Distribution Overlay
Another common way to visualize a normal distribution is by creating a histogram of the data and overlaying a normal distribution curve. This method is particularly useful when working with real-world data that approximates a normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate random data
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)
# Create the histogram
plt.figure(figsize=(10, 6))
plt.hist(data, bins=30, density=True, alpha=0.7, color='skyblue', edgecolor='black')
# Generate points for the normal distribution curve
x = np.linspace(data.min(), data.max(), 100)
y = stats.norm.pdf(x, loc=data.mean(), scale=data.std())
# Plot the normal distribution curve
plt.plot(x, y, 'r-', lw=2, label='Normal Distribution')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Histogram with Overlay')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
In this example, we generate random data from a normal distribution using NumPy’s random.normal function. We then create a histogram of the data using Matplotlib’s hist function with the density=True parameter to normalize the histogram. Finally, we overlay a normal distribution curve using SciPy’s stats.norm.pdf function to generate the curve points and Matplotlib’s plot function to draw the line.
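The density=True setting is what puts the histogram on the same scale as the curve. A small sketch with np.histogram, which applies the same normalization, shows that the bars then cover a total area of 1. The seed is an arbitrary choice for repeatability.

```python
import numpy as np

rng = np.random.default_rng(42)  # assumption: any seed works here
data = rng.normal(loc=0, scale=1, size=1000)

# np.histogram applies the same normalization as plt.hist(..., density=True)
heights, edges = np.histogram(data, bins=30, density=True)
widths = np.diff(edges)

# Bar heights are densities, so height * width summed over all bins is 1
total_area = np.sum(heights * widths)
print(total_area)
```

Without density=True the bar heights are raw counts, and a probability density curve drawn on the same axes would look almost flat next to them.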
Customizing Normal Distribution Plots
Matplotlib offers a wide range of customization options to enhance your normal distribution plots. Let’s explore some of these options:
Changing Colors and Styles
You can easily modify the colors and styles of your plots to make them more visually appealing or to match your project’s theme.
import numpy as np
import matplotlib.pyplot as plt
mu, sigma = 0, 1
x = np.linspace(mu - 3*sigma, mu + 3*sigma, 100)
y = 1/(sigma * np.sqrt(2 * np.pi)) * np.exp(-(x - mu)**2 / (2 * sigma**2))
plt.figure(figsize=(10, 6))
plt.plot(x, y, color='purple', linestyle='--', linewidth=2, label='Normal Distribution')
plt.fill_between(x, y, alpha=0.3, color='lavender')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Custom Colors')
plt.xlabel('X-axis')
plt.ylabel('Probability Density')
plt.legend()
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
In this example, we’ve changed the line color to purple, used a dashed line style, increased the line width, and added a light fill color under the curve using plt.fill_between().
Adding Multiple Normal Distributions
You can plot multiple normal distributions on the same graph to compare different parameters or datasets.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-5, 5, 100)
y1 = 1/(1 * np.sqrt(2 * np.pi)) * np.exp(-(x - 0)**2 / (2 * 1**2))
y2 = 1/(1.5 * np.sqrt(2 * np.pi)) * np.exp(-(x - 1)**2 / (2 * 1.5**2))
plt.figure(figsize=(10, 6))
plt.plot(x, y1, label='μ=0, σ=1')
plt.plot(x, y2, label='μ=1, σ=1.5')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Multiple Distributions')
plt.xlabel('X-axis')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True, linestyle=':', alpha=0.7)
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example demonstrates how to plot two normal distributions with different means and standard deviations on the same graph, allowing for easy comparison.
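When more than two curves are compared, writing one formula per curve gets repetitive. A possible variation, assuming SciPy is available, loops over a list of parameter pairs; the third pair is an illustrative value that does not appear in the example above.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# (mean, standard deviation) pairs to compare; the values are illustrative
params = [(0, 1), (1, 1.5), (-2, 0.5)]

x = np.linspace(-5, 5, 200)
fig, ax = plt.subplots(figsize=(10, 6))
for mu, sigma in params:
    # One curve per parameter pair, labeled with its own mean and sigma
    ax.plot(x, stats.norm.pdf(x, loc=mu, scale=sigma), label=f'μ={mu}, σ={sigma}')
ax.set_xlabel('X-axis')
ax.set_ylabel('Probability Density')
ax.legend()
```

Call plt.show() afterwards to display the figure. Adding another distribution only requires adding another pair to params.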
Using Subplots
Subplots are useful when you want to display multiple related plots side by side or in a grid layout.
import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(-5, 5, 100)
y1 = 1/(1 * np.sqrt(2 * np.pi)) * np.exp(-(x - 0)**2 / (2 * 1**2))
y2 = 1/(2 * np.sqrt(2 * np.pi)) * np.exp(-(x - 0)**2 / (2 * 2**2))
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 5))
ax1.plot(x, y1, label='σ=1')
ax1.set_title('Normal Distribution (μ=0, σ=1)')
ax1.set_xlabel('X-axis')
ax1.set_ylabel('Probability Density')
ax1.legend()
ax1.text(0, 0.1, 'how2matplotlib.com', fontsize=10, alpha=0.7)
ax2.plot(x, y2, label='σ=2')
ax2.set_title('Normal Distribution (μ=0, σ=2)')
ax2.set_xlabel('X-axis')
ax2.set_ylabel('Probability Density')
ax2.legend()
ax2.text(0, 0.05, 'how2matplotlib.com', fontsize=10, alpha=0.7)
plt.suptitle('How to Plot a Normal Distribution with Matplotlib in Python: Subplots')
plt.tight_layout()
plt.show()
This example creates two subplots side by side, each showing a normal distribution with a different standard deviation.
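By default each subplot picks its own y-axis range, which can hide the fact that the wider curve is also lower. One option, sketched here under the assumption that SciPy is available, is to pass sharey=True so both panels use the same scale.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

x = np.linspace(-5, 5, 100)
sigmas = [1, 2]  # same standard deviations as the example above

# sharey=True gives both panels one y-scale, so curve heights are comparable
fig, axes = plt.subplots(1, len(sigmas), figsize=(12, 5), sharey=True)
for ax, sigma in zip(axes, sigmas):
    ax.plot(x, stats.norm.pdf(x, loc=0, scale=sigma))
    ax.set_title(f'Normal Distribution (μ=0, σ={sigma})')
    ax.set_xlabel('X-axis')
axes[0].set_ylabel('Probability Density')
fig.tight_layout()
```

Call plt.show() afterwards to display the figure. Whether a shared axis is the better choice depends on whether you want to compare heights or just shapes.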
Advanced Techniques for Normal Distribution Plots
Now that we’ve covered the basics, let’s explore some advanced techniques for plotting normal distributions with Matplotlib in Python.
3D Normal Distribution Plot
You can create a 3D surface plot of a bivariate normal distribution to visualize the relationship between two variables.
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
def bivariate_normal(X, Y, sigmax=1.0, sigmay=1.0, mux=0.0, muy=0.0, sigmaxy=0.0):
    Xmu = X-mux
    Ymu = Y-muy
    rho = sigmaxy/(sigmax*sigmay)
    z = Xmu**2/sigmax**2 + Ymu**2/sigmay**2 - 2*rho*Xmu*Ymu/(sigmax*sigmay)
    denom = 2*np.pi*sigmax*sigmay*np.sqrt(1-rho**2)
    return np.exp(-z/(2*(1-rho**2))) / denom
x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)
Z = bivariate_normal(X, Y, sigmax=1, sigmay=1, mux=0, muy=0, sigmaxy=0)
fig = plt.figure(figsize=(10, 8))
ax = fig.add_subplot(111, projection='3d')
surf = ax.plot_surface(X, Y, Z, cmap='viridis')
ax.set_xlabel('X')
ax.set_ylabel('Y')
ax.set_zlabel('Probability Density')
ax.set_title('How to Plot a Normal Distribution with Matplotlib in Python: 3D Bivariate')
fig.colorbar(surf, shrink=0.5, aspect=5)
ax.text(0, 0, 0, 'how2matplotlib.com', fontsize=10, alpha=0.7)
plt.show()
This example creates a 3D surface plot of a bivariate normal distribution using Matplotlib’s 3D plotting capabilities.
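As an alternative to a hand-written density function, SciPy’s stats.multivariate_normal can evaluate the same kind of surface from a mean vector and a covariance matrix. The sketch below uses a covariance of 0.5 between the two variables, an illustrative value that is not taken from the example above.

```python
import numpy as np
from scipy import stats

x = np.linspace(-3, 3, 100)
y = np.linspace(-3, 3, 100)
X, Y = np.meshgrid(x, y)

# Mean vector and covariance matrix; the off-diagonal term is the covariance
# (0.5 is an illustrative value, not taken from the example above)
mean = [0.0, 0.0]
cov = [[1.0, 0.5],
       [0.5, 1.0]]

# Stack the grid into shape (100, 100, 2) so each point is an (x, y) pair
points = np.dstack((X, Y))
Z = stats.multivariate_normal(mean=mean, cov=cov).pdf(points)
print(Z.shape, Z.max())
```

The resulting Z has the same shape as X and Y, so it can be passed to ax.plot_surface(X, Y, Z, cmap='viridis') in the same way as before.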
Cumulative Distribution Function (CDF) Plot
The cumulative distribution function (CDF) is another important visualization of a normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.linspace(-4, 4, 100)
y = stats.norm.cdf(x)
plt.figure(figsize=(10, 6))
plt.plot(x, y, label='CDF')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: CDF')
plt.xlabel('X-axis')
plt.ylabel('Cumulative Probability')
plt.legend()
plt.grid(True, linestyle=':', alpha=0.7)
plt.text(0, 0.5, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example plots the cumulative distribution function of a standard normal distribution using SciPy’s stats.norm.cdf function.
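Reading values off the CDF curve can also be done numerically. As a brief sketch, stats.norm.cdf maps an x-value to a cumulative probability, and stats.norm.ppf (the percent point function) maps a probability back to an x-value.

```python
from scipy import stats

# Half of the probability mass lies at or below the mean
p_at_mean = stats.norm.cdf(0.0)

# Probability that a standard normal value falls at or below 1.96
p = stats.norm.cdf(1.96)

# ppf inverts the CDF: it returns the x-value at which the
# cumulative probability reaches p
x_back = stats.norm.ppf(p)

print(p_at_mean, p, x_back)
```

On the plot, this corresponds to moving from the x-axis up to the curve (cdf) or from the y-axis across to the curve (ppf).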
Q-Q Plot
A Q-Q (Quantile-Quantile) plot is used to assess whether a dataset follows a normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)
# Create Q-Q plot
fig, ax = plt.subplots(figsize=(10, 6))
stats.probplot(data, dist="norm", plot=ax)
ax.set_title("How to Plot a Normal Distribution with Matplotlib in Python: Q-Q Plot")
ax.text(0, 0, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
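This example creates a Q-Q plot using SciPy’s stats.probplot function, which compares the quantiles of the sample data to the quantiles of a theoretical normal distribution. Points that fall close to the straight reference line indicate that the data is approximately normal; systematic curvature at the ends points to heavier or lighter tails.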
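The numbers behind the Q-Q plot are available without drawing anything. A minimal sketch, assuming the default fit=True behavior of stats.probplot and an arbitrary seed:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)  # assumption: seed chosen only for repeatability
data = rng.normal(loc=0, scale=1, size=1000)

# Without plot=..., probplot only returns the numbers behind the figure
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(data, dist="norm")

# For normal data, slope is close to the standard deviation,
# intercept is close to the mean, and r is close to 1
print(f"slope={slope:.3f}, intercept={intercept:.3f}, r={r:.4f}")
```

The correlation coefficient r gives a rough single-number summary of how straight the Q-Q plot is, which can be handy when screening many datasets.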
Visualizing Normal Distribution Properties
Understanding and visualizing the properties of a normal distribution is crucial for data analysis. Let’s explore some ways to visualize these properties using Matplotlib.
Standard Deviations and Percentiles
You can visualize the standard deviations and percentiles of a normal distribution to better understand its spread and central tendency.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
x = np.linspace(-4, 4, 1000)
y = stats.norm.pdf(x)
plt.figure(figsize=(12, 6))
plt.plot(x, y, 'b-', label='Normal Distribution')
plt.fill_between(x, y, where=(x >= -1) & (x <= 1), color='red', alpha=0.3, label='68% (1σ)')
plt.fill_between(x, y, where=(x >= -2) & (x <= 2), color='green', alpha=0.2, label='95% (2σ)')
plt.fill_between(x, y, where=(x >= -3) & (x <= 3), color='blue', alpha=0.1, label='99.7% (3σ)')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Standard Deviations')
plt.xlabel('X-axis (Standard Deviations)')
plt.ylabel('Probability Density')
plt.legend()
plt.grid(True, linestyle=':', alpha=0.7)
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example visualizes the 68-95-99.7 rule (also known as the empirical rule) by shading the areas within one, two, and three standard deviations of the mean.
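The percentages in the rule are rounded values of differences between CDF values. A short sketch that computes them with stats.norm.cdf:

```python
from scipy import stats

# Probability mass within k standard deviations of the mean
coverage = {}
for k in (1, 2, 3):
    coverage[k] = stats.norm.cdf(k) - stats.norm.cdf(-k)
    print(f"within {k}σ: {coverage[k]:.4f}")
```

The same pattern works for any interval: subtract the CDF at the lower bound from the CDF at the upper bound.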
Comparing Empirical Data to Normal Distribution
When working with real-world data, it’s often useful to compare your empirical data to a theoretical normal distribution. Here are some techniques to do this using Matplotlib.
Overlay Empirical Data on Theoretical Normal Distribution
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)
# Calculate mean and standard deviation of the data
mu, std = np.mean(data), np.std(data)
# Create the plot
plt.figure(figsize=(10, 6))
# Plot histogram of empirical data
plt.hist(data, bins=30, density=True, alpha=0.7, color='skyblue', edgecolor='black', label='Empirical Data')
# Plot theoretical normal distribution
x = np.linspace(mu - 3*std, mu + 3*std, 100)
y = stats.norm.pdf(x, mu, std)
plt.plot(x, y, 'r-', lw=2, label='Theoretical Normal')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Empirical vs Theoretical')
plt.xlabel('Value')
plt.ylabel('Density')
plt.legend()
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example generates sample data, creates a histogram of the empirical data, and overlays a theoretical normal distribution with the same mean and standard deviation as the sample data.
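The mean and standard deviation used for the overlay can also be obtained with stats.norm.fit. The sketch below shows that, for the normal distribution, the fitted values coincide with np.mean and np.std at their default settings.

```python
import numpy as np
from scipy import stats

np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)

# norm.fit returns maximum-likelihood estimates (loc, scale)
mu_fit, std_fit = stats.norm.fit(data)

# These match np.mean and np.std with the default ddof=0
same_mean = np.isclose(mu_fit, np.mean(data))
same_std = np.isclose(std_fit, np.std(data))
print(same_mean, same_std)

# The sample standard deviation (ddof=1) is slightly larger
sample_std = np.std(data, ddof=1)
print(sample_std > std_fit)
```

With 1000 points the difference between ddof=0 and ddof=1 is negligible for plotting, but it becomes noticeable for small samples.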
Probability Plot (P-P Plot)
A probability plot, or P-P plot, is another useful tool for comparing empirical data to a theoretical normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate sample data
np.random.seed(42)
data = np.random.normal(loc=0, scale=1, size=1000)
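# Create P-P plot: empirical vs. theoretical cumulative probabilities
# (stats.probplot draws quantiles, i.e. a Q-Q plot, so the CDF values are computed directly)
sorted_data = np.sort(data)
n = len(sorted_data)
empirical_cdf = (np.arange(1, n + 1) - 0.5) / n
theoretical_cdf = stats.norm.cdf(sorted_data, loc=np.mean(data), scale=np.std(data))
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot(theoretical_cdf, empirical_cdf, 'o', markersize=3, markerfacecolor='skyblue', markeredgecolor='blue')
ax.plot([0, 1], [0, 1], 'r-', lw=2)  # reference line: perfect agreement
ax.set_title("How to Plot a Normal Distribution with Matplotlib in Python: P-P Plot")
ax.set_xlabel("Theoretical Cumulative Probability")
ax.set_ylabel("Empirical Cumulative Probability")
ax.text(0.6, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()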
This example creates a P-P plot, which compares the cumulative distribution of the sample data to that of a theoretical normal distribution.
Advanced Customization Techniques
Matplotlib offers a wide range of advanced customization options to create publication-quality plots. Let’s explore some of these techniques.
Custom Styling with Seaborn
Seaborn is a statistical data visualization library built on top of Matplotlib that provides a high-level interface for drawing attractive statistical graphics. You can use Seaborn to easily apply custom styles to your normal distribution plots. It is not among the libraries installed in the setup section, so install it first with pip install seaborn.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Set Seaborn style
sns.set_style("whitegrid")
sns.set_palette("deep")
# Generate data
x = np.linspace(-4, 4, 100)
y = 1/(1 * np.sqrt(2 * np.pi)) * np.exp(-(x - 0)**2 / (2 * 1**2))
# Create the plot
plt.figure(figsize=(10, 6))
sns.lineplot(x=x, y=y, label='Normal Distribution')
sns.despine()
plt.title('How to Plot a Normal Distribution with Matplotlib and Seaborn in Python')
plt.xlabel('X-axis')
plt.ylabel('Probability Density')
plt.legend()
plt.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example uses Seaborn to apply a custom style to the normal distribution plot, resulting in a more visually appealing graph.
Animation of Normal Distribution
You can create animated plots to visualize how changes in parameters affect the normal distribution.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
fig, ax = plt.subplots(figsize=(10, 6))
line, = ax.plot([], [], lw=2)
ax.set_xlim(-5, 5)
ax.set_ylim(0, 0.85)  # the tallest curve (σ = 0.5) peaks at about 0.8
ax.set_title('How to Plot a Normal Distribution with Matplotlib in Python: Animation')
ax.set_xlabel('X-axis')
ax.set_ylabel('Probability Density')
ax.text(0, 0.1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
def init():
    line.set_data([], [])
    return line,
def animate(i):
    # i is the standard deviation for the current frame
    x = np.linspace(-5, 5, 100)
    y = 1/(i * np.sqrt(2 * np.pi)) * np.exp(-(x - 0)**2 / (2 * i**2))
    line.set_data(x, y)
    ax.set_title(f'Normal Distribution (σ = {i:.2f})')
    return line,
# blit=False so that the title, which changes every frame, is redrawn as well
anim = FuncAnimation(fig, animate, init_func=init, frames=np.linspace(0.5, 2, 100), interval=50, blit=False)
plt.show()
This example creates an animation that shows how the normal distribution changes as the standard deviation increases.
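When an animation uses fixed axis limits, it helps to know in advance how tall the curve gets. The density peaks at the mean with height 1 / (σ√(2π)), so the range of peak heights can be computed for the same σ values the animation sweeps through. A short sketch:

```python
import numpy as np

# Standard deviations swept by the animation
sigmas = np.linspace(0.5, 2, 100)

# The density peaks at x = mean with height 1 / (sigma * sqrt(2 * pi))
peaks = 1 / (sigmas * np.sqrt(2 * np.pi))

# The tallest curve belongs to the smallest sigma
print(f"tallest peak: {peaks.max():.3f} at sigma = {sigmas[peaks.argmax()]:.2f}")
print(f"lowest peak: {peaks.min():.3f} at sigma = {sigmas[peaks.argmin()]:.2f}")
```

The upper y-axis limit should sit a little above the tallest peak so that no frame is clipped.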
Practical Applications of Normal Distribution Plots
Understanding how to plot a normal distribution with Matplotlib in Python is crucial for various real-world applications. Let’s explore some practical examples.
Quality Control in Manufacturing
In manufacturing, normal distribution plots are often used to analyze product specifications and quality control measures.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate sample data for product measurements
np.random.seed(42)
measurements = np.random.normal(loc=100, scale=2, size=1000)
# Calculate mean and standard deviation
mean = np.mean(measurements)
std = np.std(measurements)
# Create the plot
plt.figure(figsize=(12, 6))
# Plot histogram of measurements
plt.hist(measurements, bins=30, density=True, alpha=0.7, color='skyblue', edgecolor='black', label='Product Measurements')
# Plot theoretical normal distribution
x = np.linspace(mean - 4*std, mean + 4*std, 100)
y = stats.norm.pdf(x, mean, std)
plt.plot(x, y, 'r-', lw=2, label='Theoretical Normal')
# Add specification limits
lower_limit, upper_limit = 95, 105
plt.axvline(lower_limit, color='g', linestyle='--', label='Specification Limits')
plt.axvline(upper_limit, color='g', linestyle='--')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Quality Control')
plt.xlabel('Measurement')
plt.ylabel('Density')
plt.legend()
plt.text(100, 0.05, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example simulates product measurements and plots them against a theoretical normal distribution, along with specification limits, to visualize quality control in a manufacturing process.
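The plot shows where the specification limits sit relative to the curve; the share of product expected to fall outside them can be estimated from the fitted normal distribution. A minimal sketch, reusing the simulated data and limits from the example above:

```python
import numpy as np
from scipy import stats

np.random.seed(42)
measurements = np.random.normal(loc=100, scale=2, size=1000)
mean, std = np.mean(measurements), np.std(measurements)
lower_limit, upper_limit = 95, 105

# Share of the fitted normal curve that lies outside the specification limits
p_below = stats.norm.cdf(lower_limit, loc=mean, scale=std)
p_above = stats.norm.sf(upper_limit, loc=mean, scale=std)  # sf = 1 - cdf
expected_out = p_below + p_above

# Share of the simulated measurements that actually fall outside the limits
observed_out = np.mean((measurements < lower_limit) | (measurements > upper_limit))
print(f"expected: {expected_out:.4f}, observed: {observed_out:.4f}")
```

A large gap between the expected and observed shares would suggest that the normal model does not describe the measurements well.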
Financial Risk Analysis
Normal distribution plots are widely used in finance for risk analysis and portfolio management.
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats
# Generate sample returns data
np.random.seed(42)
returns = np.random.normal(loc=0.05, scale=0.1, size=1000)
# Calculate Value at Risk (VaR) at 95% confidence level
var_95 = np.percentile(returns, 5)
# Create the plot
plt.figure(figsize=(12, 6))
# Plot histogram of returns
plt.hist(returns, bins=30, density=True, alpha=0.7, color='skyblue', edgecolor='black', label='Returns Distribution')
# Plot theoretical normal distribution
x = np.linspace(min(returns), max(returns), 100)
y = stats.norm.pdf(x, np.mean(returns), np.std(returns))
plt.plot(x, y, 'r-', lw=2, label='Theoretical Normal')
# Add VaR line
plt.axvline(var_95, color='g', linestyle='--', label=f'95% VaR: {var_95:.2f}')
plt.title('How to Plot a Normal Distribution with Matplotlib in Python: Financial Risk Analysis')
plt.xlabel('Returns')
plt.ylabel('Density')
plt.legend()
plt.text(0, 1, 'how2matplotlib.com', fontsize=12, alpha=0.7)
plt.show()
This example simulates financial returns and visualizes their distribution along with the Value at Risk (VaR) at a 95% confidence level.
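The VaR in the example is read directly from the simulated returns. If the returns are assumed to be normally distributed, the same quantity can also be estimated from the fitted curve with stats.norm.ppf. The sketch below compares both estimates on the same simulated data.

```python
import numpy as np
from scipy import stats

np.random.seed(42)
returns = np.random.normal(loc=0.05, scale=0.1, size=1000)

# Empirical 5th percentile, as in the example above
var_empirical = np.percentile(returns, 5)

# Parametric estimate: 5% quantile of the fitted normal distribution
var_parametric = stats.norm.ppf(0.05, loc=np.mean(returns), scale=np.std(returns))

print(f"empirical: {var_empirical:.3f}, parametric: {var_parametric:.3f}")
```

For data that really is normal the two values are close. With real returns they can differ noticeably, which is one reason to check the histogram against the curve before relying on the normal model.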
Conclusion
In this comprehensive guide, we’ve explored how to plot a normal distribution with Matplotlib in Python, covering a wide range of techniques and applications. From basic plots to advanced customization and real-world examples, you now have the tools and knowledge to effectively visualize normal distributions in your data science projects.
Remember that while the normal distribution is a powerful and widely used model, it’s essential to always check your data’s actual distribution and not assume normality. The visualization techniques we’ve covered can help you assess whether your data follows a normal distribution and make informed decisions about your statistical analyses.