Matplotlib Box Plots

Box plots, also known as box-and-whisker plots, are a graphical representation of statistical data based on the minimum, first quartile, median, third quartile, and maximum. They are useful for highlighting the central tendency, dispersion, and skewness of the data, as well as identifying outliers. In this article, we will explore how to create box plots using Matplotlib, a comprehensive library for creating static, animated, and interactive visualizations in Python.

Introduction to Box Plots

A box plot displays the five-number summary of a set of data: the minimum, first quartile (Q1), median (Q2), third quartile (Q3), and maximum. The box spans the interquartile range (IQR), which is the distance between the first and third quartiles, and the line inside the box marks the median. In Matplotlib's default configuration, each whisker extends from the box to the most extreme data point that lies within 1.5 × IQR of the nearest quartile, so the whisker ends coincide with the minimum and maximum only when the data contain no outliers. Points beyond the whiskers are drawn individually as outliers, which Matplotlib calls fliers.
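
To make these quantities concrete, the following sketch computes the five-number summary with NumPy and applies the 1.5 × IQR rule that Matplotlib uses by default to place whiskers. The small dataset is made up for illustration, and the variable names are our own rather than part of any Matplotlib API.

```python
import numpy as np

# Small illustrative dataset (invented for this sketch); 40 is an obvious outlier.
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 40])

# Quartiles using NumPy's default linear interpolation.
q1, median, q3 = np.percentile(data, [25, 50, 75])
iqr = q3 - q1

# Fences located 1.5 * IQR beyond the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that still lie inside the fences.
inside = data[(data >= lower_fence) & (data <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences would be drawn as an individual point.
outliers = data[(data < lower_fence) | (data > upper_fence)]

print(f"min={data.min()}, Q1={q1}, median={median}, Q3={q3}, max={data.max()}")
print(f"IQR={iqr}, whiskers=({whisker_low}, {whisker_high}), outliers={outliers}")
```

For this dataset the quartiles are 4.25, 6.5, and 8.75, the whiskers end at 2 and 10, and the value 40 is the only outlier, so the upper whisker stops well short of the maximum.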

Creating a Basic Box Plot

Let’s start with a basic example: a single box plot of 100 values drawn from a standard normal distribution.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.normal(loc=0, scale=1, size=100)
plt.boxplot(data)
plt.title("Basic Box Plot - how2matplotlib.com")
plt.show()

Customizing Box Plots

Matplotlib allows for extensive customization of box plots. You can change the properties of the boxes, whiskers, caps, medians, and fliers (outliers).
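
Besides the keyword arguments used in the examples in this section, the dictionary returned by boxplot offers another route to the same components. The sketch below styles the medians and whiskers after the plot has been drawn. The seed and the chosen styles are arbitrary, and the object-oriented interface (fig, ax) is used here only for convenience.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the figure is reproducible
data = rng.normal(loc=0, scale=1, size=(100, 2))  # two columns -> two boxes

fig, ax = plt.subplots()
parts = ax.boxplot(data)

# parts maps component names to lists of artists:
# 'boxes', 'whiskers', 'caps', 'medians', 'fliers', 'means'
for median_line in parts["medians"]:
    median_line.set(color="red", linewidth=2)
for whisker in parts["whiskers"]:
    whisker.set(linestyle="--")

ax.set_title("Styling via the returned dictionary")
print(sorted(parts.keys()))
# Call plt.show() to display the figure.
```

Each box contributes one median line and two whiskers, so the lists differ in length. Styling through the returned artists is handy when different boxes need different treatment.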

Changing Box Properties

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 2) * 100
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor="cyan", color="blue"))
plt.title("Custom Box Properties - how2matplotlib.com")
plt.show()

Modifying Whisker Properties

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 3) * 100
plt.boxplot(data, whiskerprops=dict(color="green", linewidth=2))
plt.title("Custom Whisker Properties - how2matplotlib.com")
plt.show()

Adjusting Median Properties

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 4) * 100
plt.boxplot(data, medianprops=dict(color="red", linewidth=3))
plt.title("Custom Median Properties - how2matplotlib.com")
plt.show()

Customizing Flier (Outlier) Properties

import matplotlib.pyplot as plt
import numpy as np

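# Uniform data rarely produces outliers, so append one extreme row to guarantee some fliers.
data = np.vstack([np.random.rand(10, 5) * 100, np.full((1, 5), 300)])
# Flier markers are unfilled with a black edge by default, so set markerfacecolor rather than color.
plt.boxplot(data, flierprops=dict(marker='o', markerfacecolor='yellow', markersize=12))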
plt.title("Custom Flier Properties - how2matplotlib.com")
plt.show()

Horizontal Box Plots

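Box plots can be oriented horizontally by setting the vert parameter to False. Newer Matplotlib releases (3.10 onward, to our knowledge) introduce orientation="horizontal" as the replacement for vert, so check the documentation for the version you have installed.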

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 2) * 100
plt.boxplot(data, vert=False)
plt.title("Horizontal Box Plot - how2matplotlib.com")
plt.show()

Multiple Box Plots

You can display multiple box plots side-by-side to compare different datasets.

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(1, 1.5, 100)
data3 = np.random.normal(2, 2, 100)
data = [data1, data2, data3]

plt.boxplot(data)
plt.title("Multiple Box Plots - how2matplotlib.com")
plt.show()
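
When several boxes share one axis, the default tick labels 1, 2, 3 say little about what each box represents. The sketch below names the boxes by setting the tick labels after plotting. The group names are placeholders of our own. The tick labels are set on the axes rather than through a boxplot keyword, because the name of that keyword has changed between Matplotlib versions.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)  # fixed seed for reproducibility
groups = [
    rng.normal(0, 1, 100),
    rng.normal(1, 1.5, 100),
    rng.normal(2, 2, 100),
]
names = ["Group A", "Group B", "Group C"]  # hypothetical labels

fig, ax = plt.subplots()
ax.boxplot(groups)

# boxplot places the boxes at x = 1, 2, 3, so three labels map one-to-one.
ax.set_xticklabels(names)
ax.set_ylabel("Value")
ax.set_title("Multiple Box Plots with Labels")
# Call plt.show() to display the figure.
```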

Box Plots with Custom Fill Colors

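You can customize the fill colors of box plots to enhance their visual appeal. Fill colors take effect only when patch_artist is set to True, which draws each box as a filled patch instead of an outline made of lines.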

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 3) * 100
plt.boxplot(data, patch_artist=True, boxprops=dict(facecolor="lightgreen"))
plt.title("Box Plots with Custom Fill Colors - how2matplotlib.com")
plt.show()
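
Passing facecolor through boxprops gives every box the same fill. To give each box its own color, one option is to loop over the patches in the returned dictionary, as in this sketch. The color list is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)  # fixed seed for reproducibility
data = rng.random((10, 3)) * 100

fig, ax = plt.subplots()
# patch_artist=True is required; otherwise the boxes are lines with no face to fill.
parts = ax.boxplot(data, patch_artist=True)

colors = ["lightgreen", "lightblue", "salmon"]  # one color per box
for box, color in zip(parts["boxes"], colors):
    box.set_facecolor(color)

ax.set_title("One Fill Color per Box")
# Call plt.show() to display the figure.
```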

Box Plots with Notches

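Adding notches to a box plot narrows each box around its median to give a visual indication of the confidence interval around the median. When the notches of two boxes do not overlap, their medians are likely to differ. With very small samples, a notch can extend beyond the quartiles, which gives the box a folded appearance.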

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 2) * 100
plt.boxplot(data, notch=True)
plt.title("Box Plots with Notches - how2matplotlib.com")
plt.show()
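
To see where the notch limits come from, the sketch below asks matplotlib.cbook.boxplot_stats for the statistics behind a box and compares the reported interval with the Gaussian-based approximation median ± 1.57 × IQR / √n. That formula reflects our understanding of the default, non-bootstrapped behavior, so treat it as an assumption to verify against the version you have installed.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(1)  # fixed seed for reproducibility
sample = rng.normal(loc=50, scale=10, size=200)

# boxplot_stats returns one dictionary of statistics per dataset.
stats = cbook.boxplot_stats(sample)[0]

# Approximate confidence interval around the median (assumed default formula).
n = len(sample)
half_width = 1.57 * stats["iqr"] / np.sqrt(n)
manual_low = stats["med"] - half_width
manual_high = stats["med"] + half_width

print(f"reported notch: [{stats['cilo']:.3f}, {stats['cihi']:.3f}]")
print(f"manual notch:   [{manual_low:.3f}, {manual_high:.3f}]")
```

Because the half-width shrinks with √n, larger samples produce tighter notches, which is why notches on ten-point samples tend to look exaggerated.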

Box Plots with Custom Outlier Symbols

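You can customize the appearance of outliers using the flierprops parameter. Flier markers take their color from markeredgecolor and markerfacecolor, so setting color alone leaves them black.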

import matplotlib.pyplot as plt
import numpy as np

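# Append one extreme row so that every box has a visible outlier.
data = np.vstack([np.random.rand(10, 4) * 100, np.full((1, 4), 300)])
# An 'x' marker has no fill; its color comes from markeredgecolor.
plt.boxplot(data, flierprops=dict(marker='x', markeredgecolor='purple', markersize=8))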
plt.title("Box Plots with Custom Outlier Symbols - how2matplotlib.com")
plt.show()

Box Plots Without Outliers

It’s possible to create box plots that do not display outliers by setting the showfliers parameter to False. The whiskers are computed exactly as before; the outlying points are simply not drawn.

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 3) * 100
plt.boxplot(data, showfliers=False)
plt.title("Box Plots Without Outliers - how2matplotlib.com")
plt.show()
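
One practical effect of hiding fliers is that the axis no longer stretches to accommodate extreme points. The sketch below draws the same made-up dataset twice, with and without fliers, and compares the resulting y-axis limits. The dataset and variable names are illustrative only.

```python
import matplotlib.pyplot as plt
import numpy as np

# Invented dataset with a single extreme value at 40.
data = np.array([2, 4, 4, 5, 6, 7, 8, 9, 10, 40])

fig, (ax_shown, ax_hidden) = plt.subplots(1, 2)
ax_shown.boxplot(data)                     # outlier drawn as a point
ax_hidden.boxplot(data, showfliers=False)  # outlier omitted from the drawing

ax_shown.set_title("showfliers=True")
ax_hidden.set_title("showfliers=False")

# The autoscaled upper limit follows what is actually drawn.
shown_top = ax_shown.get_ylim()[1]
hidden_top = ax_hidden.get_ylim()[1]
print(f"upper y-limit with fliers: {shown_top:.1f}, without: {hidden_top:.1f}")
# Call plt.show() to display the figure.
```

The box and whiskers are identical in both panels; only the scale changes. This makes showfliers=False useful for inspecting the bulk of a heavy-tailed distribution, at the cost of hiding how extreme the tail is.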

Grouped Box Plots

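Grouped box plots can be created to compare distributions across different categories. The positions parameter sets the location of each box along the axis, so leaving a gap between positions visually separates one group from the next.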

import matplotlib.pyplot as plt
import numpy as np

data1 = np.random.normal(0, 1, 100)
data2 = np.random.normal(1, 1.5, 100)
data3 = np.random.normal(2, 2, 100)
data = [data1, data2, data3]

positions = [1, 2, 4]
plt.boxplot(data, positions=positions)
plt.title("Grouped Box Plots - how2matplotlib.com")
plt.show()
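
A common extension is to place two series side by side within each category. The sketch below does this with two boxplot calls whose positions are offset to either side of a shared category center. The category names, series names, offsets, and colors are all assumptions chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # fixed seed for reproducibility
categories = ["A", "B", "C"]     # hypothetical category names
control = [rng.normal(m, 1.0, 100) for m in (0, 1, 2)]
treated = [rng.normal(m + 0.5, 1.5, 100) for m in (0, 1, 2)]

centers = np.arange(len(categories)) * 3.0  # one tick per category
offset, width = 0.4, 0.6                    # spacing within a category

fig, ax = plt.subplots()
bp_control = ax.boxplot(control, positions=centers - offset,
                        widths=width, patch_artist=True)
bp_treated = ax.boxplot(treated, positions=centers + offset,
                        widths=width, patch_artist=True)

# Color the two series differently so they can be told apart.
for box in bp_control["boxes"]:
    box.set_facecolor("lightblue")
for box in bp_treated["boxes"]:
    box.set_facecolor("salmon")

# Replace the per-box ticks with one labeled tick per category.
ax.set_xticks(centers)
ax.set_xticklabels(categories)
ax.legend([bp_control["boxes"][0], bp_treated["boxes"][0]],
          ["control", "treated"])
ax.set_title("Two Series per Category")
# Call plt.show() to display the figure.
```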

Box Plots with Custom Whisker Length

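The length of the whiskers can be customized by setting the whis parameter. A single number sets the whisker reach as a multiple of the IQR beyond the quartiles (the default is 1.5), while a pair of values such as (5, 95) places the whiskers at those percentiles of the data.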

import matplotlib.pyplot as plt
import numpy as np

data = np.random.rand(10, 2) * 100
plt.boxplot(data, whis=0.75)
plt.title("Box Plots with Custom Whisker Length - how2matplotlib.com")
plt.show()
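
To see how whis changes what counts as an outlier, the sketch below uses matplotlib.cbook.boxplot_stats to compute whisker ends and flier counts for three settings without drawing anything. The sample and seed are arbitrary.

```python
import numpy as np
from matplotlib import cbook

rng = np.random.default_rng(7)  # fixed seed for reproducibility
sample = rng.normal(loc=0, scale=1, size=500)

default = cbook.boxplot_stats(sample)[0]             # whis=1.5
short = cbook.boxplot_stats(sample, whis=0.75)[0]    # shorter whiskers
pct = cbook.boxplot_stats(sample, whis=(5, 95))[0]   # percentile-based whiskers

for name, s in [("whis=1.5", default), ("whis=0.75", short),
                ("whis=(5, 95)", pct)]:
    print(f"{name:>13}: whiskers [{s['whislo']:.2f}, {s['whishi']:.2f}], "
          f"{len(s['fliers'])} fliers")
```

Shortening the whiskers can only move points from inside the whiskers into the flier list, so a smaller whis never reports fewer outliers than a larger one on the same data.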

Conclusion

Box plots are a powerful tool for statistical analysis, providing a compact summary of a distribution's center, spread, skewness, and outliers. With Matplotlib, you can create, customize, and compare box plots with a handful of keyword arguments. By adjusting properties such as fill color, line width, orientation, whisker length, and outlier symbols, you can tailor your plots to your specific needs. Whether you’re exploring a single dataset or comparing multiple groups, box plots can provide valuable insights into your data’s structure and outliers.