Matplotlib boxplot multiple columns


Matplotlib is a popular Python library for data visualization, and one of the plot types it provides is the boxplot. A boxplot is a graphical summary of a dataset that shows its distribution through quartiles.

In this article, we will explore how to create boxplots for multiple columns with matplotlib. We will start with the basics of boxplots, then build boxplots for multiple columns, first with the iris dataset and then with small sample DataFrames.

Basics of Matplotlib Boxplots

A boxplot visualizes the distribution of a dataset through its quartiles. The box spans the interquartile range (IQR), from the first quartile (Q1) to the third quartile (Q3), and the line inside it marks the median. By default, the whiskers extend to the most extreme data points that lie within 1.5 times the IQR beyond the box edges, that is, within [Q1 - 1.5 × IQR, Q3 + 1.5 × IQR]. Data points outside this range are considered outliers and are drawn as individual points.
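To make the whisker rule concrete, here is a minimal NumPy sketch of the calculation for a single column. The helper name `whisker_bounds` is hypothetical. The quartiles use NumPy's default linear interpolation, which is also what matplotlib's default box statistics rely on; matplotlib's own implementation is `matplotlib.cbook.boxplot_stats`, which handles a few edge cases this sketch ignores.

```python
import numpy as np


def whisker_bounds(values, whis=1.5):
    """Hypothetical helper: quartiles, whisker ends and outliers for one column."""
    x = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(x, [25, 75])  # NumPy's default linear interpolation
    iqr = q3 - q1
    lower_fence = q1 - whis * iqr  # anything below this is an outlier
    upper_fence = q3 + whis * iqr  # anything above this is an outlier
    inside = x[(x >= lower_fence) & (x <= upper_fence)]
    # Whiskers end at the most extreme data points inside the fences,
    # not at the fences themselves.
    whisker_low, whisker_high = inside.min(), inside.max()
    outliers = x[(x < lower_fence) | (x > upper_fence)]
    return q1, q3, whisker_low, whisker_high, outliers


q1, q3, low, high, outliers = whisker_bounds([1, 2, 3, 4, 20])
print(q1, q3, low, high, outliers)  # 2.0 4.0 1.0 4.0 [20.]
```

For the values 1, 2, 3, 4, 20, Q1 = 2 and Q3 = 4, so the fences sit at -1 and 7: the upper whisker stops at 4, and 20 is drawn as an outlier.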

Because several boxes can share one axis, a boxplot makes it easy to compare distributions across datasets or categories at a glance.

Creating Boxplots for Multiple Columns

To create boxplots for multiple columns, we need a dataset with several numeric columns. Here we use the iris dataset, saved as iris.csv, which contains four measurements of iris flowers: sepal length, sepal width, petal length, and petal width. The code assumes these columns are named sepal.length, sepal.width, petal.length, and petal.width; some copies of the file use other names, so adjust the column list to match yours.

Let’s start by loading the necessary libraries and the dataset:

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
iris = pd.read_csv("iris.csv")

Now we can call the DataFrame's boxplot() method, which pandas implements on top of matplotlib, to draw one box for each listed column.

# Draw one box per listed column with DataFrame.boxplot()
iris.boxplot(column=["sepal.length", "sepal.width", "petal.length", "petal.width"])

# Set the title and labels for the boxplot
plt.title("Boxplot of Iris Features")
plt.xlabel("Features")
plt.ylabel("Values")

# Show the plot
plt.show()

This code will generate a boxplot with four boxes representing the four features in the iris dataset.
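The same figure can also be drawn with matplotlib's object-oriented API by passing one array per box to `Axes.boxplot`. This is a sketch under stated assumptions: random numbers stand in for iris.csv (they are not iris measurements), and the Agg backend is selected only so it runs without a display. Setting the tick labels afterwards with `ax.set_xticklabels` sidesteps the fact that recent matplotlib releases call the boxplot parameter `tick_labels` while older ones call it `labels`.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

cols = ["sepal.length", "sepal.width", "petal.length", "petal.width"]

# Stand-in for iris.csv: random numbers under the same column names,
# NOT real iris measurements. Replace with pd.read_csv("iris.csv").
rng = np.random.default_rng(0)
iris = pd.DataFrame(rng.normal(size=(150, 4)), columns=cols)

fig, ax = plt.subplots()
# One array per box; dropping NaNs explicitly avoids depending on how
# Axes.boxplot treats missing values.
bp = ax.boxplot([iris[c].dropna().to_numpy() for c in cols])
ax.set_xticklabels(cols)  # boxes sit at x = 1..4 by default
ax.set_title("Boxplot of Iris Features")
ax.set_xlabel("Features")
ax.set_ylabel("Values")
# Call plt.show() or fig.savefig("iris_boxplot.png") to display or save it.
```

The returned `bp` is a dictionary of the drawn artists (boxes, medians, whiskers, caps, fliers), which is handy for later styling.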

Below are 10 more examples, each using a small sample DataFrame, that show different ways to create and customize boxplots for multiple columns.

Code Examples:

Example 1: Creating a Boxplot for Two Columns

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create the boxplot
df.boxplot(column=["column1", "column2"])

# Set the title and labels
plt.title("Boxplot of Two Columns")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

Example 2: Boxplot with Notched Boxes

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create the boxplot with notched boxes
df.boxplot(column=["column1", "column2"], notch=True)

# Set the title and labels
plt.title("Boxplot with Notched Boxes")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

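With notch=True, matplotlib narrows each box around a confidence interval for the median; without bootstrapping, that interval is median ± 1.57 × IQR / √n. The sketch below computes the notch ends for the same two columns with `matplotlib.cbook.boxplot_stats`, the helper `Axes.boxplot` uses to prepare its statistics. With only five values per column the interval comes out wider than the box itself, which is likely why the notches in this example fold outward past the box edges instead of forming a neat waist.

```python
import numpy as np
from matplotlib import cbook

data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}

notches = {}
for name, values in data.items():
    # boxplot_stats returns one dict of statistics per dataset passed in
    stats = cbook.boxplot_stats(np.asarray(values, dtype=float))[0]
    # cilo/cihi are the notch ends: median -/+ 1.57 * IQR / sqrt(n)
    notches[name] = (stats["cilo"], stats["cihi"], stats["q1"], stats["q3"])
    lo, hi, q1, q3 = notches[name]
    print(f"{name}: notch {lo:.2f} to {hi:.2f}, box {q1:.1f} to {q3:.1f}")
```

With larger samples the √n term shrinks the notch, and overlapping or non-overlapping notches become a rough visual guide to whether two medians differ.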

Example 3: Custom Box Colors

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

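# Set the custom box colors (one per column)
box_colors = ["#FFC0CB", "#ADD8E6"]

# Create the boxplot; patch_artist=True draws the boxes as fillable patches
fig, ax = plt.subplots()
bp = ax.boxplot([df["column1"], df["column2"]], patch_artist=True)

# Apply one fill color per box
for box, color in zip(bp["boxes"], box_colors):
    box.set_facecolor(color)

# Label the boxes with the column names (ax.boxplot numbers them 1, 2 by default)
ax.set_xticklabels(["column1", "column2"])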

# Set the title and labels
plt.title("Boxplot with Custom Box Colors")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

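If you prefer to stay with pandas, `DataFrame.boxplot` forwards `patch_artist=True` to matplotlib, and `return_type="dict"` returns matplotlib's dictionary of artists, so the boxes can be filled in place. A sketch, assuming a reasonably recent pandas; the fill colors are set after the call because pandas styles the boxes itself while drawing them.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]})
box_colors = ["#FFC0CB", "#ADD8E6"]

fig, ax = plt.subplots()
# patch_artist=True is forwarded to matplotlib so the boxes are fillable
# patches; return_type="dict" returns matplotlib's dict of artists.
bp = df.boxplot(column=["column1", "column2"], ax=ax,
                patch_artist=True, return_type="dict")

# Fill after the call, since pandas applies its own box styling while drawing.
for box, color in zip(bp["boxes"], box_colors):
    box.set_facecolor(color)

ax.set_title("Boxplot with Custom Box Colors")
```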

Example 4: Horizontal Boxplot

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create the horizontal boxplot
df.boxplot(column=["column1", "column2"], vert=False)

# Set the title and labels
plt.title("Horizontal Boxplot")
plt.xlabel("Values")
plt.ylabel("Columns")

# Display the plot
plt.show()

Example 5: Changing Whisker Length

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

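# Create the boxplot: whis sets how far the whiskers may reach (0.5 x IQR here,
# instead of the default 1.5 x IQR); whiskerprops only changes their line style
df.boxplot(column=["column1", "column2"], whis=0.5, whiskerprops=dict(linestyle='--', linewidth=2))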

# Set the title and labels
plt.title("Boxplot with Changed Whisker Length")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

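Because `whis` is passed through to matplotlib, its effect can be checked without drawing anything. This sketch runs `matplotlib.cbook.boxplot_stats` on the data from Example 6 (1, 2, 3, 4, 20) with three settings: a number means "reach up to whis × IQR beyond the box", and, in recent matplotlib versions, a pair of numbers is read as the percentiles at which the whiskers end.

```python
import numpy as np
from matplotlib import cbook

values = np.array([1, 2, 3, 4, 20], dtype=float)

results = {}
for whis in (1.5, 10, (0, 100)):
    stats = cbook.boxplot_stats(values, whis=whis)[0]
    # Record whisker ends and outliers for each whis setting
    results[whis] = (stats["whislo"], stats["whishi"], list(stats["fliers"]))
    print(whis, "-> whiskers", stats["whislo"], "to", stats["whishi"],
          "| outliers", stats["fliers"])
```

With the default 1.5, the value 20 lies beyond the upper fence (4 + 1.5 × 2 = 7) and is drawn as an outlier; with 10 or (0, 100) the upper whisker stretches to 20 and no outliers remain.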

Example 6: Show Outlier Points

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 20], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Outliers are shown by default (showfliers=True); flierprops styles their markers
df.boxplot(column=["column1", "column2"], showfliers=True, flierprops=dict(marker="o", markerfacecolor="red", markersize=8))

# Set the title and labels
plt.title("Boxplot with Outlier Points")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

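The outliers matplotlib draws can also be read back from the plot, which is useful for reporting them alongside the figure. A sketch, assuming `return_type="dict"` so that pandas hands back matplotlib's artists; each entry in `bp["fliers"]` is a line whose y data holds that box's outlier values.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

columns = ["column1", "column2"]
df = pd.DataFrame({"column1": [1, 2, 3, 4, 20], "column2": [2, 4, 6, 8, 10]})

fig, ax = plt.subplots()
# return_type="dict" exposes matplotlib's artists; each "fliers" entry is a
# Line2D whose y data holds that box's outlier values.
bp = df.boxplot(column=columns, ax=ax, return_type="dict")
outliers = {
    name: [float(v) for v in line.get_ydata()]
    for name, line in zip(columns, bp["fliers"])
}
print(outliers)  # {'column1': [20.0], 'column2': []}
```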

Example 7: Grouped Boxplot

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

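# Add a grouping column and draw one subplot per column, with one box per group
df["group"] = ["A", "A", "B", "B", "B"]
df.boxplot(column=["column1", "column2"], by="group")

# With by=, pandas creates one subplot per column and its own figure title,
# so set the figure-level title and label every subplot
plt.suptitle("Grouped Boxplot")
for ax in plt.gcf().axes:
    ax.set_xlabel("Group")
    ax.set_ylabel("Values")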

# Display the plot
plt.show()

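In the grouped plot, each subplot holds one box per group. To check the numbers behind the column1 panel, a short pandas sketch can compute the same quartiles per group; the box edges and median lines in that panel should match these values, assuming both sides use their default quartile method (linear interpolation). Note that group A has only two points, so its quartiles are interpolated between them.

```python
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]})
df["group"] = ["A", "A", "B", "B", "B"]

# One row per group: lower quartile, median and upper quartile of column1.
summary = df.groupby("group")["column1"].describe()[["25%", "50%", "75%"]]
print(summary)
```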

Example 8: Rotated X-Axis Labels

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create the boxplot with rotated x-axis labels
df.boxplot(column=["column1", "column2"])
plt.xticks(rotation=45)

# Set the title and labels
plt.title("Boxplot with Rotated X-Axis Labels")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

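`DataFrame.boxplot` also accepts a `rot` argument (default 0) that rotates the column-name tick labels as part of the call, so the separate `plt.xticks` step is optional. A sketch, assuming a reasonably recent pandas; the Agg backend line only lets it run without a display.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]})

fig, ax = plt.subplots()
# rot rotates the column-name tick labels inside the boxplot call itself.
ax = df.boxplot(column=["column1", "column2"], ax=ax, rot=45)
ax.set_title("Boxplot with Rotated X-Axis Labels")

# Collect the rotation of each tick label to confirm the setting took effect
rotations = [label.get_rotation() for label in ax.get_xticklabels()]
print(rotations)
```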

Example 9: Changing Box Widths

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Create the boxplot with changed box widths
df.boxplot(column=["column1", "column2"], widths=[0.2, 0.4])

# Set the title and labels
plt.title("Boxplot with Changed Box Widths")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

Example 10: Adding Gridlines

import matplotlib.pyplot as plt
import pandas as pd

# Load the dataset
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)

# Gridlines are on by default in DataFrame.boxplot; grid=True makes that explicit
df.boxplot(column=["column1", "column2"], grid=True)

# Set the title and labels
plt.title("Boxplot with Gridlines")
plt.xlabel("Columns")
plt.ylabel("Values")

# Display the plot
plt.show()

These examples show different ways to create boxplots for multiple columns with pandas and matplotlib. By adjusting parameters such as box colors, box widths, whisker reach, notches, and orientation, we can make boxplots that are both readable and informative.

Conclusion

Boxplots are a useful tool for visualizing the distribution of data across multiple columns. With matplotlib, creating boxplots for multiple columns is straightforward, and by adjusting various parameters, we can customize the appearance of the plots to suit our needs. Boxplots provide valuable insights into the dataset’s distribution and can be used to compare different columns or datasets, making them an essential part of data analysis and visualization.

By exploring the code examples provided in this article, and experimenting with different datasets and parameters, you can gain a deeper understanding of how to create boxplots for multiple columns using matplotlib.
