Matplotlib boxplot multiple columns
Matplotlib is a popular Python library for data visualization, and the boxplot is one of its most commonly used plot types. A boxplot (also called a box-and-whisker plot) summarizes the distribution of a dataset using its quartiles.
In this article, we will explore how to create boxplots for multiple columns in matplotlib. We will start by understanding the basics of boxplots and then dive into creating boxplots for multiple columns using different datasets.
Basics of Matplotlib Boxplots
A boxplot is a useful tool to visualize the distribution of a dataset through its quartiles. The box spans the interquartile range (IQR), from the first quartile (Q1) to the third quartile (Q3), and a line inside the box marks the median. By default, each whisker extends to the most extreme data point that lies within 1.5 times the IQR of the box edge. Data points beyond the whiskers are considered outliers and are drawn as individual points.
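To make the whisker rule concrete, here is a minimal sketch that computes the same quantities by hand with NumPy. The sample values are hypothetical and chosen so that one point falls outside the fences; the variable names are illustrative only.

```python
import numpy as np

# Hypothetical sample; 20 lies far from the rest of the values
values = np.array([1, 2, 3, 4, 20])

# Quartiles and interquartile range
q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences: 1.5 * IQR beyond the box edges
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(f"Q1={q1}, median={median}, Q3={q3}, IQR={iqr}")
print(f"whiskers: {whisker_low} to {whisker_high}, outliers: {outliers}")
```

Here the upper fence is 7, so the upper whisker ends at 4 (the largest value inside the fence) and 20 is reported as an outlier.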
A boxplot can be used to compare multiple datasets at a glance, making it an effective visualization tool to analyze data across different categories.
Creating Boxplots for Multiple Columns
To create a boxplot for multiple columns in matplotlib, we need a dataset with multiple numeric columns. Here, we will use the iris dataset, stored in a file named iris.csv, which contains measurements of four features (sepal length, sepal width, petal length, and petal width) for different iris flowers.
Let’s start by loading the necessary libraries and the dataset:
import matplotlib.pyplot as plt
import pandas as pd
# Load the dataset
iris = pd.read_csv("iris.csv")
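Now, we can use the boxplot() method of the pandas DataFrame, which draws the plot with matplotlib, to create a boxplot for multiple columns. The column names passed to it must match the headers in your copy of iris.csv.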
# Create boxplot using the boxplot() function
iris.boxplot(column=["sepal.length", "sepal.width", "petal.length", "petal.width"])
# Set the title and labels for the boxplot
plt.title("Boxplot of Iris Features")
plt.xlabel("Features")
plt.ylabel("Values")
# Show the plot
plt.show()
This code will generate a boxplot with four boxes representing the four features in the iris dataset.
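When a dataset also contains non-numeric columns (many copies of iris.csv include a species or variety label, which is an assumption here), the numeric columns can be picked programmatically instead of being typed out. The sketch below uses a small hypothetical DataFrame in the shape of the iris data rather than the real file.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for iris.csv: four numeric columns plus a text column
df = pd.DataFrame({
    "sepal.length": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "sepal.width": [3.5, 3.0, 3.2, 3.2, 3.3, 2.7],
    "petal.length": [1.4, 1.4, 4.7, 4.5, 6.0, 5.1],
    "petal.width": [0.2, 0.2, 1.4, 1.5, 2.5, 1.9],
    "variety": ["Setosa", "Setosa", "Versicolor", "Versicolor", "Virginica", "Virginica"],
})

# Keep only the numeric columns so the label column is not passed to boxplot()
numeric_cols = df.select_dtypes(include="number").columns.tolist()

# One box per numeric column, drawn on a single set of axes
ax = df.boxplot(column=numeric_cols)
ax.set_title("Boxplot of Numeric Columns")
ax.set_ylabel("Values")
```

Calling plt.show() afterwards displays the figure, as in the example above.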
The following 10 code examples show further ways to create and customize boxplots for multiple columns.
Code Examples:
Example 1: Creating a Boxplot for Two Columns
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot
df.boxplot(column=["column1", "column2"])
# Set the title and labels
plt.title("Boxplot of Two Columns")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
Example 2: Boxplot with Notched Boxes
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot with notched boxes
df.boxplot(column=["column1", "column2"], notch=True)
# Set the title and labels
plt.title("Boxplot with Notched Boxes")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
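The notch marks an approximate confidence interval around the median. As a rough check of what notch=True will draw, the sketch below reads the interval from matplotlib's cbook.boxplot_stats helper for the same five values used in column1. With so few points, the notch can extend past the box edges, which is why notched boxes on tiny samples can look folded.

```python
from matplotlib import cbook

# Same five values as column1 in the example above
stats = cbook.boxplot_stats([1, 2, 3, 4, 5])[0]

# The notch spans cilo..cihi, an approximate confidence interval for the median
notch_low, notch_high = stats["cilo"], stats["cihi"]
print(f"box: {stats['q1']} to {stats['q3']}")
print(f"notch: {notch_low:.2f} to {notch_high:.2f}")

# With few data points the notch can be wider than the box itself
notch_exceeds_box = notch_low < stats["q1"] or notch_high > stats["q3"]
print("notch wider than box:", notch_exceeds_box)
```

Notches are easier to read on larger samples, where the interval sits comfortably inside the box.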
Example 3: Custom Box Colors
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
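# Set the custom box colors
box_colors = ["#FFC0CB", "#ADD8E6"]
# Create the boxplot; patch_artist=True makes the boxes fillable
fig, ax = plt.subplots()
bp = ax.boxplot([df["column1"], df["column2"]], patch_artist=True)
# Apply one color to each box
for box, color in zip(bp["boxes"], box_colors):
    box.set_facecolor(color)
# Label the boxes with the column names
ax.set_xticklabels(["column1", "column2"])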
# Set the title and labels
plt.title("Boxplot with Custom Box Colors")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
Example 4: Horizontal Boxplot
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the horizontal boxplot
df.boxplot(column=["column1", "column2"], vert=False)
# Set the title and labels
plt.title("Horizontal Boxplot")
plt.xlabel("Values")
plt.ylabel("Columns")
# Display the plot
plt.show()
Example 5: Changing Whisker Length
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
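# Create the boxplot with changed whisker length: whis sets the whisker reach
# as a multiple of the IQR (default 1.5); whiskerprops only styles the lines.
# With this small, evenly spaced sample the whiskers still reach the minimum and maximum.
df.boxplot(column=["column1", "column2"], whis=1.0, whiskerprops=dict(linestyle='--', linewidth=2))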
# Set the title and labels
plt.title("Boxplot with Changed Whisker Length")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
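The whis parameter is what controls how far the whiskers reach. To see its effect numerically, the sketch below uses matplotlib's cbook.boxplot_stats helper on a hypothetical column with one extreme value and compares three settings.

```python
from matplotlib import cbook

values = [1, 2, 3, 4, 20]  # hypothetical column with one extreme value

# Default whis=1.5: the upper whisker stops at 4 and 20 is a flier
default = cbook.boxplot_stats(values, whis=1.5)[0]

# A larger multiplier widens the fences, so 20 is covered by the whisker
wide = cbook.boxplot_stats(values, whis=10)[0]

# A (low, high) pair is read as percentiles: (0, 100) means min and max
full = cbook.boxplot_stats(values, whis=(0, 100))[0]

for name, s in [("whis=1.5", default), ("whis=10", wide), ("whis=(0, 100)", full)]:
    print(name, "->", s["whislo"], s["whishi"], list(s["fliers"]))
```

The same whis values can be passed to df.boxplot() or ax.boxplot() to change the drawn whiskers.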
Example 6: Show Outlier Points
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame (20 in column1 is an outlier)
data = {"column1": [1, 2, 3, 4, 20], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot with outlier points
df.boxplot(column=["column1", "column2"], showfliers=True, flierprops=dict(marker="o", markerfacecolor="red", markersize=8))
# Set the title and labels
plt.title("Boxplot with Outlier Points")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
Example 7: Grouped Boxplot
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the grouped boxplot: one subplot per column, one box per group
df["group"] = ["A", "A", "B", "B", "B"]
df.boxplot(column=["column1", "column2"], by="group")
# Set the figure title and label the axes of every subplot
plt.suptitle("Grouped Boxplot")
for ax in plt.gcf().axes:
    ax.set_xlabel("Group")
    ax.set_ylabel("Values")
# Display the plot
plt.show()
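With by="group", pandas draws one subplot per column. If the columns should instead sit side by side within each group on a single set of axes, one possible approach is to call matplotlib's ax.boxplot() once per column with shifted positions. The sketch below is one way to do this under that assumption; the offsets, widths, and colors are illustrative choices, not fixed conventions.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "column1": [1, 2, 3, 4, 5],
    "column2": [2, 4, 6, 8, 10],
    "group": ["A", "A", "B", "B", "B"],
})
columns = ["column1", "column2"]
colors = ["#FFC0CB", "#ADD8E6"]
groups = sorted(df["group"].unique())

fig, ax = plt.subplots()
width = 0.35  # spacing between neighboring boxes (illustrative value)
handles = []
for i, col in enumerate(columns):
    # One array of values per group for this column
    data = [df.loc[df["group"] == g, col].to_numpy() for g in groups]
    # Shift this column's boxes left or right of each group's tick position
    offset = (i - (len(columns) - 1) / 2) * width
    positions = [j + 1 + offset for j in range(len(groups))]
    bp = ax.boxplot(data, positions=positions, widths=width * 0.9,
                    patch_artist=True, boxprops=dict(facecolor=colors[i]))
    handles.append(bp["boxes"][0])

# One tick per group, centered between that group's boxes
ax.set_xticks(range(1, len(groups) + 1))
ax.set_xticklabels(groups)
ax.set_xlabel("Group")
ax.set_ylabel("Values")
ax.set_title("Columns Side by Side Within Each Group")
ax.legend(handles, columns)
```

Calling plt.show() afterwards displays the figure. The legend maps each fill color to its column, since the x-axis now carries the group names.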
Example 8: Rotated X-Axis Labels
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot with rotated x-axis labels
df.boxplot(column=["column1", "column2"])
plt.xticks(rotation=45)
# Set the title and labels
plt.title("Boxplot with Rotated X-Axis Labels")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
Example 9: Changing Box Widths
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot with changed box widths
df.boxplot(column=["column1", "column2"], widths=[0.2, 0.4])
# Set the title and labels
plt.title("Boxplot with Changed Box Widths")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
Example 10: Adding Gridlines
import matplotlib.pyplot as plt
import pandas as pd
# Build a small sample DataFrame
data = {"column1": [1, 2, 3, 4, 5], "column2": [2, 4, 6, 8, 10]}
df = pd.DataFrame(data)
# Create the boxplot with gridlines (grid=True is the pandas default; pass grid=False to hide them)
df.boxplot(column=["column1", "column2"], grid=True)
# Set the title and labels
plt.title("Boxplot with Gridlines")
plt.xlabel("Columns")
plt.ylabel("Values")
# Display the plot
plt.show()
These code examples demonstrate different variations of creating boxplots for multiple columns using matplotlib. By customizing various parameters, such as colors, box widths, whisker lengths, and more, we can create visually appealing and informative boxplots.
Conclusion
Boxplots are a useful tool for visualizing the distribution of data across multiple columns. With matplotlib, creating boxplots for multiple columns is straightforward, and by adjusting various parameters, we can customize the appearance of the plots to suit our needs. Boxplots provide valuable insights into the dataset’s distribution and can be used to compare different columns or datasets, making them an essential part of data analysis and visualization.
By exploring the code examples provided in this article, and experimenting with different datasets and parameters, you can gain a deeper understanding of how to create boxplots for multiple columns using matplotlib.