When performing exploratory analysis, bar charts and box plots are common ways to show the overall structure and distribution of a dataset. I recently came across raincloud plots, which looked both attractive and informative, so I collected what I learned and implemented one in Python.
Introduction
A raincloud plot is a hybrid plot made of three parts: a half violin plot (the cloud), a box plot (the umbrella), and a strip plot (the rain).
Data Preparation
We’ll continue to use the penguin dataset as an example, which is already available on this site for direct download: Penguin Data
Download Data
pip install ptitprince
wget https://www.bobobk.com/wp-content/uploads/2021/12/penguins.csv
Load Data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
df = pd.read_csv("penguins.csv")
df.head()
Result
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g sex
0 Adelie Torgersen 39.1 18.7 181.0 3750.0 MALE
1 Adelie Torgersen 39.5 17.4 186.0 3800.0 FEMALE
2 Adelie Torgersen 40.3 18.0 195.0 3250.0 FEMALE
3 Adelie Torgersen NaN NaN NaN NaN NaN
4 Adelie Torgersen 36.7 19.3 193.0 3450.0 FEMALE
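The output above includes a row (index 3) whose measurements are all NaN. As a minimal sketch, here is one way to count and drop such rows before plotting. The small inline DataFrame only mirrors the five rows shown, so the snippet runs without the CSV; the names sample and clean are illustrative, and with the real file the same calls would be applied to df.

```python
import numpy as np
import pandas as pd

# Stand-in for the five rows shown above (assumption: the real df comes from penguins.csv)
sample = pd.DataFrame({
    "island": ["Torgersen"] * 5,
    "bill_length_mm": [39.1, 39.5, 40.3, np.nan, 36.7],
})

# Count missing values per column
missing = sample.isna().sum()

# Keep only the rows where the plotted column is present
clean = sample.dropna(subset=["bill_length_mm"])

print(missing)
print(len(sample), "rows before,", len(clean), "rows after")
```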
Starting the Plotting
We’ve already installed the ptitprince package using pip. Now, let’s start plotting.
Violin Plot
First, let’s draw the half violin, which is the “cloud” part of the raincloud plot. We plot the bill_length_mm variable, grouped by island. The half_violinplot function draws one half of a violin plot, and inner controls what is drawn inside the violin (here it is None, so nothing is drawn). To lay the clouds out horizontally, we put the grouping variable island on the Y-axis and the measurement on the X-axis.
import matplotlib.pyplot as plt
import ptitprince as pt
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
plt.savefig("half_violin.png",dpi=200)
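To make the “cloud” less of a black box, here is a rough sketch of the idea behind a half violin: estimate a smooth density from the observations and fill it on one side of a baseline only. This is not ptitprince’s implementation. The helper gaussian_kde_1d, the synthetic values, and the choice of Scott’s rule for the bandwidth are all assumptions made for illustration.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def gaussian_kde_1d(values, grid, bandwidth):
    """Kernel density estimate: the average of Gaussian bumps centred on each observation."""
    diffs = (grid[:, None] - values[None, :]) / bandwidth
    kernels = np.exp(-0.5 * diffs**2) / np.sqrt(2 * np.pi)
    return kernels.mean(axis=1) / bandwidth


rng = np.random.default_rng(0)
values = rng.normal(loc=44, scale=5, size=200)  # synthetic stand-in for bill lengths

# Scott's rule-of-thumb bandwidth for one-dimensional data
bandwidth = values.std(ddof=1) * len(values) ** (-1 / 5)
grid = np.linspace(values.min() - 3 * bandwidth, values.max() + 3 * bandwidth, 400)
density = gaussian_kde_1d(values, grid, bandwidth)

fig, ax = plt.subplots()
# Fill on one side of the baseline only: this is the "half" of the violin
ax.fill_between(grid, 0, density / density.max() * 0.4, alpha=0.6)
ax.set_xlabel("bill_length_mm (synthetic)")
fig.savefig("half_violin_sketch.png", dpi=100)
```

A smaller bandwidth gives a bumpier cloud and a larger one gives a smoother cloud, which is the main knob to experiment with here.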
Boxplot
Next is the “umbrella” part.
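plt.figure()  # start a fresh figure so the previous step is not drawn twice
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
# Narrow, unfilled box drawn on top of the cloud (zorder=10)
sns.boxplot(data=df, y="island", x="bill_length_mm", width=.15, zorder=10,
            boxprops={"facecolor": "none", "zorder": 10},
            whiskerprops={"linewidth": 2, "zorder": 10})
plt.savefig("violin_box.png", dpi=200)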
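For reference, the box added in this step summarizes the quartiles of each group. With the default whisker setting (whis=1.5, which the call above does not change), the whiskers reach the most extreme observations that lie within 1.5 times the interquartile range of the box. The following sketch reproduces those numbers with NumPy on a handful of made-up values; it illustrates the definition and is not seaborn’s own code.

```python
import numpy as np

# Illustrative values only, not taken from the penguin dataset
values = np.array([36.7, 39.1, 39.5, 40.3, 41.1, 42.0, 45.2, 46.5, 59.6])

q1, median, q3 = np.percentile(values, [25, 50, 75])
iqr = q3 - q1

# Fences at 1.5 * IQR beyond the box; whiskers stop at the furthest points inside them
lower_fence, upper_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Anything beyond the fences would be drawn as an individual outlier point
outliers = values[(values < lower_fence) | (values > upper_fence)]

print(q1, median, q3, whisker_low, whisker_high, outliers)
```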
Strip Plot
The “rain” part of the raincloud plot uses a strip plot. The jitter parameter spreads the points along the categorical axis so that they overlap less, and zorder=0 keeps them behind the box and the cloud.
plt.figure()  # start a fresh figure so the previous steps are not drawn twice
pt.half_violinplot(data=df, y="island", x="bill_length_mm", inner=None)
sns.boxplot(data=df, y="island", x="bill_length_mm", width=.15, zorder=10,
            boxprops={"facecolor": "none", "zorder": 10},
            whiskerprops={"linewidth": 2, "zorder": 10})
# The raw observations, jittered along the island axis
sns.stripplot(data=df, y="island", x="bill_length_mm", jitter=1, edgecolor="white", zorder=0)
plt.savefig("raincloud.png", dpi=200)
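Conceptually, jitter just adds a small random offset to each point along the categorical axis while leaving the measured value untouched. The sketch below shows that idea with NumPy. It is not seaborn’s internal code, and the name jitter_width and its value are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(42)

# Category positions on the axis: 0, 1, 2 for three groups (illustrative)
positions = np.repeat([0, 1, 2], 50).astype(float)

jitter_width = 0.2  # hypothetical half-width of the random offset
offsets = rng.uniform(-jitter_width, jitter_width, size=positions.size)

# Only the categorical coordinate is perturbed; the measurement axis is not touched
jittered = positions + offsets

print(jittered[:5])
```

A wider offset separates dense points better but blurs the boundary between neighbouring groups, so it is worth trying a few values.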
R Implementation
Here is an equivalent implementation of the raincloud plot in R, using ggplot2 and ggdist.
library(ggplot2)
library(ggdist)
df = read.table("penguins.csv",sep=",",header=TRUE)
pdf("raincloud.pdf",width=14, height=7)
ggplot(data=df,aes(y=bill_length_mm,x=factor(island),fill=factor(island)))+
ggdist::stat_halfeye(adjust=0.5,justification=-.2,.width=0,point_colour=NA) +
geom_boxplot(width=0.2,outlier.color=NA) +
ggdist::stat_dots(side="left",justification=1.1)
dev.off()
Summary
This article showed how to build a raincloud plot by combining a half violin plot, a box plot, and a strip plot. Because it shows the density, the summary statistics, and the raw observations together, it conveys a distribution more completely than any of the three plots on its own.